Authors:
Ash… and 49 others
Categories:
cs.CL, cs.LG
📝 Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
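
The abstract does not spell out how the attention mechanism is computed. As a rough illustration only, the sketch below assumes one common form, scaled dot-product attention, in which each output is a softmax-weighted average of value vectors. The function names, the pure-Python list representation, and the toy input are assumptions made for this example and are not taken from the text above.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (illustrative sketch)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: three positions with two-dimensional vectors (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: queries, keys, and values all come from the same sequence.
result = attention(x, x, x)
```

Every position attends to every other position in a single step, with no recurrent state carried along the sequence and no convolution window, which is the property the abstract emphasizes. A practical implementation would use batched matrix operations and learned projections rather than nested Python loops.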