Attention Is All You Need

Author Details:
Ashish Vaswani,
Noam Shazeer,
Niki Parmar,
Jakob Uszkoreit,
Llion Jones,
Aidan N. Gomez,
Illia Polosukhin,
Lukasz Kaiser


The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.
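
The abstract names attention as the Transformer's only building block but does not spell out the computation. As a rough illustration, the sketch below implements scaled dot-product attention, the form defined in the full paper: each query is compared with every key, the scores are divided by the square root of the key dimension and passed through a softmax, and the resulting weights average the value vectors. The function names and the plain-list representation are assumptions made for this example; they do not come from the authors' code, and multi-head projections, masking, and batching are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query (hypothetical helper name).

    queries: list of d_k-dimensional vectors
    keys:    list of d_k-dimensional vectors, one per value
    values:  list of d_v-dimensional vectors
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        # Softmax turns the scores into weights that sum to 1.
        weights = softmax(scores)
        # Output is the weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Toy input: one query, two identical keys, so both values get equal weight.
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]
    v = [[2.0], [4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Because every query is handled independently of the others, the loop over queries can be replaced by a single matrix product, which is the property behind the abstract's claim that the model is more parallelizable than recurrent networks.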

Article DOI & Crossmark Data

DOI: https://doi.org/10.48550/arXiv.1706.03762

Article Keywords Details

Attention Is All You Need


