Article

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich.
arXiv preprint arXiv:1808.08946 (2018)

Metadata

Users

  • @habereder
  • @s363405

Comments and Reviews

  • @s363405, 4 years ago
    The authors were able to show that the number of heads used in multi-head attention affects the ability to model long-range dependencies.
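For context, the sketch below is a minimal NumPy illustration of multi-head attention, not code from the paper; the num_heads argument it exposes is the quantity whose effect on modelling long-range dependencies the paper evaluates.

    # Minimal multi-head attention sketch in NumPy (illustrative only, not the
    # authors' code). num_heads splits the model dimension d_model into
    # num_heads sub-spaces of size d_model // num_heads; each head computes
    # its own attention weights over the full sequence.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
        # X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
        seq_len, d_model = X.shape
        d_head = d_model // num_heads
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Reshape each projection into (num_heads, seq_len, d_head)
        split = lambda M: M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
        Qh, Kh, Vh = split(Q), split(K), split(V)
        # Scaled dot-product attention per head; every position can attend to
        # every other position in one step, regardless of distance.
        scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
        heads = softmax(scores) @ Vh
        # Concatenate the heads and project back to d_model
        concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
        return concat @ Wo

    # Toy usage: same model size, different head counts
    rng = np.random.default_rng(0)
    d_model, seq_len = 8, 5
    X = rng.standard_normal((seq_len, d_model))
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=1).shape)  # (5, 8)
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=4).shape)  # (5, 8)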