Article

Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich.
arXiv preprint arXiv:1808.08946 (2018)

Metadata

Users

  • @habereder
  • @s363405

Comments and Reviews

  • @s363405, 4 years ago
    The authors were able to show that the number of heads used in multi-head attention affects the ability to model long-range dependencies.
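For context, the sketch below is a minimal NumPy illustration of multi-head attention, not code from the paper; the num_heads argument it exposes is the quantity whose effect on modelling long-range dependencies the paper evaluates.

    # Minimal multi-head attention sketch in NumPy (illustrative only, not the
    # authors' code). num_heads splits the model dimension d_model into
    # num_heads sub-spaces of size d_model // num_heads; each head computes
    # its own attention weights over the full sequence.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
        # X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
        seq_len, d_model = X.shape
        d_head = d_model // num_heads
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # Reshape each projection into (num_heads, seq_len, d_head)
        split = lambda M: M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
        Qh, Kh, Vh = split(Q), split(K), split(V)
        # Scaled dot-product attention per head; every position can attend to
        # every other position in one step, regardless of distance.
        scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
        heads = softmax(scores) @ Vh
        # Concatenate the heads and project back to d_model
        concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
        return concat @ Wo

    # Toy usage: same model size, different head counts
    rng = np.random.default_rng(0)
    d_model, seq_len = 8, 5
    X = rng.standard_normal((seq_len, d_model))
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=1).shape)  # (5, 8)
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=4).shape)  # (5, 8)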