
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6887--6900. Online, Association for Computational Linguistics, July 2020.

Abstract

Self-attention mechanisms have achieved striking state-of-the-art (SOTA) progress on various sequence learning tasks, building on multi-headed dot-product attention that attends to all global contexts at different positions. Through a pseudo information highway, we introduce a gated component, the self-dependency unit (SDU), which incorporates LSTM-styled gating units to replenish internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary content-based SDU gates allow modulated latent embeddings to flow through skip connections, yielding a clear margin of improvement in convergence speed under gradient descent. We unveil the role of the gating mechanism in aiding context-based Transformer modules, hypothesizing that SDU gates, especially in shallow layers, push the model faster toward suboptimal points during optimization.
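For illustration, below is a minimal PyTorch sketch of a content-based, LSTM-style self-gating unit with a skip connection, in the spirit of the SDU described in the abstract. The module name SelfDependencyUnit, the two linear projections, and the sigmoid/tanh activations are assumptions made for this sketch, not the paper's exact formulation.

import torch
import torch.nn as nn

class SelfDependencyUnit(nn.Module):
    """Sketch of an SDU-style self-gate (assumed formulation, not the paper's exact one).

    The gate is computed from the token representation itself (content-based),
    and the gated value is added back through a skip connection.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model)   # produces the element-wise gate
        self.value_proj = nn.Linear(d_model, d_model)  # produces the candidate values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate = torch.sigmoid(self.gate_proj(x))   # gate values in [0, 1]
        value = torch.tanh(self.value_proj(x))    # modulated latent embedding
        return x + gate * value                   # skip connection carrying the gated flow

# Usage: apply alongside a Transformer sub-layer's hidden states.
if __name__ == "__main__":
    sdu = SelfDependencyUnit(d_model=512)
    hidden = torch.randn(2, 16, 512)
    out = sdu(hidden)
    print(out.shape)  # torch.Size([2, 16, 512])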

Tags

Users of this resource

  • @albinzehe

Comments and Reviews