J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. (2017). arXiv:1703.06907. Comment: 8 pages, 7 figures. Submitted to 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017).
S. Merity. (2019). arXiv:1911.11423. Comment: Addition of citations and contextual results (no attention head, single attention head, attention per layer), removal of wordpiece WikiText-103 numbers due to normalization issues, fix of SHA attention figure Q arrow, other minor fixes.