Y. Gong, Y. Chung, and J. Glass. (2021). arXiv:2102.01243. Comment: Published in IEEE/ACM Transactions on Audio Speech and Language Processing. Code at https://github.com/YuanGongND/psla.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, and 2 other authors. (2014). arXiv:1409.0575. Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig. 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references.