Today, speech technology is only available for a small fraction of the thousands of languages spoken around the world because traditional systems need to be trained on large amounts of annotated speech audio with transcriptions. Obtaining that kind of data for every human language and dialect is almost impossible.
Wav2vec works around this limitation by requiring little to no transcribed data. The model uses self-supervision to learn from unlabeled audio. This makes speech recognition feasible for many more languages and dialects, such as Kyrgyz and Swahili, for which little transcribed speech audio exists. Self-supervision is the key to leveraging unannotated data and building better systems.