Today, speech technology is only available for a small fraction of the thousands of languages spoken around the world because traditional systems need to be trained on large amounts of annotated speech audio with transcriptions. Obtaining that kind of data for every human language and dialect is almost impossible.
Wav2vec works around this limitation by requiring little to no transcribed data. The model uses self-supervision to learn from unlabeled audio. This enables speech recognition systems for many more languages and dialects, such as Kyrgyz and Swahili, for which little transcribed speech audio exists. Self-supervision is the key to leveraging unannotated data and building better systems.
Join Technovation Girls and learn how to use technology like mobile apps and AI to solve a community problem YOU care about. You'll work as part of a team of girls like you and get support from a mentor who will help keep you motivated and on track.
Live captions and note-taking for any online or in-person dialogue. For use in education and business settings, Caption.Ed makes media more accessible.
T. Daradoumis, R. Bassi, F. Xhafa, and S. Caballé. P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on, pp. 208-213. IEEE, (2013)
Y. Feinstein, C. Goldman, Y. Mor, and J. Rosenschein. Proceedings of the Practical Application of Knowledge Discovery and Data Mining (PADD97), pp. 125-136. Springer, (1997)
C. Goldman, Y. Mor, and J. Rosenschein. Proceedings of the First International Conference on Practical Applications of Intelligent Agents and Multi-Agents Technology (PAAM96), pp. 837-842. London, UK, (1996)
Y. Mor and J. Rosenschein. Proceedings of the First International Conference on Multiagent Systems (ICMAS95), pp. 276-282. Menlo Park, California, AAAI Press / MIT Press, (1995)
C. Nehaniv. Narrative Intelligence: Papers from the 1999 AAAI Fall Symposium, (5-7 November 1999 - North Falmouth, Massachusetts), pp. 101-104. AAAI Press, Technical Report FS-99-01, (1999)
A. Newell and H. Simon. Communications of the ACM, 19 (3): 113-126, (March 1976)
p. 116:
"The Physical Symbol System Hypothesis. A physical
symbol system has the necessary and sufficient
means for general intelligent action."
p. 120:
"Heuristic Search Hypothesis. The solutions to
problems are represented as symbol structures.
A physical symbol system exercises its intelligence
in problem solving by search--that is, by
generating and progressively modifying symbol
structures until it produces a solution structure."
p. 121:
"To state a problem is to designate (1) a test
for a class of symbol structures (solutions of the
problem), and (2) a generator of symbol structures
(potential solutions). To solve a problem is
to generate a structure, using (2), that satisfies
the test of (1)."
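The formulation quoted from p. 121 (a test plus a generator) can be illustrated with a minimal generate-and-test sketch. This is not code from Newell and Simon's paper. The toy problem (three strictly increasing digits that sum to 6) and the names `generate`, `is_solution`, and `solve` are assumptions chosen only for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Optional, Tuple

# A "symbol structure" is modelled here as a tuple of digits (an assumption).
Structure = Tuple[int, ...]


def generate(digits: Iterable[int], length: int) -> Iterator[Structure]:
    """(2) Generator: enumerate potential solutions one at a time."""
    return product(digits, repeat=length)


def is_solution(s: Structure) -> bool:
    """(1) Test: three strictly increasing digits that sum to 6 (toy problem)."""
    return s[0] < s[1] < s[2] and sum(s) == 6


def solve(
    candidates: Iterable[Structure],
    test: Callable[[Structure], bool],
) -> Optional[Structure]:
    """Return the first generated structure that passes the test, if any."""
    for candidate in candidates:
        if test(candidate):
            return candidate
    return None


solution = solve(generate(range(10), 3), is_solution)
print(solution)  # (0, 1, 5)
```

Exhaustive enumeration like this only shows the statement of a problem. The Heuristic Search Hypothesis quoted from p. 120 concerns generating and modifying structures selectively rather than trying every candidate.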
A. Paiva, I. Machado, and R. Prada. IUI '01: Proceedings of the 6th international conference on Intelligent user interfaces, pp. 129-136. New York, NY, USA, ACM Press, (2001)
J. Sarmiento, S. Trausan-Matu, and G. Stahl. Presented at the International Symposium on Organizational Learning and Knowledge Work Management (OL-KWM 2005), pp. 88-99. Bucharest, Romania, (2005)
R. Schank and R. Abelson. Thinking: Readings in Cognitive Science, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, pp. 151-157. Tbilisi, USSR, (1975)
H. Simon. The MIT Press, Cambridge, MA, (October 1996)
"As soon as we introduce “synthesis” as well as “artifice,” we enter the realm of engineering. For “synthetic” is often used in the broader sense of “designed” or “composed.” We speak of engineering as concerned with “synthesis,” while science is concerne.
A. Stern and M. Mateas. The International DiGRA Conference, June 16th - 20th, 2005, Vancouver, British Columbia, Canada (http://www.gamesconference.org/digra2005/overview.php), (2005)