Today, speech technology is only available for a small fraction of the thousands of languages spoken around the world because traditional systems need to be trained on large amounts of annotated speech audio with transcriptions. Obtaining that kind of data for every human language and dialect is almost impossible.
Wav2vec works around this limitation by requiring little to no transcribed data: the model uses self-supervision to learn from unlabeled training data. This enables speech recognition systems for many more languages and dialects, such as Kyrgyz and Swahili, for which little transcribed speech audio exists. Self-supervision is the key to leveraging unannotated data and building better systems.
This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.
The goal of the aprendeR package is to make it easier for newcomers to learn modern R with a gentle learning curve, and to keep English from being a barrier so that they can focus on building practical R skills. It includes Spanish translations of tutorials found in other packages (learnr, tutorial.helpers, r4ds.tutorials, ...), and new tutorials will eventually be added.