Abstract
Automatic Speech Recognition (ASR) has historically
been a driving force behind many machine learning (ML)
techniques, including the ubiquitously used hidden
Markov model, discriminative learning, structured
sequence learning, Bayesian learning, and adaptive
learning. Moreover, ML can and occasionally does use
ASR as a large-scale, realistic application to
rigorously test the effectiveness of a given technique,
and to inspire new problems arising from the inherently
sequential and dynamic nature of speech. On the other
hand, even though ASR is available commercially for
some applications, it remains largely an unsolved problem:
for almost all applications, ASR performance is not
on par with human performance. New insight from
modern ML methodology shows great promise for advancing
the state of the art in ASR technology. This article
provides readers with an overview of modern ML
techniques as utilized in current ASR research and
systems and as relevant to future ones. The intent is to
foster more cross-pollination between the ML and ASR
communities than has occurred in the past. The article
is organized according to the major ML paradigms that
are either popular already or have potential for making
significant contributions to ASR technology. The
paradigms presented and elaborated in this overview
include: generative and discriminative learning;
supervised, unsupervised, semi-supervised, and active
learning; adaptive and multi-task learning; and
Bayesian learning. These learning paradigms are
motivated and discussed in the context of ASR
technology and applications. Finally, we present and
analyze recent developments in deep learning and in
learning with sparse representations, focusing on their
direct relevance to advancing ASR technology.