Article

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis and P. Mermelstein.
IEEE Transactions on Acoustics, Speech and Signal Processing, 28 (4): 357-366 (August 1980)
DOI: 10.1109/TASSP.1980.1163420

Abstract

Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time-registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms gave the best performance: 96.5 percent and 95.0 percent recognition for the two speakers, respectively. The superior performance of the mel-frequency cepstrum coefficients may be attributed to their better representation of the perceptually relevant aspects of the short-term speech spectrum.
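The abstract names two computational ingredients: a mel-frequency cepstrum computed per frame, and dynamic warping to time-register test data against word templates. The following is a minimal Python sketch of both, not the paper's implementation. It assumes the mel-spaced filter-bank log energies for each 6.4 ms frame are already available (the filter-bank design is not shown, and the 20-filter count in the usage is an assumption for illustration), derives the first ten cepstral coefficients with a cosine transform of those log energies, and substitutes a textbook dynamic time warping (DTW) with Euclidean frame distance for the paper's "efficient dynamic warping method", whose details the abstract does not give. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cepstrum_from_log_energies(log_energies: Sequence[float],
                               n_coeffs: int = 10) -> List[float]:
    """Cosine transform of N log filter-bank energies X_1..X_N (assumed given):
    c_i = sum_{k=1..N} X_k * cos(i * (k - 1/2) * pi / N), for i = 1..n_coeffs.
    The i = 0 term (overall level) is omitted."""
    n = len(log_energies)
    return [
        sum(x * math.cos(i * (k - 0.5) * math.pi / n)
            for k, x in enumerate(log_energies, start=1))
        for i in range(1, n_coeffs + 1)
    ]


def dtw_distance(test: Sequence[Sequence[float]],
                 template: Sequence[Sequence[float]]) -> float:
    """Generic DTW (not the paper's specific method): accumulated Euclidean
    frame distance along the best path using steps (1,0), (0,1) and (1,1)."""
    inf = float("inf")
    n, m = len(test), len(template)
    # acc[i][j] = lowest accumulated cost aligning test[:i] with template[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(test[i - 1], template[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # repeat template frame
                                   acc[i][j - 1],      # repeat test frame
                                   acc[i - 1][j - 1])  # advance both
    return acc[n][m]


if __name__ == "__main__":
    # Hypothetical frame: 20 log energies with a single nonzero channel.
    frame = [1.0] + [0.0] * 19
    print(cepstrum_from_log_energies(frame))
    # A time-stretched copy of a sequence aligns at zero cost.
    print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]))
```

In a recognizer built along these lines, each frame of a test word would be converted to a coefficient vector, and the word would be assigned the label of the template with the smallest DTW distance.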

BibTeX key: Davis1980
entry type: article
year: 1980
month: aug
journal: IEEE Transactions on Acoustics, Speech and Signal Processing
number: 4
pages: 357-366
volume: 28
issn: 0096-3518
DOI: 10.1109/TASSP.1980.1163420


Cite this publication

@article{Davis1980,
  abstract = {Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words, therefore the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.},
  added-at = {2021-02-01T10:51:23.000+0100},
  author = {Davis, Steven B. and Mermelstein, Paul},
  biburl = {https://www.bibsonomy.org/bibtex/23cb6af420973ee17c44b2bd26d8c6717/m-toman},
  doi = {10.1109/TASSP.1980.1163420},
  file = {:pdfs/davis_transassp_1980.pdf:PDF},
  interhash = {c5c740c692ec53bd45b8bbd882ea67e4},
  intrahash = {3cb6af420973ee17c44b2bd26d8c6717},
  issn = {0096-3518},
  journal = {IEEE Transactions on Acoustics, Speech and Signal Processing},
  keywords = {Acoustic analysis;Speech filters;Cepstrum;Filtering;Laboratories;Loudspeakers;Nonlinear filters;Speech measurements;Acoustic pass recognition testing;Band},
  month = aug,
  number = 4,
  owner = {schabus},
  pages = {357-366},
  timestamp = {2021-02-01T10:51:23.000+0100},
  title = {Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences},
  volume = 28,
  year = 1980
}
