Article

Recognizing GSM digital speech

A. Gallardo-Antolin, C. Pelaez-Moreno, and F. Diaz de Maria.
Speech and Audio Processing, IEEE Transactions on, 13 (6): 1186--1205 (November 2005)
DOI: 10.1109/TSA.2005.853210

Abstract

The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion as a result of the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half and the full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated channel conditions. Furthermore, the disparity increases as the network conditions worsen.
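The idea of deriving recognition features from the transmitted spectral-envelope parameters, rather than from decoded audio, can be illustrated with a minimal sketch. It is not the authors' front-end: it assumes the envelope arrives as log-area ratios (as in the GSM full-rate codec), uses the exact inverse of the LAR definition instead of the codec's quantization tables, and assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. All function names are hypothetical.

```python
def lar_to_reflection(lars):
    """Invert LAR = log10((1 + r) / (1 - r)) for each coefficient.

    Assumption: exact inverse; a real codec decodes quantized LARs
    through its own tables and approximations.
    """
    return [(10.0 ** g - 1.0) / (10.0 ** g + 1.0) for g in lars]


def reflection_to_lpc(refl):
    """Step-up recursion: reflection coefficients -> a_1..a_p.

    Assumed convention: A(z) = 1 + sum_k a_k z^-k.
    """
    a = []
    for i, k in enumerate(refl):
        # Order (i + 1) coefficients from the order i coefficients.
        a = [a[j] + k * a[i - 1 - j] for j in range(i)] + [k]
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole envelope 1 / A(z)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1]
                  for k in range(max(1, n - p), n))
        c_n = -acc / n
        if n <= p:
            c_n -= a[n - 1]
        c.append(c_n)
    return c


def envelope_features(lars, n_ceps=12):
    """LARs for one frame -> cepstral feature vector (illustrative)."""
    return lpc_to_cepstrum(reflection_to_lpc(lar_to_reflection(lars)), n_ceps)
```

Because only the envelope parameters enter this chain, bit errors in the excitation part of the frame would not affect the resulting features, which is the property the abstract highlights.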

BibTeX key: gallardo-recognizing-gsm-speech-2005
entry type: article
year: 2005
month: nov
journal: Speech and Audio Processing, IEEE Transactions on
number: 6
pages: 1186--1205
volume: 13
issn: 1063-6676
DOI: 10.1109/TSA.2005.853210
url: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1518918

Cite this publication

%0 Journal Article
%1 gallardo-recognizing-gsm-speech-2005
%A Gallardo-Antolin, A.
%A Pelaez-Moreno, C.
%A de Maria, F. Diaz
%D 2005
%J Speech and Audio Processing, IEEE Transactions on
%K speech_recognition
%N 6
%P 1186--1205
%R 10.1109/TSA.2005.853210
%T Recognizing GSM digital speech
%U http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1518918
%V 13
%X The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion as a result of the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half and the full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated channel conditions. Furthermore, the disparity increases as the network conditions worsen.

@article{gallardo-recognizing-gsm-speech-2005,
  abstract = {The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion as a result of the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half and the full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated channel conditions. Furthermore, the disparity increases as the network conditions worsen.},
  added-at = {2016-07-12T19:24:18.000+0200},
  author = {Gallardo-Antolin, A. and Pelaez-Moreno, C. and de Maria, F. Diaz},
  biburl = {https://www.bibsonomy.org/bibtex/2ecf22edc0c6485137eabd686d67dd98a/mhwombat},
  doi = {10.1109/TSA.2005.853210},
  interhash = {0f2036acaa7828815ba916daf9ab7cf2},
  intrahash = {ecf22edc0c6485137eabd686d67dd98a},
  issn = {1063-6676},
  journal = {Speech and Audio Processing, IEEE Transactions on},
  keywords = {speech_recognition},
  month = nov,
  number = 6,
  pages = {1186--1205},
  timestamp = {2016-07-12T19:25:30.000+0200},
  title = {Recognizing {GSM} digital speech},
  url = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1518918},
  volume = 13,
  year = 2005
}
