PHONOLOGICAL FEATURE BASED VARIABLE FRAME RATE SCHEME FOR IMPROVED SPEECH RECOGNITION
A. Sangwan and J. Hansen. Automatic Speech Recognition and Understanding (ASRU), pages 582-586. (December 2007)
Abstract
In this paper, we propose a new scheme for variable frame
rate (VFR) feature processing based on high-level segmentation
(HLS) of speech into broad phone classes. Traditional
fixed-rate processing is not capable of accurately reflecting
the dynamics of continuous speech. On the other hand, the
proposed VFR scheme adapts the temporal representation of
the speech signal by tying the framing strategy to the detected
phone class sequence. The phone classes are detected
and segmented by using appropriately trained phonological
features (PFs). In this manner, the proposed scheme is capable
of tracking the evolution of speech due to the underlying
phonetic content, and exploiting the non-uniform information
flow-rate of speech by using a variable framing strategy. The
new VFR scheme is applied to automatic speech recognition
of TIMIT and NTIMIT corpora, where it is compared to
a traditional fixed window-size/frame-rate scheme. Our experiments
yield encouraging results with relative reductions
of 24% and 8% in WER (word error rate) for TIMIT and
NTIMIT tasks, respectively.
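
The idea of tying the framing strategy to the detected phone class sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window sizes, frame shifts, or class inventory used, so the `FRAMING_MS` table, the class names, and the `vfr_frames` helper below are hypothetical and chosen only to show how a per-class window and shift could yield a non-uniform frame sequence. Times are integer milliseconds, and the segmentation is assumed to be given.

```python
from typing import Dict, List, Tuple

# (start_ms, end_ms, broad phone class); assumed output of a segmenter.
Segment = Tuple[int, int, str]

# Hypothetical (window_ms, shift_ms) per broad phone class.
# These values are illustrative assumptions, not taken from the paper.
FRAMING_MS: Dict[str, Tuple[int, int]] = {
    "silence": (25, 20),  # slowly varying: sparse frames
    "vowel": (25, 10),    # conventional fixed-rate setting
    "stop": (10, 5),      # fast transients: short, dense frames
}


def vfr_frames(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return (frame_start_ms, frame_end_ms) pairs, framing each segment
    with the window and shift assigned to its phone class."""
    frames: List[Tuple[int, int]] = []
    for start, end, phone_class in segments:
        window, shift = FRAMING_MS[phone_class]
        # Keep only frames that fit entirely inside the segment.
        for t in range(start, end - window + 1, shift):
            frames.append((t, t + window))
    return frames


if __name__ == "__main__":
    example = [(0, 100, "silence"), (100, 200, "vowel"), (200, 240, "stop")]
    for frame in vfr_frames(example):
        print(frame)
```

In this sketch the 40 ms stop segment receives almost as many frames as the 100 ms vowel segment, which is the kind of non-uniform temporal resolution a variable framing strategy aims for. A fixed-rate baseline corresponds to giving every class the same window and shift.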
%0 Conference Paper
%1 phonovfr
%A Sangwan, Abhijeet
%A Hansen, John H. L.
%B Automatic Speech Recognition and Understanding (ASRU)
%D 2007
%K features frame phonological rate recognition speech variable
%P 582-586
%T PHONOLOGICAL FEATURE BASED VARIABLE FRAME RATE SCHEME FOR IMPROVED SPEECH RECOGNITION
%U http://sites.google.com/site/publicationsabhijeetsangwan/Home/publication-pdfs/ASRU_Final_Submission.pdf?attredirects=0
@inproceedings{phonovfr,
added-at = {2008-10-19T22:48:25.000+0200},
author = {Sangwan, Abhijeet and Hansen, John H. L.},
biburl = {https://www.bibsonomy.org/bibtex/2ea3d15600c8298872de119aff8d38de2/abhijeet.sangwan},
booktitle = {Automatic Speech Recognition and Understanding (ASRU)},
interhash = {b49abd99292eca449a443224984cd87f},
intrahash = {ea3d15600c8298872de119aff8d38de2},
keywords = {features frame phonological rate recognition speech variable},
month = {December},
pages = {582--586},
timestamp = {2008-10-19T22:48:25.000+0200},
title = {PHONOLOGICAL FEATURE BASED VARIABLE FRAME RATE SCHEME FOR IMPROVED SPEECH RECOGNITION},
url = {http://sites.google.com/site/publicationsabhijeetsangwan/Home/publication-pdfs/ASRU_Final_Submission.pdf?attredirects=0},
year = 2007
}