Abstract
The Global System for Mobile Communications (GSM) environment
encompasses three main problems for automatic speech
recognition (ASR) systems: noisy scenarios, source
coding distortion, and transmission errors. The first
one has already received much attention; however,
source coding distortion and transmission errors must
be explicitly addressed. In this paper, we propose an
alternative front-end for speech recognition over GSM
networks. This front-end is specifically designed to be
effective against source coding distortion and
transmission errors. Specifically, we suggest
extracting the recognition feature vectors directly
from the encoded speech (i.e., the bitstream) instead
of decoding it and subsequently extracting the feature
vectors. This approach offers two significant
advantages. First, the recognition system is only
affected by the quantization distortion of the spectral
envelope. Thus, we avoid the other sources of
distortion introduced by the encoding-decoding
process. Second, when transmission
errors occur, our front-end becomes more effective
since it is not affected by errors in bits allocated to
the excitation signal. We considered the half-rate and
full-rate standard codecs and compared the proposed
front-end with the conventional approach in two ASR
tasks, namely, speaker-independent isolated digit
recognition and speaker-independent continuous speech
recognition. In general, our approach outperforms the
conventional procedure across a variety of simulated
channel conditions. Furthermore, the performance gap
widens as network conditions worsen.
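
The idea of deriving features from the bitstream rather than from decoded speech can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: it assumes the spectral-envelope parameters have already been unpacked from the bitstream and dequantized into reflection coefficients, it assumes LPC cepstral coefficients as the recognition features (one common choice), and the function names `reflection_to_lpc` and `lpc_to_cepstrum` as well as the sign convention A(z) = 1 + sum_j a_j z^-j are hypothetical choices made here.

```python
def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients -> LPC coefficients.

    Assumed convention: A(z) = 1 + sum_j a[j-1] * z**-j, and the
    order-m update is a_j <- a_j + k_m * a_(m-j), with a_m = k_m.
    """
    a = []
    for i, ki in enumerate(k):
        # Update the existing coefficients, then append the new one.
        a = [a[j] + ki * a[i - 1 - j] for j in range(i)] + [ki]
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole filter 1/A(z), computed recursively.

    c_n = -a_n - (1/n) * sum_{k} k * c_k * a_(n-k), with a_n = 0 for n > p.
    No waveform is synthesized, so excitation bits are never used.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1]
                  for k in range(max(1, n - p), n))
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c


if __name__ == "__main__":
    # Hypothetical dequantized reflection coefficients for one frame.
    frame_k = [0.5, 0.2]
    lpc = reflection_to_lpc(frame_k)
    features = lpc_to_cepstrum(lpc, n_ceps=12)
    print(lpc, features[:3])
```

Because the features depend only on the envelope parameters, a bit error confined to the excitation fields leaves them unchanged, which is the robustness argument made above.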