Abstract
Silent speech decoding (SSD), based on articulatory neuromuscular
activity, has become a prevalent brain-computer interface (BCI)
task in recent years. Much work has been devoted to decoding
silent speech from surface electromyography (sEMG) signals
generated by articulatory neuromuscular activity. However,
restoring silent speech in tonal languages such as Mandarin
Chinese remains difficult. This paper proposes an optimized
sequence-to-sequence (Seq2Seq) approach to synthesize voice from
sEMG-based silent speech. We extract duration information from
the audio and use it to temporally regulate the sEMG-based silent
speech so that it matches the audio length. We then present a
deep learning model with an encoder-decoder structure, coupled
with a state-of-the-art vocoder, to generate the audio waveform.
Experiments on six Mandarin Chinese speakers demonstrate that the
proposed model successfully decodes silent speech in Mandarin
Chinese, achieving an average character error rate (CER) of 6.41%
in human evaluation.
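
For intuition, here is a minimal sketch of the kind of duration-based length regulation the abstract describes: per-frame durations derived from the audio are used to stretch the sEMG feature sequence to the target audio length. This is not the authors' implementation; the function name, shapes, and numpy-based interface are illustrative assumptions.

```python
# Hypothetical sketch of duration-based length regulation for sEMG
# features, in the spirit of the abstract's description. Not the
# paper's implementation; names and shapes are assumptions.
import numpy as np

def length_regulate(semg_feats: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Repeat each sEMG feature frame durations[i] times so the
    regulated sequence matches the target audio length.

    semg_feats: (T, D) array of per-frame sEMG features.
    durations:  (T,) integer array with sum(durations) = target length.
    """
    return np.repeat(semg_feats, durations, axis=0)

# Toy usage: 4 sEMG frames stretched to an 8-frame audio-aligned sequence.
feats = np.random.randn(4, 16)            # (T=4, D=16) sEMG features
durs = np.array([1, 3, 2, 2])             # per-frame durations from audio
regulated = length_regulate(feats, durs)  # shape (8, 16)
print(regulated.shape)
```

In this reading, the regulated sequence would then feed the encoder-decoder model, whose output spectrogram a neural vocoder converts to a waveform.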