SottoVoce: An Ultrasound Imaging-Based Silent Speech
Interaction Using Deep Neural Networks
N. Kimura, M. Kono, and J. Rekimoto. Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems, Paper 146, pages 1--11. New York, NY, USA, Association for Computing Machinery, (May 2019)
Abstract
The availability of digital devices operated by voice is
expanding rapidly. However, the applications of voice interfaces
are still restricted. For example, speaking in public places
becomes an annoyance to the surrounding people, and secret
information should not be uttered. Environmental noise may
reduce the accuracy of speech recognition. To address these
limitations, a system to detect a user's unvoiced utterance is
proposed. From internal information observed by an ultrasonic
imaging sensor attached to the underside of the jaw, our
proposed system recognizes the utterance contents without the
user's uttering voice. Our proposed deep neural network model is
used to obtain acoustic features from a sequence of ultrasound
images. We confirmed that audio signals generated by our system
can control the existing smart speakers. We also observed that a
user can adjust their oral movement to learn and improve the
accuracy of their voice recognition.
%0 Conference Paper
%1 Kimura2019-tx
%A Kimura, Naoki
%A Kono, Michinari
%A Rekimoto, Jun
%B Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems
%C New York, NY, USA
%D 2019
%I Association for Computing Machinery
%K deep neural networks, human-ai integration, silent speech interaction, subvocal speech, ultrasonic imaging
%N Paper 146
%P 1--11
%T SottoVoce: An Ultrasound Imaging-Based Silent Speech
Interaction Using Deep Neural Networks
%X The availability of digital devices operated by voice is
expanding rapidly. However, the applications of voice interfaces
are still restricted. For example, speaking in public places
becomes an annoyance to the surrounding people, and secret
information should not be uttered. Environmental noise may
reduce the accuracy of speech recognition. To address these
limitations, a system to detect a user's unvoiced utterance is
proposed. From internal information observed by an ultrasonic
imaging sensor attached to the underside of the jaw, our
proposed system recognizes the utterance contents without the
user's uttering voice. Our proposed deep neural network model is
used to obtain acoustic features from a sequence of ultrasound
images. We confirmed that audio signals generated by our system
can control the existing smart speakers. We also observed that a
user can adjust their oral movement to learn and improve the
accuracy of their voice recognition.
@inproceedings{Kimura2019-tx,
abstract = {The availability of digital devices operated by voice is
expanding rapidly. However, the applications of voice interfaces
are still restricted. For example, speaking in public places
becomes an annoyance to the surrounding people, and secret
information should not be uttered. Environmental noise may
reduce the accuracy of speech recognition. To address these
limitations, a system to detect a user's unvoiced utterance is
proposed. From internal information observed by an ultrasonic
imaging sensor attached to the underside of the jaw, our
proposed system recognizes the utterance contents without the
user's uttering voice. Our proposed deep neural network model is
used to obtain acoustic features from a sequence of ultrasound
images. We confirmed that audio signals generated by our system
can control the existing smart speakers. We also observed that a
user can adjust their oral movement to learn and improve the
accuracy of their voice recognition.},
address = {New York, NY, USA},
author = {Kimura, Naoki and Kono, Michinari and Rekimoto, Jun},
biburl = {https://www.bibsonomy.org/bibtex/2ea62bc42ceb99fa8187cffdba07bce36/willwade},
booktitle = {Proceedings of the 2019 {CHI} Conference on Human Factors in
Computing Systems},
keywords = {deep neural networks, human-ai integration, silent speech interaction, subvocal speech, ultrasonic imaging},
location = {Glasgow, Scotland, UK},
month = may,
number = {Paper 146},
pages = {1--11},
publisher = {Association for Computing Machinery},
series = {CHI '19},
title = {{SottoVoce}: An Ultrasound {Imaging-Based} Silent Speech
Interaction Using Deep Neural Networks},
year = 2019
}