
Information Enquiry Kiosk with Multimodal User Interface

A. Karpov and A. Ronzhin. Pattern Recognition and Image Analysis, 19 (3): 546-558 (September 2009)
DOI: 10.1134/S1054661809030225

Abstract

A multimodal interactive dialogue automaton (kiosk) for self-service is presented in the paper. The multimodal user interface allows people to interact with the kiosk by natural speech and gestures in addition to the standard input and output devices. The architecture of the kiosk contains key modules for speech processing and computer vision. An array of four microphones is used for far-field capture and recording of the user's speech commands; it allows the kiosk to detect voice activity, to localize sources of desired speech signals, and to eliminate environmental acoustic noise. A noise-robust, speaker-independent recognition system is applied to the automatic interpretation and understanding of continuous Russian speech. The distant speech recognizer uses a grammar of voice queries as well as garbage and silence models to improve recognition accuracy. A pair of portable video cameras is used for vision-based detection and tracking of the user's head and body position inside the working area. A Russian-speaking talking head serves both for bimodal audio-visual speech synthesis and for improving communication intelligibility by turning the head toward an approaching client. The dialogue manager controls the flow of dialogue and synchronizes the sub-modules for input modality fusion and output modality fission. The experiments conducted with the multimodal kiosk were directed at cognitive and usability studies of human-computer interaction by different communication means.

Cite this publication

@article{KarpovRonzhin09pria,
  abstract = {A multimodal interactive dialogue automaton (kiosk) for self-service is presented in the paper. The multimodal user interface allows people to interact with the kiosk by natural speech and gestures in addition to the standard input and output devices. The architecture of the kiosk contains key modules for speech processing and computer vision. An array of four microphones is used for far-field capture and recording of the user's speech commands; it allows the kiosk to detect voice activity, to localize sources of desired speech signals, and to eliminate environmental acoustic noise. A noise-robust, speaker-independent recognition system is applied to the automatic interpretation and understanding of continuous Russian speech. The distant speech recognizer uses a grammar of voice queries as well as garbage and silence models to improve recognition accuracy. A pair of portable video cameras is used for vision-based detection and tracking of the user's head and body position inside the working area. A Russian-speaking talking head serves both for bimodal audio-visual speech synthesis and for improving communication intelligibility by turning the head toward an approaching client. The dialogue manager controls the flow of dialogue and synchronizes the sub-modules for input modality fusion and output modality fission. The experiments conducted with the multimodal kiosk were directed at cognitive and usability studies of human-computer interaction by different communication means.},
  added-at = {2012-05-30T10:48:50.000+0200},
  author = {Karpov, Alexey A. and Ronzhin, Andrey L.},
  biburl = {https://www.bibsonomy.org/bibtex/200318d46814eb92803f48542fa2574ad/flint63},
  doi = {10.1134/S1054661809030225},
  file = {SpringerLink:2009/KarpovRonzhin09pria.pdf:PDF},
  groups = {public},
  interhash = {df562ff71d8f03907ccc239cdac0191f},
  intrahash = {00318d46814eb92803f48542fa2574ad},
  issn = {1054-6618},
  journal = {Pattern Recognition and Image Analysis},
  keywords = {v1205 paper ai multimodal interaction user interface dialog zzz.mmi},
  month = sep,
  number = 3,
  pages = {546--558},
  timestamp = {2018-04-16T11:46:40.000+0200},
  title = {Information Enquiry Kiosk with Multimodal User Interface},
  username = {flint63},
  volume = 19,
  year = 2009
}
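
As a minimal sketch of how the entry above might be used, the LaTeX document below cites it by its key, KarpovRonzhin09pria. The file name refs.bib and the choice of the plain bibliography style are assumptions made for illustration; neither comes from the record itself.

```latex
\documentclass{article}
\begin{document}

% Cite the entry by the key given in the BibTeX record.
A multimodal self-service kiosk is described in~\cite{KarpovRonzhin09pria}.

% "plain" is one of the standard BibTeX styles; any other style would also work.
\bibliographystyle{plain}
% refs.bib is a hypothetical file assumed to contain the entry shown above.
\bibliography{refs}

\end{document}
```

Running latex, then bibtex, then latex twice should resolve the citation. Standard styles ignore fields they do not know, such as biburl or interhash.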
