A Speech Mashup Framework for Multimodal Mobile Services
G. Di Fabbrizio, J. Wilpon, and T. Okken. Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interfaces (ICMI-MLMI '09), Cambridge, MA, USA, pages 71-78. (2009)
DOI: 10.1145/1647314.1647329
Abstract
Amid today's proliferation of Web content and mobile phones with broadband data access, interacting with small-form factor devices is still cumbersome. Spoken interaction could overcome the input limitations of mobile devices, but running an automatic speech recognizer with the limited computational capabilities of a mobile device becomes an impossible challenge when large vocabularies for speech recognition must often be updated with dynamic content. One popular option is to move the speech processing resources into the network by concentrating the heavy computation load onto server farms. Although successful services have exploited this approach, it is unclear how such a model can be generalized to a large range of mobile applications and how to scale it for large deployments. To address these challenges we introduce the AT&T speech mashup architecture, a novel approach to speech services that leverages web services and cloud computing to make it easier to combine web content and speech processing. We show that this new compositional method is suitable for integrating automatic speech recognition and text-to-speech synthesis resources into real multimodal mobile services. The generality of this method allows researchers and speech practitioners to explore a countless variety of mobile multimodal services with a finer grain of control and richer multimedia interfaces. Moreover, we demonstrate that the speech mashup is scalable and particularly optimized to minimize round trips in the mobile network, reducing latency for better user experience.
Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interfaces (ICMI-MLMI '09), Cambridge, MA, USA
Year
2009
Pages
71-78
username
flint63
file
ACM Digital Library:2009/DiFabbrizioWilponOkken09ICMI.pdf:PDF
%0 Conference Paper
%1 difabbrizio2009speech
%A Di Fabbrizio, Giuseppe
%A Wilpon, Jay
%A Okken, Thomas
%B Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interfaces (ICMI-MLMI '09), Cambridge, MA, USA
%D 2009
%K framework interface mobile multimodal recognition service speech synthesis user web
%P 71-78
%R 10.1145/1647314.1647329
%T A Speech Mashup Framework for Multimodal Mobile Services
%X Amid today's proliferation of Web content and mobile phones with broadband data access, interacting with small-form factor devices is still cumbersome. Spoken interaction could overcome the input limitations of mobile devices, but running an automatic speech recognizer with the limited computational capabilities of a mobile device becomes an impossible challenge when large vocabularies for speech recognition must often be updated with dynamic content. One popular option is to move the speech processing resources into the network by concentrating the heavy computation load onto server farms. Although successful services have exploited this approach, it is unclear how such a model can be generalized to a large range of mobile applications and how to scale it for large deployments. To address these challenges we introduce the AT&T speech mashup architecture, a novel approach to speech services that leverages web services and cloud computing to make it easier to combine web content and speech processing. We show that this new compositional method is suitable for integrating automatic speech recognition and text-to-speech synthesis resources into real multimodal mobile services. The generality of this method allows researchers and speech practitioners to explore a countless variety of mobile multimodal services with a finer grain of control and richer multimedia interfaces. Moreover, we demonstrate that the speech mashup is scalable and particularly optimized to minimize round trips in the mobile network, reducing latency for better user experience.
@inproceedings{difabbrizio2009speech,
abstract = {Amid today's proliferation of Web content and mobile phones with broadband data access, interacting with small-form factor devices is still cumbersome. Spoken interaction could overcome the input limitations of mobile devices, but running an automatic speech recognizer with the limited computational capabilities of a mobile device becomes an impossible challenge when large vocabularies for speech recognition must often be updated with dynamic content. One popular option is to move the speech processing resources into the network by concentrating the heavy computation load onto server farms. Although successful services have exploited this approach, it is unclear how such a model can be generalized to a large range of mobile applications and how to scale it for large deployments. To address these challenges we introduce the AT\&T speech mashup architecture, a novel approach to speech services that leverages web services and cloud computing to make it easier to combine web content and speech processing. We show that this new compositional method is suitable for integrating automatic speech recognition and text-to-speech synthesis resources into real multimodal mobile services. The generality of this method allows researchers and speech practitioners to explore a countless variety of mobile multimodal services with a finer grain of control and richer multimedia interfaces. Moreover, we demonstrate that the speech mashup is scalable and particularly optimized to minimize round trips in the mobile network, reducing latency for better user experience.},
added-at = {2015-06-03T15:32:47.000+0200},
author = {Di Fabbrizio, Giuseppe and Wilpon, Jay and Okken, Thomas},
biburl = {https://www.bibsonomy.org/bibtex/2cf2dedc4df2c28eb7e3e0c27a4ca0cb7/rnesselrath},
booktitle = {Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interfaces (ICMI-MLMI '09), Cambridge, MA, USA},
doi = {10.1145/1647314.1647329},
file = {ACM Digital Library:2009/DiFabbrizioWilponOkken09ICMI.pdf:PDF},
groups = {public},
interhash = {665b1c969830406d6456464fbada3c63},
intrahash = {cf2dedc4df2c28eb7e3e0c27a4ca0cb7},
keywords = {framework interface mobile multimodal recognition service speech synthesis user web},
pages = {71--78},
timestamp = {2015-06-03T15:32:47.000+0200},
title = {A Speech Mashup Framework for Multimodal Mobile Services},
username = {flint63},
year = 2009
}