Article

Emphatic Visual Speech Synthesis

J. Melenchón, E. Martínez, F. De La Torre, and J. Montero.
IEEE Transactions on Audio, Speech, and Language Processing, 17 (3): 459-468 (March 2009)
DOI: 10.1109/TASL.2008.2010213

Abstract

The synthesis of talking heads has been a flourishing research area over the last few years. Because human beings have an uncanny ability to read people's faces, most related applications (e.g., advertising, video teleconferencing) require absolutely realistic photometric and behavioral synthesis of faces. This paper proposes a person-specific facial synthesis framework that achieves high realism and includes a novel way to control visual emphasis (e.g., the level of exaggeration of the visible articulatory movements of the vocal tract). There are three main contributions: a geodesic interpolation with visual unit selection, a parameterization of visual emphasis, and the design of minimum-size corpora. Perceptual tests with human subjects reveal a high degree of realism, with perceptual scores similar to those of real samples. Furthermore, the visual emphasis level and two communication styles show a statistical interaction.

BibTeX key: Melenchon2009
Entry type: article
Year: 2009
Month: mar
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Number: 3
Pages: 459-468
Volume: 17
owner: schabus
file: :pdfs/melenchon_ieeetransaudio_2009.pdf:PDF
issn: 1558-7916
DOI: 10.1109/TASL.2008.2010213
