We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergartens through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
%0 Journal Article
%1 Ma2006
%A Ma, Jiyong
%A Cole, Ron
%A Pellom, Bryan
%A Ward, Wayne
%A Wise, Barbara
%D 2006
%J IEEE Transactions on Visualization and Computer Graphics
%K (artificial Automated;Reproducibility Biological;Movement;Pattern Computer-Assisted;Imaging, Enhancement;Image Graphics;Computer Intelligence;Computer Interface;Video Interpretation, Measurement;User-Computer Production Recognition, Recording Results;Sensitivity Retrieval;Models, Simulation;Face;Image Specificity;Speech;Speech Storage Three-Dimensional;Information algorithm;speech analysis;Speech analysis;learning and animation animation;Humans;Large-scale animation;character animation;coarticulation animation;face capture codes;Facial computer data;optimal effect;facial effect;virtual face human.;visible intelligence);search learning learning;Prototypes;Speech model;coarticulation modelling;speech motion motion;lip motion;machine of problems;solid processing;Speech prototype;Concatenated recognition;image sequence;search speech speech;Algorithms;Artificial speech;visual synthesis;3D synthesis;Face synthesis;visual system;visible systems;Lips;Machine technique;motion
%N 2
%P 266-276
%R 10.1109/TVCG.2006.18
%T Accurate visible speech synthesis based on concatenating variable length motion capture data
%V 12
%X We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergartens through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
@article{Ma2006,
  abstract      = {We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergartens through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.},
  added-at      = {2021-02-01T10:51:23.000+0100},
  author        = {Ma, Jiyong and Cole, Ron and Pellom, Bryan and Ward, Wayne and Wise, Barbara},
  biburl        = {https://www.bibsonomy.org/bibtex/2e6814f9c3ebdf435359f31e679668df6/m-toman},
  doi           = {10.1109/TVCG.2006.18},
  file          = {:pdfs/ma_ieeetransvis_2006.pdf:PDF},
  interhash     = {c6071abe223856c6ec8122e9cfa31132},
  intrahash     = {e6814f9c3ebdf435359f31e679668df6},
  issn          = {1077-2626},
  journal       = {IEEE Transactions on Visualization and Computer Graphics},
  keywords      = {(artificial Automated;Reproducibility Biological;Movement;Pattern Computer-Assisted;Imaging, Enhancement;Image Graphics;Computer Intelligence;Computer Interface;Video Interpretation, Measurement;User-Computer Production Recognition, Recording Results;Sensitivity Retrieval;Models, Simulation;Face;Image Specificity;Speech;Speech Storage Three-Dimensional;Information algorithm;speech analysis;Speech analysis;learning and animation animation;Humans;Large-scale animation;character animation;coarticulation animation;face capture codes;Facial computer data;optimal effect;facial effect;virtual face human.;visible intelligence);search learning learning;Prototypes;Speech model;coarticulation modelling;speech motion motion;lip motion;machine of problems;solid processing;Speech prototype;Concatenated recognition;image sequence;search speech speech;Algorithms;Artificial speech;visual synthesis;3D synthesis;Face synthesis;visual system;visible systems;Lips;Machine technique;motion},
  internal-note = {NOTE(review): keywords field appears word-scrambled by an export tool; verify against the IEEE Xplore record before relying on it},
  number        = {2},
  owner         = {schabus},
  pages         = {266--276},
  timestamp     = {2021-02-01T10:51:23.000+0100},
  title         = {Accurate visible speech synthesis based on concatenating variable length motion capture data},
  volume        = {12},
  year          = {2006}
}