@inproceedings{Govokhina2007,
  abstract = {{W}e propose here an {HMM}-based trajectory formation system that predicts articulatory trajectories of a talking face from phonetic input. {I}n order to add flexibility to the acoustic/gestural alignment and take into account anticipatory gestures, a phasing model has been developed that predicts the delays between the acoustic boundaries of allophones to be synthesized and the gestural boundaries of {HMM} triphones. {T}he {HMM} triphones and the phasing model are trained simultaneously using an iterative analysis-synthesis loop. {C}onvergence is obtained within a few iterations. {W}e demonstrate here that the phasing model significantly improves the prediction error and captures subtle context-dependent anticipatory phenomena.},
added-at = {2021-02-01T10:51:23.000+0100},
address = {Bonn, Germany},
  affiliation = {{G}renoble {I}mages {P}arole {S}ignal {A}utomatique - {GIPSA}-lab - {CNRS} : {UMR}5216 - {U}niversit{\'e} {J}oseph {F}ourier - {G}renoble {I} - {U}niversit{\'e} {P}ierre {M}end{\`e}s-{F}rance - {G}renoble {II} - {U}niversit{\'e} {S}tendhal - {G}renoble {III} - {I}nstitut {P}olytechnique de {G}renoble - {O}range {R}\&{D} - {F}rance {T}{\'e}l{\'e}com},
audience = {international},
author = {Govokhina, Oxana and Bailly, Gérard and Breton, Gaspard},
biburl = {https://www.bibsonomy.org/bibtex/2ee339fc656142eac6f42623abeb31a7d/m-toman},
booktitle = {Proceedings of the 6th ISCA Workshop on Speech Synthesis (SSW)},
collaboration = {{C}ontrat de recherche {FT} {R}\&{D}},
day = 22,
file = {:pdfs/govokhina_ssw_2007.pdf:PDF},
interhash = {09e4667c1acdb941609394cca8619f73},
intrahash = {ee339fc656142eac6f42623abeb31a7d},
  keywords = {audiovisual speech synthesis; facial animation; trajectory formation system; synchronization; {HMM}-based},
month = aug,
owner = {schabus},
  pages = {1--4},
timestamp = {2021-02-01T10:51:23.000+0100},
title = {Learning optimal audiovisual phasing for a {HMM}-based control model for facial animation},
url = {http://www.isca-speech.org/archive_open/ssw6/ssw6_001.html},
year = 2007
}