
Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method

IEEE Transactions on Audio, Speech, and Language Processing, 17(3): 469-477 (March 2009)
DOI: 10.1109/TASL.2008.2011538

Abstract

This paper presents a realistic visual speech synthesis system based on a hybrid concatenation method. Unlike previous approaches built on phoneme-level unit selection or hidden Markov models (HMMs), the hybrid concatenation method combines frame-level unit selection with a fused HMM and is able to generate more expressive and stable facial animations. The fused HMM explicitly models the loose synchronization of tightly coupled audio and visual streams and gives much better results than a conventional HMM for audiovisual mapping. Once the fused HMM has been trained, facial animation is generated by unit selection at the frame level, driven by the fused HMM output probabilities. To make unit selection on a large corpus computationally efficient, the paper also proposes a two-layer Viterbi search in which only the candidate subsets selected in the first layer are examined in the second layer; with this approach the system has been successfully integrated into real-time applications. Furthermore, the paper proposes a mapping method based on Gaussian mixture models (GMMs) to generate emotional facial expressions from neutral ones. Experiments show that the method synthesizes facial parameters of high quality and, compared with other audiovisual mapping methods, performs better in terms of expressiveness, stability, and running speed.
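
The two-layer search described in the abstract can be pictured as a coarse-to-fine pruning of the unit-selection lattice. The sketch below only illustrates that idea and is not the paper's implementation: the target_logprob scores stand in for fused-HMM output log-probabilities, and the top-k pruning, Euclidean concatenation cost, and weight w_concat are all assumptions introduced here.

import numpy as np

def two_layer_unit_selection(target_logprob, frames, top_k=10, w_concat=1.0):
    # target_logprob: (T, N) array; entry [t, n] is an assumed per-frame score
    # (standing in for a fused-HMM output log-probability) of corpus frame n
    # for target frame t.  frames: (N, D) stored visual parameter vectors.
    T, N = target_logprob.shape
    # Layer 1: keep only the top_k best-scoring candidates per target frame.
    cand = np.argsort(-target_logprob, axis=1)[:, :top_k]            # (T, K)
    K = cand.shape[1]
    # Layer 2: Viterbi search restricted to the pruned candidate subsets,
    # trading the frame score against a simple smoothness (concatenation) cost.
    score = target_logprob[0, cand[0]]                                # (K,)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        prev, cur = frames[cand[t - 1]], frames[cand[t]]              # (K, D) each
        concat = np.linalg.norm(cur[:, None, :] - prev[None, :, :], axis=2)  # (K, K)
        trans = score[None, :] - w_concat * concat                    # [i, j]: arrive at i from j
        back[t] = np.argmax(trans, axis=1)
        score = trans[np.arange(K), back[t]] + target_logprob[t, cand[t]]
    # Backtrack the best path and map local candidate indices to corpus frame indices.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return [int(cand[t, k]) for t, k in enumerate(path)]

Because the second layer only ever considers top_k candidates per frame, its cost scales with the pruned subset size rather than the full corpus size, which is presumably what makes real-time operation feasible.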
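
The neutral-to-emotional expression mapping is described only at a high level. A common way to realize such a GMM-based mapping is joint-GMM regression, sketched below as an assumption: neutral and emotional are hypothetical (T, D) arrays of time-aligned facial parameter vectors, and the conditional-expectation conversion is a standard construction, not necessarily the paper's exact formulation.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(neutral, emotional, n_components=8, seed=0):
    # Fit a GMM on stacked [neutral | emotional] facial parameter vectors.
    joint = np.hstack([neutral, emotional])                 # (T, 2D)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(joint)
    return gmm

def neutral_to_emotional(gmm, x, dim):
    # Map one neutral vector x (length dim) to an emotional vector using the
    # conditional expectation E[y | x] under the joint GMM.
    w, mu, cov = gmm.weights_, gmm.means_, gmm.covariances_
    resp = np.array([wk * multivariate_normal.pdf(x, mean=m[:dim], cov=c[:dim, :dim])
                     for wk, m, c in zip(w, mu, cov)])
    resp /= resp.sum()                                      # posterior component weights
    y = np.zeros(dim)
    for rk, m, c in zip(resp, mu, cov):
        # Per-component conditional mean: m_y + C_yx C_xx^{-1} (x - m_x)
        y += rk * (m[dim:] + c[dim:, :dim] @ np.linalg.solve(c[:dim, :dim], x - m[:dim]))
    return y

In this sketch each neutral frame is converted independently; any temporal smoothing of the converted trajectory would be an additional step beyond what the abstract specifies.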

Tags

community