Visual Speech Synthesis Based On Parameter Generation From HMM: Speech-Driven And Text-And-Speech-Driven Approaches

Proceedings of the 2nd International Conference on Auditory-Visual Speech Processing (AVSP), pages 221–226, Terrigal, Sydney, Australia, December 1998.

Abstract

This paper describes a technique for synthesizing synchronized lip movements from an input speech signal. The technique is based on an algorithm for parameter generation from HMMs with dynamic features, which has been successfully applied to text-to-speech synthesis. Audio-visual speech unit HMMs, namely syllable HMMs, are trained with parameter vector sequences that represent both auditory and visual speech features. Input speech is recognized using the syllable HMMs and converted into a transcription and a state sequence. A sentence HMM is constructed by concatenating the syllable HMMs corresponding to the transcription of the input speech. An optimum visual speech parameter sequence is then generated from the sentence HMM in the maximum likelihood (ML) sense. Since the generated parameter sequence reflects statistical information on both static and dynamic features of several phonemes before and after the current phoneme, the synthetic lip motion becomes smooth and realistic. Experimental results demonstrate the effectiveness of the proposed technique.
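The generation step the abstract refers to is the ML parameter generation algorithm for HMMs with dynamic features: given the Gaussian output distributions along the decoded state sequence, the static trajectory c is chosen to maximize the likelihood of the stacked static-plus-delta observation o = Wc, which reduces to solving the linear system W' U^{-1} W c = W' U^{-1} mu. Below is a minimal NumPy sketch of that step for a single feature dimension; the delta window (a centered difference), the diagonal covariances, and the function names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def build_window_matrix(T):
    """Stack static and delta rows so that o = W c, with
    delta c_t = (c_{t+1} - c_{t-1}) / 2 (truncated at the edges).
    This centered-difference window is an assumption for illustration."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0              # static row: picks out c_t
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5  # delta row, left neighbor
        if t < T - 1:
            W[2 * t + 1, t + 1] = 0.5   # delta row, right neighbor
    return W

def mlpg(mean, var):
    """ML parameter generation for one feature dimension.
    mean, var: (T, 2) arrays of [static, delta] output means and
    variances at each frame, as decided by the HMM state sequence.
    Returns the static trajectory c (length T) that maximizes the
    Gaussian likelihood of o = W c."""
    T = mean.shape[0]
    W = build_window_matrix(T)
    mu = mean.reshape(-1)              # interleave [static, delta] per frame
    prec = 1.0 / var.reshape(-1)       # diagonal inverse covariance U^{-1}
    A = W.T @ (prec[:, None] * W)      # W' U^{-1} W
    b = W.T @ (prec * mu)              # W' U^{-1} mu
    return np.linalg.solve(A, b)

# Toy example: five frames whose static means step from 0 to 1.
mean = np.array([[0, 0], [0, 0], [0.5, 0.25], [1, 0], [1, 0]], dtype=float)
var = np.full_like(mean, 0.1)
print(mlpg(mean, var))
```

Because every frame of c is coupled to its neighbors through the delta rows of W, the solved trajectory tracks the static means while its frame-to-frame differences respect the delta statistics, which is what smooths the generated lip motion across phoneme boundaries.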
