
Simultaneous Modeling of Phonetic and Prosodic Parameters, and Characteristic Conversion for HMM-Based Text-to-Speech Systems

T. Yoshimura. Nagoya Institute of Technology, Nagoya, Japan, (2002)

Abstract

A text-to-speech (TTS) system is one of the human-machine interfaces that use speech. In recent years, TTS systems have been developed as output devices for human-machine interfaces, and they are used in many applications such as car navigation systems, information retrieval over the telephone, voice mail, and speech-to-speech translation systems. However, most text-to-speech systems still cannot synthesize speech with various voice characteristics such as speaker individualities and emotions. To obtain various voice characteristics in text-to-speech systems based on the selection and concatenation of acoustical units, a large amount of speech data is necessary, but such data is difficult to collect, segment, and store. For these reasons, an HMM-based text-to-speech system has been proposed as a way to construct a speech synthesis system that can generate various voice characteristics.

This dissertation presents the construction of the HMM-based text-to-speech system, in which spectrum, fundamental frequency, and duration are modeled simultaneously in a unified HMM framework. The system relies mainly on three techniques: (1) a mel-cepstral analysis/synthesis technique, (2) speech parameter modeling using HMMs, and (3) a speech parameter generation algorithm from HMMs. These three techniques give the system several capabilities. First, because the TTS system uses the speech parameter generation algorithm, the spectral and pitch parameters generated from the trained HMMs can be similar to those of real speech. Second, because the system generates speech from the HMMs, the voice characteristics of synthetic speech can be changed by transforming the HMM parameters appropriately. Third, the system is trainable.

In this thesis, the above three techniques are presented first, and simultaneous modeling of phonetic and prosodic parameters in an HMM framework is proposed. Next, to improve the quality of synthesized speech, the mixed excitation model of the MELP speech coder and a postfilter are incorporated into the system. Experimental results show that the mixed excitation model and postfilter significantly improve the quality of synthesized speech. Finally, for the purpose of synthesizing speech with various voice characteristics such as speaker individualities and emotions, a TTS system based on speaker interpolation is presented.

Links and resources

BibTeX key: Yoshimura2002
entry type: phdthesis
address: Nagoya, Japan
year: 2002
school: Nagoya Institute of Technology
owner: schabus
file: :pdfs/yoshimura_phd_2002.pdf:PDF

Cite this publication

%0 Thesis
%1 Yoshimura2002
%A Yoshimura, Takayoshi
%C Nagoya, Japan
%D 2002
%K imported
%T Simultaneous Modeling of Phonetic and Prosodic Parameters, and Characteristic Conversion for HMM-Based Text-to-Speech Systems
%X A text-to-speech (TTS) system is one of the human-machine interfaces that use speech. In recent years, TTS systems have been developed as output devices for human-machine interfaces, and they are used in many applications such as car navigation systems, information retrieval over the telephone, voice mail, and speech-to-speech translation systems. However, most text-to-speech systems still cannot synthesize speech with various voice characteristics such as speaker individualities and emotions. To obtain various voice characteristics in text-to-speech systems based on the selection and concatenation of acoustical units, a large amount of speech data is necessary, but such data is difficult to collect, segment, and store. For these reasons, an HMM-based text-to-speech system has been proposed as a way to construct a speech synthesis system that can generate various voice characteristics. This dissertation presents the construction of the HMM-based text-to-speech system, in which spectrum, fundamental frequency, and duration are modeled simultaneously in a unified HMM framework. The system relies mainly on three techniques: (1) a mel-cepstral analysis/synthesis technique, (2) speech parameter modeling using HMMs, and (3) a speech parameter generation algorithm from HMMs. These three techniques give the system several capabilities. First, because the TTS system uses the speech parameter generation algorithm, the spectral and pitch parameters generated from the trained HMMs can be similar to those of real speech. Second, because the system generates speech from the HMMs, the voice characteristics of synthetic speech can be changed by transforming the HMM parameters appropriately. Third, the system is trainable. In this thesis, the above three techniques are presented first, and simultaneous modeling of phonetic and prosodic parameters in an HMM framework is proposed. Next, to improve the quality of synthesized speech, the mixed excitation model of the MELP speech coder and a postfilter are incorporated into the system. Experimental results show that the mixed excitation model and postfilter significantly improve the quality of synthesized speech. Finally, for the purpose of synthesizing speech with various voice characteristics such as speaker individualities and emotions, a TTS system based on speaker interpolation is presented.

@phdthesis{Yoshimura2002,
  abstract = {A text-to-speech (TTS) system is one of the human-machine interfaces that use speech. In recent years, TTS systems have been developed as output devices for human-machine interfaces, and they are used in many applications such as car navigation systems, information retrieval over the telephone, voice mail, and speech-to-speech translation systems. However, most text-to-speech systems still cannot synthesize speech with various voice characteristics such as speaker individualities and emotions. To obtain various voice characteristics in text-to-speech systems based on the selection and concatenation of acoustical units, a large amount of speech data is necessary, but such data is difficult to collect, segment, and store. For these reasons, an HMM-based text-to-speech system has been proposed as a way to construct a speech synthesis system that can generate various voice characteristics. This dissertation presents the construction of the HMM-based text-to-speech system, in which spectrum, fundamental frequency, and duration are modeled simultaneously in a unified HMM framework. The system relies mainly on three techniques: (1) a mel-cepstral analysis/synthesis technique, (2) speech parameter modeling using HMMs, and (3) a speech parameter generation algorithm from HMMs. These three techniques give the system several capabilities. First, because the TTS system uses the speech parameter generation algorithm, the spectral and pitch parameters generated from the trained HMMs can be similar to those of real speech. Second, because the system generates speech from the HMMs, the voice characteristics of synthetic speech can be changed by transforming the HMM parameters appropriately. Third, the system is trainable. In this thesis, the above three techniques are presented first, and simultaneous modeling of phonetic and prosodic parameters in an HMM framework is proposed. Next, to improve the quality of synthesized speech, the mixed excitation model of the MELP speech coder and a postfilter are incorporated into the system. Experimental results show that the mixed excitation model and postfilter significantly improve the quality of synthesized speech. Finally, for the purpose of synthesizing speech with various voice characteristics such as speaker individualities and emotions, a TTS system based on speaker interpolation is presented.},
  added-at = {2021-02-01T10:51:23.000+0100},
  address = {Nagoya, Japan},
  author = {Yoshimura, Takayoshi},
  biburl = {https://www.bibsonomy.org/bibtex/2910b67db16c05dc58f55bccb47a6cfd0/m-toman},
  file = {:pdfs/yoshimura_phd_2002.pdf:PDF},
  interhash = {b5064ba241669ba29f24acf6964dc679},
  intrahash = {910b67db16c05dc58f55bccb47a6cfd0},
  keywords = {imported},
  owner = {schabus},
  school = {Nagoya Institute of Technology},
  timestamp = {2021-02-01T10:51:23.000+0100},
  title = {Simultaneous Modeling of Phonetic and Prosodic Parameters, and Characteristic Conversion for HMM-Based Text-to-Speech Systems},
  year = 2002
}
