XIMERA: A New TTS from ATR Based on Corpus-Based Technologies
H. Kawai, T. Toda, J. Ni, M. Tsuzaki, and K. Tokuda. Proceedings of the 5th ISCA Workshop on Speech Synthesis (SSW), pages 179–184. Pittsburgh, PA, USA (June 2004)
Abstract
This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as was the case for the preceding TTS systems from ATR, namely ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hours corpus of a Japanese male, a 60-hours corpus of a Japanese female, and a 20-hours corpus of a Chinese female), (2) HMM-based generation of prosodic parameters, and (3) a cost function for segment selection optimized based on perceptual experiments. A perception test that evaluated the naturalness of synthetic speech for XIMERA and 10 TTS products, including CHATR, showed that XIMERA outperformed the other ten.
%0 Conference Paper
%1 Kawai2004
%A Kawai, Hisashi
%A Toda, Tomoki
%A Ni, Jinfu
%A Tsuzaki, Minoru
%A Tokuda, Keiichi
%B Proceedings of the 5th ISCA Workshop on Speech Synthesis (SSW)
%C Pittsburgh, PA, USA
%D 2004
%K imported
%P 179-184
%T XIMERA: A New TTS from ATR Based on Corpus-Based Technologies
%U http://www.isca-speech.org/archive_open/ssw5/ssw5_179.html
%X This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as was the case for the preceding TTS systems from ATR, namely ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hours corpus of a Japanese male, a 60-hours corpus of a Japanese female, and a 20-hours corpus of a Chinese female), (2) HMM-based generation of prosodic parameters, and (3) a cost function for segment selection optimized based on perceptual experiments. A perception test that evaluated the naturalness of synthetic speech for XIMERA and 10 TTS products, including CHATR, showed that XIMERA outperformed the other ten.
@inproceedings{Kawai2004,
abstract = {This paper describes a new concatenative TTS system under development at ATR. The system, named XIMERA, is based on corpus-based technologies, as was the case for the preceding TTS systems from ATR, namely ν-talk and CHATR. The prominent features of XIMERA are (1) large corpora (a 110-hours corpus of a Japanese male, a 60-hours corpus of a Japanese female, and a 20-hours corpus of a Chinese female), (2) HMM-based generation of prosodic parameters, and (3) a cost function for segment selection optimized based on perceptual experiments. A perception test that evaluated the naturalness of synthetic speech for XIMERA and 10 TTS products, including CHATR, showed that XIMERA outperformed the other ten.},
added-at = {2021-02-01T10:51:23.000+0100},
address = {Pittsburgh, PA, USA},
author = {Kawai, Hisashi and Toda, Tomoki and Ni, Jinfu and Tsuzaki, Minoru and Tokuda, Keiichi},
biburl = {https://www.bibsonomy.org/bibtex/23454670452bd464531909c0c6809f6f3/m-toman},
booktitle = {Proceedings of the 5th ISCA Workshop on Speech Synthesis (SSW)},
file = {:pdfs/kawai_ssw_2004.pdf:PDF},
interhash = {f506028d47618de58dcad34d5c84c853},
intrahash = {3454670452bd464531909c0c6809f6f3},
keywords = {imported},
month = jun,
owner = {schabus},
pages = {179--184},
timestamp = {2021-02-01T10:51:23.000+0100},
title = {{XIMERA}: A New {TTS} from {ATR} Based on Corpus-Based Technologies},
url = {http://www.isca-speech.org/archive_open/ssw5/ssw5_179.html},
year = 2004
}