
Unit selection in a concatenative speech synthesis system using a large speech database

A. Hunt and A. Black. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 1, pages 373-376. Atlanta, GA, USA (May 1996)
DOI: 10.1109/ICASSP.1996.541110

Abstract

One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning.
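
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_units`, the `beam` parameter, and the two cost callables are hypothetical, and the pruning shown (keeping the `beam` cheapest hypotheses per position) is only one assumed way to realise a "pruned" Viterbi search. The caller supplies the state occupancy cost as `target_cost` and the transition cost as `join_cost`.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification (hypothetical type)
U = TypeVar("U")  # database unit (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
    beam: int = 10,
) -> Tuple[List[U], float]:
    """Return the lowest-cost unit sequence and its total cost.

    candidates[i] holds the database units that may realise targets[i].
    Assumption: pruning keeps the `beam` cheapest hypotheses per position.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    if any(len(c) == 0 for c in candidates) or beam < 1:
        raise ValueError("empty candidate list or invalid beam width")

    # Each hypothesis is (accumulated cost, unit, index of predecessor).
    first = [(target_cost(targets[0], u), u, -1) for u in candidates[0]]
    first.sort(key=lambda h: h[0])
    layers = [first[:beam]]

    for t in range(1, len(targets)):
        prev = layers[-1]
        current = []
        for unit in candidates[t]:
            # Cheapest way to reach this unit from the surviving predecessors.
            best_cost, best_idx = min(
                (p_cost + join_cost(p_unit, unit), i)
                for i, (p_cost, p_unit, _) in enumerate(prev)
            )
            current.append((best_cost + target_cost(targets[t], unit), unit, best_idx))
        current.sort(key=lambda h: h[0])
        layers.append(current[:beam])  # prune to the beam width

    # Trace back from the cheapest final hypothesis (index 0 after sorting).
    total = layers[-1][0][0]
    idx = 0
    path: List[U] = []
    for layer in reversed(layers):
        _, unit, back = layer[idx]
        path.append(unit)
        idx = back
    path.reverse()
    return path, total
```

With `beam` at least as large as the longest candidate list, nothing is pruned and the result is the exact minimum of the summed target and join costs; a smaller `beam` trades optimality for speed. The abstract's trained weights would enter through the two cost functions, which this sketch leaves entirely to the caller.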

Links and resources

BibTeX key: Hunt1996
entry type: inproceedings
address: Atlanta, GA, USA
booktitle: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
year: 1996
month: may
pages: 373-376
volume: 1
owner: schabus
file: :pdfs/hunt_icassp_1996.pdf:PDF
issn: 1520-6149
DOI: 10.1109/ICASSP.1996.541110

Cite this publication

@inproceedings{Hunt1996,
  abstract = {One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning},
  added-at = {2021-02-01T10:51:23.000+0100},
  address = {Atlanta, GA, USA},
  author = {Hunt, Andrew J. and Black, Alan W.},
  biburl = {https://www.bibsonomy.org/bibtex/215558da362f59d7fdd21eb4a7768b0b4/m-toman},
  booktitle = {Proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  doi = {10.1109/ICASSP.1996.541110},
  file = {:pdfs/hunt_icassp_1996.pdf:PDF},
  interhash = {b1658aa9854d8ac5f504120525b280a4},
  intrahash = {15558da362f59d7fdd21eb4a7768b0b4},
  issn = {1520-6149},
  keywords = {Viterbi algorithm context cost;state cost;waveform database;natural database;training;transition decoding;concatenative decoding;search estimation;Viterbi information;prosodic information;pruned languages;Network network;synthesis occupancy problems;speech recognition;Speech search;state sequence;phonetic speech speech;natural-sounding speech;phoneme synthesis synthesis;Control synthesis;Costs;Databases;Laboratories;Natural synthesis;Speech synthesis;State synthesis;Viterbi synthesized system system;database transition unit;large},
  month = may,
  owner = {schabus},
  pages = {373-376},
  timestamp = {2021-02-01T10:51:23.000+0100},
  title = {Unit selection in a concatenative speech synthesis system using a large speech database},
  volume = 1,
  year = 1996
}
