
Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai. IEEE Transactions on Audio, Speech, and Language Processing, 17 (1): 66-83 (January 2009)
DOI: 10.1109/TASL.2008.2006647

Abstract

In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.
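As a rough illustration of the contrast the abstract draws between transforming only the mean vectors and transforming both mean vectors and covariance matrices, the standard textbook forms of the two linear-regression transform families can be written as below. This is a sketch, not taken from this record: the symbols $A$, $b$, $\mu$, $\Sigma$, and $o$ are assumed notation, and the paper's own parametrization and sign conventions may differ.

```latex
% Mean-only transform (MLLR-style): the covariance is left unchanged.
\hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = \Sigma

% Constrained transform (CMLLR-style): one matrix A is shared by the
% mean vector and the covariance matrix.
\hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = A \Sigma A^{\top}

% The constrained form is equivalent to an affine transform of the
% observation vector o, which follows from the change of variables
% o = A x + b with x \sim \mathcal{N}(\mu, \Sigma):
\mathcal{N}\!\left(o;\, A\mu + b,\, A \Sigma A^{\top}\right)
  = |\det A|^{-1}\,
    \mathcal{N}\!\left(A^{-1}(o - b);\, \mu,\, \Sigma\right)
```

Under these assumptions, the constrained form adapts the covariances without estimating a separate covariance transform, at the cost of tying them to the same matrix as the means.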

Links and resources

BibTeX key: Yamagishi2009
entry type: article
year: 2009
month: jan
journal: IEEE Transactions on Audio, Speech, and Language Processing
number: 1
pages: 66-83
volume: 17
owner: mtoman
file: :pdfs/yamagishi_ieeetransaudio_2009.pdf:PDF
issn: 1558-7916
DOI: 10.1109/TASL.2008.2006647

Cite this publication

%0 Journal Article
%1 Yamagishi2009
%A Yamagishi, Junichi
%A Kobayashi, Takao
%A Nakano, Yuji
%A Ogata, Katsumi
%A Isogai, Juri
%D 2009
%J IEEE Transactions on Audio, Speech, and Language Processing
%K (HMM)-based Adaptation Markov a adaptation adaptation,speaker algorithms,regression algorithms,speaker-dependent analysis,Average analysis,Speech analysis,speaker and construction,regression conversion criteria,gender-independent data,Vectors,constrained design estimation,Speech functions,voice likelihood linear,estimation matrix,Hidden maximum model model,Algorithm models,Linear models,hidden models,transform posteriori regression,Maximum speech structural synthesis,Training synthesis,model voice,Covariance
%N 1
%P 66-83
%R 10.1109/TASL.2008.2006647
%T Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm
%V 17
%X In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.

@article{Yamagishi2009,
  abstract = {In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.},
  added-at = {2021-02-01T10:51:23.000+0100},
  author = {Yamagishi, Junichi and Kobayashi, Takao and Nakano, Yuji and Ogata, Katsumi and Isogai, Juri},
  biburl = {https://www.bibsonomy.org/bibtex/28feb88cdab99ddb96b5f1411cbceac29/m-toman},
  doi = {10.1109/TASL.2008.2006647},
  file = {:pdfs/yamagishi_ieeetransaudio_2009.pdf:PDF},
  interhash = {ad630e4a9100009ae5be8bf9d0491f51},
  intrahash = {8feb88cdab99ddb96b5f1411cbceac29},
  issn = {1558-7916},
  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
  keywords = {(HMM)-based Adaptation Markov a adaptation adaptation,speaker algorithms,regression algorithms,speaker-dependent analysis,Average analysis,Speech analysis,speaker and construction,regression conversion criteria,gender-independent data,Vectors,constrained design estimation,Speech functions,voice likelihood linear,estimation matrix,Hidden maximum model model,Algorithm models,Linear models,hidden models,transform posteriori regression,Maximum speech structural synthesis,Training synthesis,model voice,Covariance},
  month = jan,
  number = 1,
  owner = {mtoman},
  pages = {66-83},
  timestamp = {2021-02-01T10:51:23.000+0100},
  title = {Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm},
  volume = 17,
  year = 2009
}


