In this paper we describe the voices we submitted to the 2010 Blizzard Challenge, a yearly challenge to evaluate auditory speech synthesis on common data. One of the goals of a data-driven synthesizer, such as ours, is to generalize the speech database in such a way that it allows a realistic rendition of unseen input text. The two main changes to our system, compared to previous submissions, are the inclusion of an HMM-based acoustic prosody model, and the automatic training of context-dependent target cost weights. These weights are estimated for each individual target during synthesis, and depend on the linguistic features of these targets which encompass their broader linguistic context. Another new aspect of our synthesizer is the ability to synthesize Mandarin Chinese speech. Its evaluation helps us assess the quality of our synthesizer for languages unfamiliar to the voice developers. Evaluation results and possible improvements to our synthesizer are also discussed.
Latacz, L, Mattheyses, W & Verhelst, W 2010, The VUB Blizzard Challenge 2010 Entry: Towards Automatic Voice Building. in Blizzard Challenge 2010, Kansai Science City, Japan. <http://www.etro.vub.ac.be/Research/DSSP/PUB_FILES/int_conf/Blizzard2010_Latacz.pdf>
Latacz, L., Mattheyses, W., & Verhelst, W. (2010). The VUB Blizzard Challenge 2010 Entry: Towards Automatic Voice Building. In Blizzard Challenge 2010, Kansai Science City, Japan. http://www.etro.vub.ac.be/Research/DSSP/PUB_FILES/int_conf/Blizzard2010_Latacz.pdf
@inproceedings{607ff2bc921d4317bbfb634460ba0770,
title = "The VUB Blizzard Challenge 2010 Entry: Towards Automatic Voice Building",
abstract = "In this paper we describe the voices we submitted to the 2010 Blizzard Challenge, a yearly challenge to evaluate auditory speech synthesis on common data. One of the goals of a data-driven synthesizer, such as ours, is to generalize the speech database in such a way that it allows a realistic rendition of unseen input text. The two main changes to our system, compared to previous submissions, are the inclusion of an HMM-based acoustic prosody model, and the automatic training of context-dependent target cost weights. These weights are estimated for each individual target during synthesis, and depend on the linguistic features of these targets which encompass their broader linguistic context. Another new aspect of our synthesizer is the ability to synthesize Mandarin Chinese speech. Its evaluation helps us assess the quality of our synthesizer for languages unfamiliar to the voice developers. Evaluation results and possible improvements to our synthesizer are also discussed.",
keywords = "speech synthesis, unit selection, weight training, evaluation",
author = "Lukas Latacz and Wesley Mattheyses and Werner Verhelst",
year = "2010",
month = sep,
day = "25",
language = "English",
booktitle = "Blizzard Challenge 2010, Kansai Science City, Japan",
}