ETROVUB

Lukas Latacz, Wesley Mattheyses, Werner Verhelst

Chapter in Book/ Report/ Conference proceeding

Abstract

A pronunciation lexicon for speech synthesis is a key component of a modern speech synthesizer, containing the orthography and phonemic transcriptions of a large number of words. A lexicon may contain words with multiple pronunciations, such as reduced and full versions of (function) words, homographs, or other types of words with multiple acceptable pronunciations such as foreign words or names. Pronunciation variants should therefore be taken into account during voice-building (e.g. segmentation and labeling of a speech database), as well as during synthesis. In this paper we outline a strategy to automatically deal with these variants, resulting in a speaker-specific pronunciation. Based on a labeled speech database, the pronunciation lexicon is pruned in order to remove as much pronunciation variation as possible from the lexicon. This pruned lexicon can be used to train speaker-specific letter-to-sound rules. If the speaker has uttered a word in different ways, then these variants are not pruned. Instead, decision trees are trained for each of those words, which are used to select the most suitable pronunciation during synthesis. We tested our approach on five speech databases, and two lexicons per speech database. The automatic selection of pronunciation variants yielded a small improvement over the baseline (always selecting the most common variant).
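
The pruning step summarized in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based lexicon layout, the toy phoneme strings, and the choice to leave unobserved words untouched are all hypothetical. Words the speaker realized in more than one way are flagged so that a per-word classifier (decision trees in the paper) can choose among the variants at synthesis time; the baseline from the abstract simply picks the most frequent variant.

```python
from collections import Counter, defaultdict


def count_variants(observations):
    """Count how often each (word, pronunciation) pair occurs in the labeled data."""
    counts = defaultdict(Counter)
    for word, pron in observations:
        counts[word][pron] += 1
    return counts


def prune_lexicon(lexicon, counts):
    """Keep only the variants the speaker actually used (hypothetical helper).

    Returns the pruned lexicon and the set of words that still have
    more than one variant after pruning.
    """
    pruned = {}
    ambiguous = set()
    for word, variants in lexicon.items():
        word_counts = counts.get(word, Counter())
        seen = [v for v in variants if word_counts[v] > 0]
        if not seen:
            # Assumption: words absent from the database keep all variants.
            pruned[word] = list(variants)
            continue
        pruned[word] = seen
        if len(seen) > 1:
            ambiguous.add(word)
    return pruned, ambiguous


def baseline_variant(word, counts):
    """Baseline: always select the speaker's most common variant."""
    return counts[word].most_common(1)[0][0]


# Toy data; the transcriptions are made up for illustration.
lexicon = {
    "the": ["D @", "D i:"],
    "either": ["aI D @", "i: D @"],
    "cat": ["k { t"],
}
observations = [
    ("the", "D @"),
    ("the", "D @"),
    ("the", "D i:"),
    ("either", "i: D @"),
]

counts = count_variants(observations)
pruned, ambiguous = prune_lexicon(lexicon, counts)
```

In this toy run, "either" is reduced to the single variant the speaker produced, while "the" keeps both observed variants and is the only word that would need a trained selector.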

Reference

Latacz, L, Mattheyses, W & Verhelst, W 2013, Speaker-specific Pronunciation for Speech Synthesis. in I Habernal & V Matousek (eds), Text, Speech, and Dialogue. 16th International Conference TSD 2013. Proceedings: LNCS 8082. Springer Verlag, Berlin, Germany, pp. 501-508, 16th International Conference, TSD 2013, Pilsen, Czech Republic, 1/09/13.

Latacz, L., Mattheyses, W., & Verhelst, W. (2013). Speaker-specific Pronunciation for Speech Synthesis. In I. Habernal, & V. Matousek (Eds.), Text, Speech, and Dialogue. 16th International Conference TSD 2013. Proceedings: LNCS 8082 (pp. 501-508). Springer Verlag.

@inproceedings{e66655464906406e97344a65658f4d1b,
title = "Speaker-specific Pronunciation for Speech Synthesis",
abstract = "A pronunciation lexicon for speech synthesis is a key component of a modern speech synthesizer, containing the orthography and phonemic transcriptions of a large number of words. A lexicon may contain words with multiple pronunciations, such as reduced and full versions of (function) words, homographs, or other types of words with multiple acceptable pronunciations such as foreign words or names. Pronunciation variants should therefore be taken into account during voice-building (e.g. segmentation and labeling of a speech database), as well as during synthesis. In this paper we outline a strategy to automatically deal with these variants, resulting in a speaker-specific pronunciation. Based on a labeled speech database, the pronunciation lexicon is pruned in order to remove as much pronunciation variation as possible from the lexicon. This pruned lexicon can be used to train speaker-specific letter-to-sound rules. If the speaker has uttered a word in different ways, then these variants are not pruned. Instead, decision trees are trained for each of those words, which are used to select the most suitable pronunciation during synthesis. We tested our approach on five speech databases, and two lexicons per speech database. The automatic selection of pronunciation variants yielded a small improvement over the baseline (always selecting the most common variant).",
keywords = "speaker-specific pronunciation, speech synthesis, pronunciation lexicon, phonemic transcriptions, speaker-specific letter-to-sound rules",
author = "Lukas Latacz and Wesley Mattheyses and Werner Verhelst",
note = "Habernal, I.; Matousek, V.; 16th International Conference, TSD 2013 ; Conference date: 01-09-2013 Through 05-09-2013",
year = "2013",
month = sep,
day = "1",
language = "English",
isbn = "978-3-642-40584-6",
pages = "501--508",
editor = "I. Habernal and V. Matousek",
booktitle = "Text, Speech, and Dialogue. 16th International Conference TSD 2013. Proceedings: LNCS 8082",
publisher = "Springer Verlag",
address = "Germany",
}