Vocal Tract Length Normalization is a widely deployed speaker normalization technique that compensates for vocal tract length differences among speakers by appropriately warping the frequency axis of the speech signal. In this work, we study the use of this technique in the time synchronization paradigm. An efficient bilinear frequency warping procedure is proposed, in which the amount of warping is iteratively optimized according to a criterion that is directly related to the output of the standard Dynamic Time Warping algorithm. Subjective listening tests performed on mixed-gender time-aligned results, obtained with a subset of data from the English EUROM1 Many Talker Set, have shown that the proposed procedure significantly improves the overall speech quality and the time synchronization accuracy (85% and 91%, respectively).
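To make the idea in the abstract concrete, the following is a minimal Python sketch of bilinear frequency warping combined with a search over the warping factor. It is not the authors' procedure: the abstract does not give the exact optimization criterion or update rule, so the sketch assumes the accumulated Dynamic Time Warping distance as the criterion and a plain search over a list of candidate factors. The function names (`bilinear_warp`, `warp_spectrum`, `dtw_cost`, `best_alpha`) and the representation of frames as lists of magnitude-spectrum bins are hypothetical choices made for illustration.

```python
import math


def bilinear_warp(omega, alpha):
    """Map a normalized frequency omega in [0, pi] through a first-order
    all-pass (bilinear) warp with factor alpha, where |alpha| < 1."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum (bins spanning 0..pi) on the warped
    frequency axis, using linear interpolation between neighboring bins."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        # Fractional index of the source bin that lands on output bin k.
        pos = bilinear_warp(omega, alpha) * (n - 1) / math.pi
        pos = min(max(pos, 0.0), n - 1.0)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def dtw_cost(a, b):
    """Accumulated Euclidean DTW cost between two sequences of frames."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = math.dist(a[i - 1], b[j - 1])
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def best_alpha(source, reference, candidates):
    """Return the candidate warping factor whose warped source frames give
    the lowest DTW cost against the reference (assumed criterion)."""
    return min(
        candidates,
        key=lambda al: dtw_cost([warp_spectrum(f, al) for f in source],
                                reference),
    )
```

In this sketch a factor of zero leaves the spectrum unchanged, and the warp always maps 0 to 0 and pi to pi, so only the interior of the frequency axis is stretched or compressed. An iterative scheme could, for example, call `best_alpha` repeatedly with a progressively finer candidate list around the current best value.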
Soens, P & Verhelst, W 2012, An Iterative Bilinear Frequency Warping Approach To Robust Speaker-Independent Time Synchronization. in Proceedings of the 20th European Signal Processing Conference (EUSIPCO). Signal Processing Conference (EUSIPCO) Proceedings, vol. 20, IEEE, pp. 355-359, 20th European Signal Processing Conference (EUSIPCO-2012), Bucharest, Romania, 27/08/12.
Soens, P., & Verhelst, W. (2012). An Iterative Bilinear Frequency Warping Approach To Robust Speaker-Independent Time Synchronization. In Proceedings of the 20th European Signal Processing Conference (EUSIPCO) (pp. 355-359). (Signal Processing Conference (EUSIPCO) Proceedings; Vol. 20). IEEE.
@inproceedings{d868aa4a698b43448f7c1dfa3e7f12a1,
title = "An Iterative Bilinear Frequency Warping Approach To Robust Speaker-Independent Time Synchronization",
abstract = "Vocal Tract Length Normalization is a widely deployed speaker normalization technique that compensates for vocal tract length differences among speakers by appropriately warping the frequency axis of the speech signal. In this work, we study the use of this technique in the time synchronization paradigm. An efficient bilinear frequency warping procedure is proposed, in which the amount of warping is iteratively optimized according to a criterion that is directly related to the output of the standard Dynamic Time Warping algorithm. Subjective listening tests performed on mixed-gender time-aligned results, obtained with a subset of data from the English EUROM1 Many Talker Set, have shown that the proposed procedure significantly improves the overall speech quality and the time synchronization accuracy (85% and 91%, respectively).",
keywords = "Time Synchronization, Vocal Tract Length Normalization, Dynamic Time Warping",
author = "Pieter Soens and Werner Verhelst",
year = "2012",
language = "English",
isbn = "978-1-4673-1068-0",
series = "Signal Processing Conference (EUSIPCO) Proceedings",
publisher = "IEEE",
pages = "355--359",
booktitle = "Proceedings of the 20th European Signal Processing Conference (EUSIPCO)",
note = "20th European Signal Processing Conference (EUSIPCO-2012) ; Conference date: 27-08-2012 Through 31-08-2012",
url = "http://www.eusipco2012.org/home.php",
}