An Iterative Bilinear Frequency Warping Approach To Robust Speaker-Independent Time Synchronization
 
Pieter Soens, Werner Verhelst
 
Abstract 

Vocal Tract Length Normalization is a widely deployed speaker normalization technique that compensates for vocal tract length differences among speakers by appropriately warping the frequency axis of the speech signal. In this work, we study the use of this technique within the time synchronization paradigm. An efficient bilinear frequency warping procedure is proposed, in which the amount of warping is iteratively optimized in accordance with a criterion that is directly related to the output of the standard Dynamic Time Warping algorithm. Subjective listening tests performed on mixed-gender time-aligned results obtained with a subset of data from the English EUROM1 Many Talker Set have shown that the proposed procedure significantly improves the overall speech quality and the time synchronization accuracy with 85% and 91%, respectively.
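
To make the idea concrete, the following is a minimal sketch and not the procedure evaluated in this work. It assumes the bilinear warp is the phase response of a first-order all-pass filter with warping factor `alpha`, that the optimization criterion is the length-normalized accumulated DTW cost, and that `alpha` is refined by a shrinking grid search. The function names, the search range, and the grid sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Warp frequencies in [0, pi] with the phase of a first-order all-pass (|alpha| < 1)."""
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def warp_spectra(spectra, alpha):
    """Resample each magnitude spectrum (one per row) at the warped frequencies."""
    omega = np.linspace(0.0, np.pi, spectra.shape[1])
    warped = bilinear_warp(omega, alpha)
    return np.array([np.interp(warped, omega, frame) for frame in spectra])

def dtw_cost(x, y):
    """Standard DTW with Euclidean local distance; returns the normalized total cost."""
    n, m = len(x), len(y)
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j],
                                                  acc[i, j - 1],
                                                  acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

def estimate_alpha(source, target, lo=-0.2, hi=0.2, n_grid=9, n_iter=3):
    """Assumed search strategy: evaluate a grid, then shrink it around the best alpha."""
    best = 0.0
    for _ in range(n_iter):
        grid = np.linspace(lo, hi, n_grid)
        costs = [dtw_cost(warp_spectra(source, a), target) for a in grid]
        best = float(grid[int(np.argmin(costs))])
        step = (hi - lo) / (n_grid - 1)
        lo, hi = best - step, best + step
    return best
```

In this sketch, `source` and `target` are arrays of magnitude spectra with one frame per row and the same number of frequency bins. The warp with factor `-alpha` inverts the warp with factor `alpha`, so a source that was warped by a known amount should yield an estimate close to the negative of that amount.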