In this paper, we present a robust system for the temporal alignment of two renditions of the same speech utterance. The system operates in two steps: during analysis, the timing relationships between the speech segments of the utterance that serves as a timing reference and the corresponding speech segments in the replacement utterance are measured by means of a dedicated dynamic time warping algorithm. The obtained warping paths are then processed and used to synthesize a high-quality speech utterance that is time-aligned with the reference. Subjective audio-visual listening tests performed within the context of a difficult Automatic Dialogue Replacement task demonstrated that the proposed system achieves a significant improvement over the industry-standard benchmark, both in achieved lip-synchronization accuracy and in the overall sound quality of the synthesized utterances.
Soens, P & Verhelst, W 2010, 'Robust temporal alignment of spontaneous and dubbed speech and its application for Automatic Dialogue Replacement', Proceedings of EUSIPCO, vol. 18, pp. 80-84. <http://www.etro.vub.ac.be/Research/DSSP/DEMO/ADR/>
Soens, P., & Verhelst, W. (2010). Robust temporal alignment of spontaneous and dubbed speech and its application for Automatic Dialogue Replacement. Proceedings of EUSIPCO, 18, 80-84. http://www.etro.vub.ac.be/Research/DSSP/DEMO/ADR/
@article{f901aad02b4347f987f3d046ba34ae9d,
title = "Robust temporal alignment of spontaneous and dubbed speech and its application for Automatic Dialogue Replacement",
abstract = "In this paper, we present a robust system for the temporal alignment of two renditions of the same speech utterance. The system operates in two steps: during analysis, the timing relationships between the speech segments of the utterance that serves as a timing reference and the corresponding speech segments in the replacement utterance are measured by means of a dedicated dynamic time warping algorithm. The obtained warping paths are then processed and used to synthesize a high-quality speech utterance that is time-aligned with the reference. Subjective audio-visual listening tests performed within the context of a difficult Automatic Dialogue Replacement task demonstrated that the proposed system achieves a significant improvement over the industry-standard benchmark, both in achieved lip-synchronization accuracy and in the overall sound quality of the synthesized utterances.",
keywords = "automatic temporal alignment, Automatic Dialogue Replacement, Dynamic Time Warping",
author = "Pieter Soens and Werner Verhelst",
year = "2010",
month = aug,
day = "23",
language = "English",
volume = "18",
pages = "80--84",
journal = "Proceedings of EUSIPCO",
issn = "2219-5491",
note = "18th European Signal Processing Conference, EUSIPCO 2010; conference dates: 23-08-2010 through 27-08-2010",
url = "http://www.eusipco2010.org/",
}