In this article, we present LipSynch, a software tool that can be used for the automatic replacement of speech dialogues in motion pictures, videos, or television series. The system operates in two steps: during analysis, the timing relationships between the speech segments of the dialogues that serve as a timing reference and the corresponding speech segments in the replacement dialogues are measured by means of a split Dynamic Time Warping algorithm. The obtained warping paths are then processed and used to synthesize high-quality natural-sounding speech dialogues that are precisely time-synchronized with the reference dialogues. Subjective audio-visual listening tests performed within the context of a difficult Automatic Dialogue Replacement task demonstrated that LipSynch achieves a significant improvement compared to the industry-standard benchmark VocALign, in terms of both achieved lip-synchronization accuracy and overall speech quality of the synthesized dialogues.
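The analysis step described above relies on a warping path between a reference signal and a replacement signal. As a rough illustration only, the following sketch computes a classic (non-split) Dynamic Time Warping path between two short one-dimensional feature sequences. It is not the split DTW algorithm of the article and not LipSynch itself; the function name `dtw_path`, the absolute-difference local cost, and the toy input values are all assumptions made for this example.

```python
import math


def dtw_path(ref, rep):
    """Classic DTW between two 1-D sequences (illustrative, hypothetical helper).

    Returns (total_cost, path), where path is a list of (ref_index, rep_index)
    pairs running from the first frames to the last frames of both sequences.
    """
    n, m = len(ref), len(rep)
    # acc[i][j] = minimal accumulated cost of aligning ref[:i] with rep[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - rep[j - 1])  # assumed local distance
            acc[i][j] = local + min(
                acc[i - 1][j - 1],  # advance in both sequences
                acc[i - 1][j],      # advance in the reference only
                acc[i][j - 1],      # advance in the replacement only
            )

    # Backtrack from the end cell to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


# Toy example: the replacement holds the value 1 for one extra frame.
cost, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(cost, path)
```

Each pair in the returned path says which replacement frame lines up with which reference frame. A time-scaling stage could, in principle, use such a mapping to stretch or compress the replacement speech; how LipSynch processes its warping paths is described in the article, not here.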
Soens, P & Verhelst, W 2012, 'On split Dynamic Time Warping for robust Automatic Dialogue Replacement', Signal Processing, vol. 92, no. 2, pp. 439-454. <http://www.etro.vub.ac.be/Research/DSSP/DEMO/ADR/>
Soens, P., & Verhelst, W. (2012). On split Dynamic Time Warping for robust Automatic Dialogue Replacement. Signal Processing, 92(2), 439-454. http://www.etro.vub.ac.be/Research/DSSP/DEMO/ADR/
@article{b5b35a09d83a4fe592bfd30affe4d7f1,
title = "On split Dynamic Time Warping for robust Automatic Dialogue Replacement",
abstract = "In this article, we present LipSynch, a software tool that can be used for the automatic replacement of speech dialogues in motion pictures, videos, or television series. The system operates in two steps: during analysis, the timing relationships between the speech segments of the dialogues that serve as a timing reference and the corresponding speech segments in the replacement dialogues are measured by means of a split Dynamic Time Warping algorithm. The obtained warping paths are then processed and used to synthesize high-quality natural-sounding speech dialogues that are precisely time-synchronized with the reference dialogues. Subjective audio-visual listening tests performed within the context of a difficult Automatic Dialogue Replacement task demonstrated that LipSynch achieves a significant improvement compared to the industry-standard benchmark VocALign, in terms of both achieved lip-synchronization accuracy and overall speech quality of the synthesized dialogues.",
keywords = "Dynamic Time Warping, Automatic Dialogue Replacement, Time synchronization",
author = "Pieter Soens and Werner Verhelst",
year = "2012",
month = feb,
day = "1",
language = "English",
volume = "92",
pages = "439--454",
journal = "Signal Processing",
issn = "0165-1684",
publisher = "Elsevier",
number = "2",
}