Speech-to-text, also known as Speech Recognition, is a technology that recognizes and transcribes spoken language into text. In subsequent steps, this transcription can be used to complete a multitude of tasks, such as providing automatic subtitles or parsing voice commands. In recent years, Speech-to-Text models have dramatically improved, thanks in part to advances in Deep Learning methods. Starting from the open-source project DeepSpeech, we train speech-to-text models for Dutch, using the Corpus Gesproken Nederlands (CGN). First, we contribute a pre-processing pipeline for this dataset, to make it suitable for the task at hand, obtaining a ready-to-use speech-to-text dataset for Dutch. Second, we investigate the performance of Dutch and Flemish models trained from scratch, establishing a baseline for the CGN dataset for this task. Finally, we investigate the issue of transferring speech-to-text models between related languages. In this case, we analyse how a pre-trained English model can be transferred and fine-tuned for Dutch.
Röpke, W, Radulescu, R, Efthymiadis, K & Nowe, A 2019, Training a Speech-to-Text Model for Dutch on the Corpus Gesproken Nederlands. in K Beuls, B Bogaerts, G Bontempi, P Geurts, N Harley, B Lebichot, T Lenaerts, G Louppe & P Van Eecke (eds), Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019). vol. 2491, CEUR Workshop Proceedings, 31st Benelux Conference on Artificial Intelligence, Brussels, Belgium, 6 November 2019. <http://ceur-ws.org/Vol-2491/paper60.pdf>
Röpke, W., Radulescu, R., Efthymiadis, K., & Nowe, A. (2019). Training a Speech-to-Text Model for Dutch on the Corpus Gesproken Nederlands. In K. Beuls, B. Bogaerts, G. Bontempi, P. Geurts, N. Harley, B. Lebichot, T. Lenaerts, G. Louppe, & P. Van Eecke (Eds.), Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) (Vol. 2491). CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2491/paper60.pdf
@inproceedings{8119760e56844343b9fa6a8cb8b06136,
title = "Training a Speech-to-Text Model for Dutch on the Corpus Gesproken Nederlands",
abstract = "Speech-to-text, also known as Speech Recognition, is a technology that recognizes and transcribes spoken language into text. In subsequent steps, this transcription can be used to complete a multitude of tasks, such as providing automatic subtitles or parsing voice commands. In recent years, Speech-to-Text models have dramatically improved, thanks in part to advances in Deep Learning methods. Starting from the open-source project DeepSpeech, we train speech-to-text models for Dutch, using the Corpus Gesproken Nederlands (CGN). First, we contribute a pre-processing pipeline for this dataset, to make it suitable for the task at hand, obtaining a ready-to-use speech-to-text dataset for Dutch. Second, we investigate the performance of Dutch and Flemish models trained from scratch, establishing a baseline for the CGN dataset for this task. Finally, we investigate the issue of transferring speech-to-text models between related languages. In this case, we analyse how a pre-trained English model can be transferred and fine-tuned for Dutch.",
author = "Willem R{\"o}pke and Roxana Radulescu and Kyriakos Efthymiadis and Ann Nowe",
year = "2019",
month = nov,
day = "6",
language = "English",
volume = "2491",
series = "CEUR Workshop Proceedings",
publisher = "CEUR Workshop Proceedings",
editor = "Katrien Beuls and Bart Bogaerts and Gianluca Bontempi and Pierre Geurts and Nick Harley and Bertrand Lebichot and Tom Lenaerts and Gilles Louppe and {Van Eecke}, Paul",
booktitle = "Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019)",
note = "31st Benelux Conference on Artificial Intelligence, BNAIC ; Conference date: 06-11-2019 Through 08-11-2019",
url = "https://ai-synergies.be",
}