We propose a novel transcription workflow that combines spoken term detection with a human in the loop, and we report a pilot experiment. The work is set in a nearly zero-resource scenario, involving two endangered languages, in which only a few terms have so far been identified. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, the transcriptions of a small number of isolated words can be used to bootstrap the transcription of a speech collection.
|Title of host publication||Proceedings of the 28th International Conference on Computational Linguistics|
|Number of pages||7|
|Publication status||Published - 2020|
|Event||The 28th International Conference on Computational Linguistics: COLING 2020 - Barcelona, Spain (8 Dec 2020 → 13 Dec 2020)|
|Conference||The 28th International Conference on Computational Linguistics|
|Period||8/12/20 → 13/12/20|