Abstract
We propose a novel transcription workflow that combines spoken term detection with a human in the loop, and we report a pilot experiment. The work is grounded in a near zero-resource scenario involving two endangered languages, in which only a few terms have been identified so far. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, the transcriptions of a small number of isolated words can be used to bootstrap the transcription of a speech collection.
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th International Conference on Computational Linguistics |
Editors | Donia Scott, Nuria Bel, Chengqing Zong |
Place of Publication | Czech Republic |
Pages | 3422-3428 |
Number of pages | 7 |
Volume | 1 |
ISBN (Electronic) | 978-1-952148-27-9 |
Publication status | Published - 2020 |
Event | The 28th International Conference on Computational Linguistics: COLING 2020 - Barcelona, Spain. Duration: 8 Dec 2020 → 13 Dec 2020. https://www.aclweb.org/anthology/2020.coling-main.0.pdf |
Conference
Conference | The 28th International Conference on Computational Linguistics |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 8/12/20 → 13/12/20 |