Abstract
We propose a novel transcription workflow that combines spoken term detection with a human in the loop, and we report on a pilot experiment. This work is grounded in an almost zero-resource scenario involving two endangered languages, in which only a few terms have so far been identified. We show that in the early stages of transcription, when the available data is insufficient to train a robust automatic speech recognition (ASR) system, it is possible to use the transcriptions of a small number of isolated words to bootstrap the transcription of a speech collection.
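The abstract does not state which spoken term detection technique is used. The sketch below only illustrates the general idea under stated assumptions: it uses query-by-example detection with subsequence dynamic time warping (DTW) over acoustic feature frames (a common baseline, not necessarily the authors' method). An example recording of one transcribed isolated word is searched for in untranscribed utterances, and candidates below a cost threshold are passed to a human for confirmation. The names `subsequence_dtw`, `detect_candidates`, `bootstrap`, the `threshold` value and the `confirm` callback (standing in for a human confirmation step) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector (assumed, e.g. MFCCs)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int, int]:
    """Best-matching region of `utterance` for `query` (hypothetical helper).

    The match may start and end at any utterance frame. Returns
    (length-normalised cost, start frame, end frame inclusive).
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # accumulated alignment cost
    start = [[0] * m for _ in range(n)]    # utterance frame where the path began
    for j in range(m):                      # free start anywhere in the utterance
        cost[0][j] = frame_distance(query[0], utterance[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            d = frame_distance(query[i], utterance[j])
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], start[i][j - 1]))
            best, s = min(candidates)
            cost[i][j] = best + d
            start[i][j] = s
    # Free end: choose the cheapest utterance frame for the last query frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    s = start[n - 1][end]
    return cost[n - 1][end] / (n + end - s + 1), s, end


def detect_candidates(query: List[Frame], utterances: Dict[str, List[Frame]],
                      threshold: float) -> List[Tuple[float, str, int, int]]:
    """Return (score, utterance id, start, end) for matches under the threshold."""
    hits = []
    for utt_id, frames in utterances.items():
        score, s, e = subsequence_dtw(query, frames)
        if score <= threshold:
            hits.append((score, utt_id, s, e))
    return sorted(hits)


def bootstrap(term: str, query: List[Frame], utterances: Dict[str, List[Frame]],
              threshold: float,
              confirm: Callable[[str, str, int, int], bool]) -> List[Tuple[str, int, int]]:
    """Keep only the detections a human confirms (here a hypothetical callback)."""
    return [(utt_id, s, e)
            for _, utt_id, s, e in detect_candidates(query, utterances, threshold)
            if confirm(term, utt_id, s, e)]


# Toy usage with one-dimensional "features".
query = [[0.0], [1.0], [2.0]]
utterances = {"utt1": [[5.0], [0.0], [1.0], [2.0], [5.0]],
              "utt2": [[5.0], [5.0], [5.0]]}
confirmed = bootstrap("term", query, utterances, 0.5, lambda *args: True)
```

With these toy inputs, only `utt1` contains the query pattern (frames 1 to 3), so it is the only candidate offered for confirmation. In a real workflow, each confirmed match would add a transcribed token to the collection.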
Original language | English |
---|---|
Title of host publication | COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference |
Editors | Donia Scott, Nuria Bel, Chengqing Zong |
Place of Publication | Czech Republic |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3422-3428 |
Number of pages | 7 |
Volume | 1 |
ISBN (Electronic) | 9781952148279 |
Publication status | Published - 2020 |
Event | 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain; Duration: 8 Dec 2020 → 13 Dec 2020 |
Publication series
Name | COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference |
---|---|
Conference
Conference | 28th International Conference on Computational Linguistics, COLING 2020 |
---|---|
Country/Territory | Spain |
City | Virtual, Online |
Period | 8/12/20 → 13/12/20 |
Bibliographical note
Funding Information: We are grateful to the Bininj people of Northern Australia for the opportunity to work in their community, particularly the artists at Injalak Arts and Craft (Gunbalanya) and the Warddeken Rangers (Kabulwarnamyo). Our thanks also go to several anonymous reviewers for helpful feedback on earlier versions of this paper. The lexical confirmation app presented in this paper was designed by Mat Bettinson at Charles Darwin University (CDU). This research was covered by a research permit from the Northern Land Council and ethics approval from CDU, and was supported by the Australian Government through a PhD scholarship and by grants from the Australian Research Council and the Indigenous Language and Arts Program.
Publisher Copyright:
© 2020 COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference. All rights reserved.