Abstract
Most low resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.
Original language | English |
---|---|
Title of host publication | ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) |
Editors | Smaranda Muresan, Preslav Nakov, Aline Villavicencio |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4988-4998 |
Number of pages | 11 |
Volume | 1 |
ISBN (Electronic) | 9781955917216 |
DOIs | |
Publication status | Published - 2022 |
Event | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland Duration: 22 May 2022 → 27 May 2022 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
Volume | 1 |
ISSN (Print) | 0736-587X |
Conference
Conference | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 22/05/22 → 27/05/22 |
Bibliographical note
Publisher Copyright:© 2022 Association for Computational Linguistics.