TY - GEN
T1 - Learning From Failure
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
AU - Le Ferrand, Eric
AU - Bird, Steven
AU - Besacier, Laurent
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Most low resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.
AB - Most low resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.
UR - http://www.scopus.com/inward/record.url?scp=85149140637&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.acl-long.342
DO - 10.18653/v1/2022.acl-long.342
M3 - Conference Paper published in Proceedings
AN - SCOPUS:85149140637
VL - 1
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 4988
EP - 4998
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
Y2 - 22 May 2022 through 27 May 2022
ER -