Collecting bilingual audio in remote indigenous communities

Steven Bird, Lauren Gawne, Katie Gelbart, Isaac Mcalister

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings; Research; peer-review

Abstract

Most of the world's languages are under-resourced, and most under-resourced languages lack a writing system and a literary tradition. As these languages fall out of use, we lose important sources of data that contribute to our understanding of human language. The first, urgent step is to collect and orally translate a large quantity of spoken language. This can be digitally archived and later transcribed, annotated, and subjected to the full range of speech and language processing tasks at any time in the future. We have been investigating a mobile application for recording and translating unwritten languages. We visited indigenous communities in Brazil and Nepal, taught people to use smartphones to record spoken language and to interpret it orally into the national language, and collected bilingual phrase-aligned speech recordings. In spite of several technical and social issues, we found that the technology enabled an effective workflow for speech data collection. Based on this experience, we argue that special-purpose software on smartphones is an effective and scalable method for large-scale collection of bilingual audio, and ultimately bilingual text, for languages spoken in remote indigenous communities.

Original language: English
Title of host publication: COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
Subtitle of host publication: Technical Papers
Place of Publication: Dublin, Ireland
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 1015-1024
Number of pages: 10
ISBN (Electronic): 9781941643266
Publication status: Published - 1 Jan 2014
Externally published: Yes
Event: 25th International Conference on Computational Linguistics, COLING 2014 - Dublin, Ireland
Duration: 23 Aug 2014 - 29 Aug 2014

Conference

Conference: 25th International Conference on Computational Linguistics, COLING 2014
Country: Ireland
City: Dublin
Period: 23/08/14 - 29/08/14

Cite this

Bird, S., Gawne, L., Gelbart, K., & Mcalister, I. (2014). Collecting bilingual audio in remote indigenous communities. In COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers (pp. 1015-1024). Dublin, Ireland: Association for Computational Linguistics, ACL Anthology.
@inproceedings{42bd17dcd1984946a23c7b262ed47a9f,
title = "Collecting bilingual audio in remote indigenous communities",
author = "Steven Bird and Lauren Gawne and Katie Gelbart and Isaac Mcalister",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "1015--1024",
booktitle = "COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014",
publisher = "Association for Computational Linguistics, ACL Anthology",

}

TY - GEN

T1 - Collecting bilingual audio in remote indigenous communities

AU - Bird, Steven

AU - Gawne, Lauren

AU - Gelbart, Katie

AU - Mcalister, Isaac

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84959896240&partnerID=8YFLogxK

M3 - Conference Paper published in Proceedings

SP - 1015

EP - 1024

BT - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014

PB - Association for Computational Linguistics, ACL Anthology

CY - Dublin, Ireland

ER -