Large-scale text collection for unwritten languages

Florian Hanke, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings

Abstract

Existing methods for collecting texts from endangered languages are not creating the quantity of data that is needed for corpus studies and natural language processing tasks. This is because the process of transcribing and translating from audio recordings is too onerous. A more effective method, we argue, is to involve local speakers in the field location, using an audio-only translation interface that is portable and easy to use. We present encouraging early results of an experimental investigation of the efficiency of creating translations using this method, and report on the quality of the resulting content.
Original language: English
Title of host publication: Proceedings of the 6th International Joint Conference on Natural Language Processing
Publisher: Asian Federation of Natural Language Processing
Pages: 1134-1138
Number of pages: 5
Publication status: Published - 2013
Externally published: Yes
Event: International Joint Conference on Natural Language Processing - Nagoya, Japan
Duration: 14 Oct 2013 to 18 Oct 2013
Conference number: 6th

Conference

Conference: International Joint Conference on Natural Language Processing
Country: Japan
City: Nagoya
Period: 14/10/13 to 18/10/13

Cite this

Hanke, F., & Bird, S. (2013). Large-scale text collection for unwritten languages. In Proceedings of the 6th International Joint Conference on Natural Language Processing (pp. 1134-1138). Asian Federation of Natural Language Processing.