Large-scale text collection for unwritten languages

Florian Hanke, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper published in Proceedings (peer-reviewed)

Abstract

Existing methods for collecting texts from endangered languages do not produce the quantity of data needed for corpus studies and natural language processing tasks, because transcribing and translating audio recordings is too onerous. A more effective method, we argue, is to involve local speakers at the field location, using an audio-only translation interface that is portable and easy to use. We present encouraging early results from an experimental investigation of how efficiently translations can be created with this method, and report on the quality of the resulting content.

Original language: English
Title of host publication: Proceedings of the 6th International Joint Conference on Natural Language Processing
Publisher: Asian Federation of Natural Language Processing
Pages: 1134-1138
Number of pages: 5
Publication status: Published - 2013
Externally published: Yes
Event: International Joint Conference on Natural Language Processing - Nagoya, Japan
Duration: 14 Oct 2013 to 18 Oct 2013
Conference number: 6th

Conference

Conference: International Joint Conference on Natural Language Processing
Country/Territory: Japan
City: Nagoya
Period: 14/10/13 to 18/10/13
