Machine translation for language preservation

Steven Bird, David Chiang

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings; peer-review

Abstract

Statistical machine translation has been remarkably successful for the world’s well-resourced languages, and much effort is focussed on creating and exploiting rich resources such as treebanks and wordnets. Machine translation can also support the urgent task of documenting the world’s endangered languages. The primary object of statistical translation models, bilingual aligned text, closely coincides with interlinear text, the primary artefact collected in documentary linguistics. It ought to be possible to exploit this similarity in order to improve the quantity and quality of documentation for a language. Yet there are many technical and logistical problems to be addressed, starting with the problem that – for most of the languages in question – no texts or lexicons exist. In this position paper, we examine these challenges, and report on a data collection effort involving 15 endangered languages spoken in the highlands of Papua New Guinea.

Original language: English
Title of host publication: Proceedings of the 24th International Conference on Computational Linguistics
Pages: 125-134
Number of pages: 10
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Linguistics, Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012
Conference number: 24th

Conference

Conference: International Conference on Computational Linguistics
Abbreviated title: COLING 2012
Country/Territory: India
City: Mumbai
Period: 8/12/12 to 15/12/12
