Inducing bilingual lexicons from small quantities of sentence-aligned phonemic transcriptions

Oliver Adams, Graham Neubig, Trevor Cohn, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper published in Proceedings

Abstract

We investigate induction of a bilingual lexicon from a corpus of phonemic transcriptions that have been sentence-aligned with English translations. We evaluate existing models that have been used for this purpose and report on two additional models, which demonstrate performance improvements. The first performs monolingual segmentation followed by alignment, while the second performs both tasks jointly. We show that monolingual and bilingual lexical entries can be learnt with high precision from corpora having just 1k to 10k sentences. We explain how our results support the application of alignment algorithms to the task of documenting endangered languages.
Original language: English
Title of host publication: Proceedings of the International Workshop on Spoken Language Translation
Number of pages: 8
Publication status: Published - 2015
Externally published: Yes

  • Cite this

    Adams, O., Neubig, G., Cohn, T., & Bird, S. (2015). Inducing bilingual lexicons from small quantities of sentence-aligned phonemic transcriptions. In Proceedings of the International Workshop on Spoken Language Translation.