Abstract
Bilingual corpora offer a promising bridge between resource-rich and resource-poor languages, enabling the development of natural language processing systems for the latter. English is often selected as the resource-rich language, but another choice might give better performance. In this paper, we consider the task of unsupervised cross-lingual POS tagging, and construct a model that predicts the best source language for a given target language. In experiments on 9 languages, this model improves on using a single fixed source language. We then show that further improvements can be made by combining information from multiple source languages.
Original language | English |
---|---|
Title of host publication | Proceedings of the Sixth International Joint Conference on Natural Language Processing |
Pages | 1243-1249 |
Number of pages | 7 |
Publication status | Published - 2013 |
Externally published | Yes |
Event | International Joint Conference on Natural Language Processing, Nagoya, Japan. Duration: 14 Oct 2013 → 18 Oct 2013. Conference number: 6th |
Conference
Conference | International Joint Conference on Natural Language Processing |
---|---|
Country/Territory | Japan |
City | Nagoya |
Period | 14/10/13 → 18/10/13 |