Abstract
In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resource-poor languages. Additionally, we use a small amount of annotated data to learn to "correct" errors of the projection approach, such as tagset mismatches between languages, achieving state-of-the-art accuracy (91.3%) across 8 languages. Our approach has modest data requirements and uses minimum divergence classification. For situations where no universal tagset mapping is available, we propose an alternative method, which achieves state-of-the-art accuracy of 85.6% on the resource-poor language Malagasy.
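The abstract names minimum divergence classification without giving details. A minimal sketch of one common formulation follows, assuming the projected tagger supplies a strictly positive prior distribution q(y|x) for each token. A log-linear correction with weights w is then learned from a small annotated sample, so that p(y|x) ∝ q(y|x) · exp(w · f(x, y)). With w = 0 the model returns q unchanged, and training moves away from q only where the annotated data disagree with it. The tagset, the feature templates, the smoothing helper `prior_from_projection`, the toy data and the hyperparameters are all hypothetical illustrations, not the paper's configuration.

```python
import math
from collections import defaultdict

# Hypothetical toy tagset (assumption, not the paper's tagset).
TAGS = ["NOUN", "VERB", "DET"]


def features(token, tag):
    """Hypothetical sparse indicator features f(x, y): word and 2-char suffix, conjoined with the tag."""
    return [f"w={token.lower()}|{tag}", f"suf={token[-2:].lower()}|{tag}"]


def predict_proba(weights, token, prior):
    """p(y|x) proportional to q(y|x) * exp(w . f(x, y)); prior values must be > 0."""
    scores = {
        tag: math.log(prior[tag]) + sum(weights.get(f, 0.0) for f in features(token, tag))
        for tag in TAGS
    }
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {tag: math.exp(s - top) for tag, s in scores.items()}
    z = sum(expd.values())
    return {tag: v / z for tag, v in expd.items()}


def train(data, epochs=50, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised conditional log-likelihood.

    data: list of (token, prior dict q(.|x), gold tag) triples.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for token, prior, gold in data:
            probs = predict_proba(weights, token, prior)
            for f in features(token, gold):  # observed feature counts
                grad[f] += 1.0
            for tag, p in probs.items():  # minus expected feature counts
                for f in features(token, tag):
                    grad[f] -= p
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] - l2 * weights[f])
    return weights


def prior_from_projection(projected_tag, mass=0.7):
    """Hypothetical helper: smooth a single projected tag into a distribution q(y|x)."""
    rest = (1.0 - mass) / (len(TAGS) - 1)
    return {tag: (mass if tag == projected_tag else rest) for tag in TAGS}


# Toy annotated sample; "run"/"runs" carry a projection error the model should correct.
train_data = [
    ("the", prior_from_projection("DET"), "DET"),
    ("dog", prior_from_projection("NOUN"), "NOUN"),
    ("run", prior_from_projection("NOUN"), "VERB"),
    ("runs", prior_from_projection("NOUN"), "VERB"),
]
learned = train(train_data)
corrected = predict_proba(learned, "run", prior_from_projection("NOUN"))
```

Keeping q as a fixed base distribution, rather than simply feeding the projected tag in as one more feature, is what makes the small annotated sample sufficient in this sketch: the learned weights only need to encode the corrections, not the whole tagger.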
Original language | English |
---|---|
Title of host publication | EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference |
Place of Publication | Doha, Qatar |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 886-897 |
Number of pages | 12 |
ISBN (Electronic) | 9781937284961 |
Publication status | Published - 1 Jan 2014 |
Externally published | Yes |
Event | 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25 Oct 2014 → 29 Oct 2014 |
Conference
Conference | 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 |
---|---|
Country/Territory | Qatar |
City | Doha |
Period | 25 Oct 2014 → 29 Oct 2014 |