Increasing the quality and quantity of source language data for unsupervised cross-lingual POS tagging

Long Duong, Paul Cook, Steven Bird, Pavel Pecina

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings

Abstract

Bilingual corpora offer a promising bridge between resource-rich and resource-poor languages, enabling the development of natural language processing systems for the latter. English is often chosen as the resource-rich source language, but another choice might give better performance. In this paper, we consider the task of unsupervised cross-lingual POS tagging and construct a model that predicts the best source language for a given target language. In experiments on nine languages, this model outperforms the use of a single fixed source language. We then show that further improvements can be made by combining information from multiple source languages.
Original language: English
Title of host publication: Proceedings of the Sixth International Joint Conference on Natural Language Processing
Pages: 1243–1249
Number of pages: 7
Publication status: Published, 2013
Externally published: Yes
Event: International Joint Conference on Natural Language Processing, Nagoya, Japan
Duration: 14 Oct 2013 – 18 Oct 2013
Conference number: 6th

Conference

Conference: International Joint Conference on Natural Language Processing
Country: Japan
City: Nagoya
Period: 14/10/13 – 18/10/13


Cite this:

Duong, L., Cook, P., Bird, S., & Pecina, P. (2013). Increasing the quality and quantity of source language data for unsupervised cross-lingual POS tagging. In Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 1243–1249).