TY - GEN
T1 - Learning crosslingual word embeddings without bilingual corpora
AU - Duong, Long
AU - Kanayama, Hiroshi
AU - Ma, Tengfei
AU - Bird, Steven
AU - Cohn, Trevor
PY - 2016/1/1
Y1 - 2016/1/1
N2 - Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous approaches had expensive resource requirements, had difficulty incorporating monolingual data, or were unable to handle polysemy. We address these drawbacks with a method that uses a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and achieves competitive results on the monolingual word similarity and cross-lingual document classification tasks.
AB - Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools. However, previous approaches had expensive resource requirements, had difficulty incorporating monolingual data, or were unable to handle polysemy. We address these drawbacks with a method that uses a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and achieves competitive results on the monolingual word similarity and cross-lingual document classification tasks.
UR - http://www.scopus.com/inward/record.url?scp=85072837282&partnerID=8YFLogxK
U2 - 10.18653/v1/D16-1136
DO - 10.18653/v1/D16-1136
M3 - Conference Paper published in Proceedings
VL - 1
T3 - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 1285
EP - 1295
BT - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Su, Jian
A2 - Duh, Kevin
A2 - Carreras, Xavier
PB - Association for Computational Linguistics (ACL)
CY - Pennsylvania
T2 - 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Y2 - 1 November 2016 through 5 November 2016
ER -