Learning crosslingual word embeddings without bilingual corpora

Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings › peer-review

71 Citations (Scopus)
33 Downloads (Pure)

Abstract

Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling the transfer of NLP tools across languages. However, previous approaches either had expensive resource requirements, had difficulty incorporating monolingual data, or were unable to handle polysemy. We address these drawbacks with a method that takes advantage of a high-coverage dictionary in an EM-style training algorithm over monolingual corpora in two languages. Our model achieves state-of-the-art performance on the bilingual lexicon induction task, exceeding models that use large bilingual corpora, and competitive results on monolingual word similarity and cross-lingual document classification tasks.
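
The abstract only sketches the training procedure, so the toy Python sketch below illustrates one way a dictionary-driven, EM-style objective over monolingual corpora can be set up: each centre word is trained CBOW-style against its context, and a dictionary translation, selected using the current embeddings so the choice can depend on context, is trained against the same context. This is a simplified illustration under assumptions, not the authors' implementation; the corpora, dictionary entries, full-softmax CBOW update, and selection rule are all placeholders.

```python
import numpy as np

# Toy monolingual corpora and bilingual dictionary (placeholders, not real data).
corpus_en = [["the", "cat", "sat", "on", "the", "mat"]]
corpus_fr = [["le", "chat", "dort", "sur", "le", "tapis"]]
dictionary = {"cat": ["chat"], "mat": ["tapis", "natte"],
              "chat": ["cat"], "tapis": ["mat"]}

vocab = sorted({w for s in corpus_en + corpus_fr for w in s}
               | {t for ts in dictionary.values() for t in ts})
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 16, 2, 0.05
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # context embeddings (shared space)
W_out = rng.normal(scale=0.1, size=(V, D))  # centre-word embeddings

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cbow_update(context_ids, target_id):
    """One CBOW step with a full softmax (fine for a toy vocabulary)."""
    h = W_in[context_ids].mean(axis=0)
    grad_scores = softmax(W_out @ h)
    grad_scores[target_id] -= 1.0             # d(cross-entropy)/d(scores)
    grad_h = W_out.T @ grad_scores             # backprop to the averaged context vector
    W_out[:] -= lr * np.outer(grad_scores, h)
    W_in[context_ids] -= lr * grad_h / len(context_ids)

for epoch in range(50):
    for sent in corpus_en + corpus_fr:
        for t, word in enumerate(sent):
            ctx = [idx[sent[j]]
                   for j in range(max(0, t - window), min(len(sent), t + window + 1))
                   if j != t]
            cbow_update(ctx, idx[word])        # predict the word itself
            if word in dictionary:
                # EM-like step: pick the translation whose current embedding best
                # fits this context, then also predict it from the same context.
                h = W_in[ctx].mean(axis=0)
                best = max(dictionary[word], key=lambda w: float(W_out[idx[w]] @ h))
                cbow_update(ctx, idx[best])

# Translation pairs should end up close together in the shared space.
cat, chat = W_in[idx["cat"]], W_in[idx["chat"]]
print("cos(cat, chat) =", cat @ chat / (np.linalg.norm(cat) * np.linalg.norm(chat)))
```

Because the translation is re-selected from the current embeddings at every occurrence, a polysemous word can map to different translations in different contexts, which is the property the abstract highlights.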

Original language: English
Title of host publication: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Editors: Jian Su, Kevin Duh, Xavier Carreras
Place of Publication: Pennsylvania
Publisher: Association for Computational Linguistics (ACL)
Pages: 1285-1295
Number of pages: 11
Volume: 1
ISBN (Electronic): 9781945626258
DOIs
Publication status: Published - 1 Jan 2016
Externally published: Yes
Event: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016 - Austin, United States
Duration: 1 Nov 2016 - 5 Nov 2016

Publication series

Name: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Country/Territory: United States
City: Austin
Period: 1/11/16 - 5/11/16
