Bootstrapping techniques for polysynthetic morphological analysis

William Lane, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings › peer-review


Abstract


Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word. This complexity, together with a general scarcity of written data, poses a challenge to the development of natural language technologies. To address this challenge, we offer linguistically-informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language. We generate data from a finite state transducer to train an encoder-decoder model. We improve the model by "hallucinating" missing linguistic structure into the training data, and by resampling from a Zipf distribution to simulate a more natural distribution of morphemes. The best model accounts for all instances of reduplication in the test set and achieves an accuracy of 94.7% overall, a 10 percentage point improvement over the FST baseline. This process demonstrates the feasibility of bootstrapping a neural morph analyzer from minimal resources.
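The abstract mentions resampling FST-generated training data from a Zipf distribution. The following is not the authors' implementation; it is a minimal Python sketch assuming that the generated (surface form, analysis) pairs have already been put in some rank order (a hypothetical ordering, since the abstract does not say how ranks are assigned) and that the pair at rank r is drawn with probability proportional to 1/r^s. The function names `zipf_weights` and `zipf_resample` and the exponent `s` are illustrative assumptions.

```python
import random
from collections import Counter


def zipf_weights(n, s=1.0):
    """Return normalized Zipf weights for ranks 1..n: weight(r) is proportional to 1 / r**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def zipf_resample(pairs, k, s=1.0, seed=0):
    """Draw k items (with replacement) from `pairs`, favouring items ranked earlier.

    Assumption (hypothetical, not from the paper): `pairs` is a list of
    (surface_form, analysis) tuples already sorted from highest to lowest
    rank; list position 0 is rank 1.
    """
    rng = random.Random(seed)  # seeded so the resampled training set is reproducible
    weights = zipf_weights(len(pairs), s)
    return rng.choices(pairs, weights=weights, k=k)


if __name__ == "__main__":
    # Placeholder strings standing in for FST output; not real Kunwinjku data.
    generated = [("surface1", "analysis1"),
                 ("surface2", "analysis2"),
                 ("surface3", "analysis3")]
    resampled = zipf_resample(generated, k=1000, s=1.0, seed=1)
    # The pair at rank r should appear about 1/r**s times as often as the rank-1 pair.
    print(Counter(resampled).most_common())
```

Sampling with replacement lets high-ranked analyses repeat, loosely mimicking how frequent forms recur in natural text, while the exponent `s` controls how steeply frequency falls off with rank.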
Original language: English
Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Editors: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Place of Publication: Pennsylvania
Publisher: Association for Computational Linguistics (ACL)
Pages: 6652-6661
Number of pages: 15
Volume: 1
ISBN (Electronic): 9781952148255
Publication status: Published - Jul 2020
Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: 5 Jul 2020 to 10 Jul 2020

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 5/07/20 to 10/07/20
