Local Word Discovery for Interactive Transcription

William Abbott Lane, Steven Bird

    Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings; peer-review

    3 Citations (Scopus)

    Abstract

    Human expertise and the participation of speech communities are essential factors in the success of technologies for low-resource languages. Accordingly, we propose a new computational task which is tuned to the available knowledge and interests in an Indigenous community, and which supports the construction of high quality texts and lexicons. The task is illustrated for Kunwinjku, a morphologically-complex Australian language. We combine a finite state implementation of a published grammar with a partial lexicon, and apply this to a noisy phone representation of the signal. We locate known lexemes in the signal and use the morphological transducer to build these out into hypothetical, morphologically-complex words for human validation. We show that applying a single iteration of this method results in a relative transcription density gain of 17%. Further, we find that 75% of breath groups in the test set receive at least one correct partial or full-word suggestion.

    Original language: English
    Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
    Place of Publication: Stroudsburg
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 2058-2067
    Number of pages: 10
    ISBN (Electronic): 9781955917094
    Publication status: Published - Nov 2021
    Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
    Duration: 7 Nov 2021 to 11 Nov 2021

    Publication series

    Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

    Conference

    Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
    Country/Territory: Dominican Republic
    City: Virtual, Punta Cana
    Period: 7/11/21 to 11/11/21
