Evaluating phonemic transcription of low-resource tonal languages for language documentation

Oliver Adams, Trevor Cohn, Graham Neubig, Hilaria Cruz, Steven Bird, Alexis Michaud

    Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings; peer-review

    Abstract

    Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification loss function for phonemic and tonal transcription in a language documentation setting. In this framework, we explore jointly modelling phonemes and tones versus modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show the changes in recognition performance as training data is scaled from 10 minutes up to 50 minutes for Chatino, and up to 224 minutes for Na. We discuss the findings from incorporating this technology into the linguistic workflow for documenting Yongning Na, which show the method's promise in improving efficiency, minimizing typographical errors, and maintaining the transcription's faithfulness to the acoustic signal, while highlighting phonetic and phonemic facts for linguistic consideration.

    Original language: English
    Title of host publication: LREC 2018 - 11th International Conference on Language Resources and Evaluation
    Editors: Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga
    Publisher: European Language Resources Association (ELRA)
    Pages: 3356-3365
    Number of pages: 10
    Edition: 1
    ISBN (Electronic): 9791095546009
    Publication status: Published - 1 Jan 2019
    Event: 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan
    Duration: 7 May 2018 to 12 May 2018

    Conference

    Conference: 11th International Conference on Language Resources and Evaluation, LREC 2018
    Country/Territory: Japan
    City: Miyazaki
    Period: 7/05/18 to 12/05/18
