Abstract
Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification loss function for phonemic and tonal transcription in a language documentation setting. In this framework, we explore jointly modelling phonemes and tones versus modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show the changes in recognition performance as training data is scaled from 10 minutes up to 50 minutes for Chatino, and up to 224 minutes for Na. We discuss the findings from incorporating this technology into the linguistic workflow for documenting Yongning Na, which show the method's promise in improving efficiency, minimizing typographical errors, and maintaining the transcription's faithfulness to the acoustic signal, while highlighting phonetic and phonemic facts for linguistic consideration.
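As a rough illustration of the setup the abstract describes (a recurrent acoustic model trained with the connectionist temporal classification loss over a joint phoneme-and-tone label set, with pitch information added to the spectral features), the following PyTorch sketch shows the general shape of such a system. The network sizes, feature dimensions, and label inventory below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a CTC-trained bidirectional LSTM
# that emits phoneme and tone symbols from filterbank-plus-pitch features.
# All sizes and the label inventory are illustrative assumptions.
import torch
import torch.nn as nn

class CTCTranscriber(nn.Module):
    def __init__(self, n_feats: int, n_labels: int, hidden: int = 250):
        super().__init__()
        # n_labels counts phoneme and tone symbols; index 0 is the CTC blank.
        self.rnn = nn.LSTM(n_feats, hidden, num_layers=3,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, feats):                     # feats: (batch, time, n_feats)
        hidden, _ = self.rnn(feats)
        return self.out(hidden).log_softmax(-1)   # (batch, time, n_labels)

# Hypothetical joint label set: phonemes and tone symbols share one inventory,
# so a single CTC decoder emits both (the "joint modelling" condition).
labels = ["<blank>", "a", "e", "i", "o", "u", "b", "dz", "n", "˥", "˧", "˩"]
model = CTCTranscriber(n_feats=41, n_labels=len(labels))  # e.g. 40 fbank + 1 pitch
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Toy batch: 2 utterances of 100 frames each, with short reference transcripts
# given as one concatenated 1-D tensor of label indices.
feats = torch.randn(2, 100, 41)
targets = torch.tensor([1, 9, 6, 2, 10, 8, 3, 11])
target_lengths = torch.tensor([5, 3])
input_lengths = torch.full((2,), 100, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)          # CTCLoss expects (time, batch, labels)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                   # gradient for one training step
```

Under this framing, the "separate modelling" condition would simply train the same architecture twice, once with a phoneme-only label set and once with a tone-only label set, and the pitch-versus-phonemic-context question would be probed by dropping the pitch feature from the input.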
| Language | English |
| --- | --- |
| Title of host publication | LREC 2018 - 11th International Conference on Language Resources and Evaluation |
| Editors | Hitoshi Isahara, Bente Maegaard, Stelios Piperidis, Christopher Cieri, Thierry Declerck, Koiti Hasida, Helene Mazo, Khalid Choukri, Sara Goggi, Joseph Mariani, Asuncion Moreno, Nicoletta Calzolari, Jan Odijk, Takenobu Tokunaga |
| Publisher | European Language Resources Association (ELRA) |
| Pages | 3356-3365 |
| Number of pages | 10 |
| ISBN (Electronic) | 9791095546009 |
| State | Published - 1 Jan 2019 |
| Event | 11th International Conference on Language Resources and Evaluation, LREC 2018 - Miyazaki, Japan (7 May 2018 → 12 May 2018) |
Conference
| Conference | 11th International Conference on Language Resources and Evaluation, LREC 2018 |
| --- | --- |
| Country | Japan |
| City | Miyazaki |
| Period | 7/05/18 → 12/05/18 |
Cite this
Evaluating phonemic transcription of low-resource tonal languages for language documentation. / Adams, Oliver; Cohn, Trevor; Neubig, Graham; Cruz, Hilaria; Bird, Steven; Michaud, Alexis.
LREC 2018 - 11th International Conference on Language Resources and Evaluation. ed. / Hitoshi Isahara; Bente Maegaard; Stelios Piperidis; Christopher Cieri; Thierry Declerck; Koiti Hasida; Helene Mazo; Khalid Choukri; Sara Goggi; Joseph Mariani; Asuncion Moreno; Nicoletta Calzolari; Jan Odijk; Takenobu Tokunaga. European Language Resources Association (ELRA), 2019. p. 3356-3365.

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings › Research › peer-review
TY - GEN
T1 - Evaluating phonemic transcription of low-resource tonal languages for language documentation
AU - Adams, Oliver
AU - Cohn, Trevor
AU - Neubig, Graham
AU - Cruz, Hilaria
AU - Bird, Steven
AU - Michaud, Alexis
PY - 2019/1/1
Y1 - 2019/1/1
AB - Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification loss function for phonemic and tonal transcription in a language documentation setting. In this framework, we explore jointly modelling phonemes and tones versus modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show the changes in recognition performance as training data is scaled from 10 minutes up to 50 minutes for Chatino, and up to 224 minutes for Na. We discuss the findings from incorporating this technology into the linguistic workflow for documenting Yongning Na, which show the method's promise in improving efficiency, minimizing typographical errors, and maintaining the transcription's faithfulness to the acoustic signal, while highlighting phonetic and phonemic facts for linguistic consideration.
KW - Asian languages
KW - Language documentation
KW - Low-resource languages
KW - Mesoamerican languages
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85058650789&partnerID=8YFLogxK
M3 - Conference Paper published in Proceedings
SP - 3356
EP - 3365
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
PB - European Language Resources Association (ELRA)
ER -