Abstract
The task of documenting the world’s languages is a mainstream activity in linguistics that has yet to spill over into computational linguistics. We propose a new task of transcription normalisation as an algorithmic method for speeding up the process of transcribing audio sources, leading to text collections of usable quality. We report on the application of sentence and word alignment algorithms to this task, before describing a new algorithm. All of the algorithms are evaluated over synthetic datasets. Although the results are nuanced, we suggest the transcription normalisation task as an NLP contribution to the grand challenge of documenting the world’s languages.
Original language | English |
---|---|
Title of host publication | Proceedings of 5th International Joint Conference on Natural Language Processing |
Pages | 527-535 |
Number of pages | 9 |
Publication status | Published - 2011 |
Externally published | Yes |
Event | International Joint Conference on Natural Language Processing, Chiang Mai, Thailand. Duration: 8 Nov 2011 → 13 Nov 2011. Conference number: 5th |
Conference
Conference | International Joint Conference on Natural Language Processing |
---|---|
Country/Territory | Thailand |
City | Chiang Mai |
Period | 8 Nov 2011 → 13 Nov 2011 |