Normalising audio transcriptions for unwritten languages

Adel Foda, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings

Abstract

The task of documenting the world’s languages is a mainstream activity in linguistics which is yet to spill over into computational linguistics. We propose a new task of transcription normalisation as an algorithmic method for speeding up the process of transcribing audio sources, leading to text collections of usable quality. We report on the application of sentence and word alignment algorithms to this task, before describing a new algorithm. All of the algorithms are evaluated over synthetic datasets. Although the results are nuanced, the transcription normalisation task is suggested as an NLP contribution to the grand challenge of documenting the world’s languages.
Original language: English
Title of host publication: Proceedings of 5th International Joint Conference on Natural Language Processing
Pages: 527-535
Number of pages: 9
Publication status: Published - 2011
Externally published: Yes
Event: International Joint Conference on Natural Language Processing - Chiang Mai, Thailand
Duration: 8 Nov 2011 to 13 Nov 2011
Conference number: 5th

Conference

Conference: International Joint Conference on Natural Language Processing
Country: Thailand
City: Chiang Mai
Period: 8/11/11 to 13/11/11

Cite this

Foda, A., & Bird, S. (2011). Normalising audio transcriptions for unwritten languages. In Proceedings of 5th International Joint Conference on Natural Language Processing (pp. 527-535).