Reconsidering language identification for written language resources

Baden Hughes, Timothy Baldwin, Steven Bird, Jeremy Nicholson, Andrew MacKinlay

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings

Abstract

The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.
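As a rough illustration of the kind of automated written language identification the abstract refers to, here is a minimal sketch of one widely used family of approaches: character n-gram profiles compared by cosine similarity. It is not the method evaluated or proposed in this paper. The function names (`char_ngrams`, `cosine`, `identify`) and the tiny training samples are hypothetical, chosen only to keep the example self-contained.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word edges are captured."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the label of the language profile most similar to the input text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training samples; practical systems train on far larger corpora.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
    "it": "la volpe marrone veloce salta sopra il cane pigro e il gatto dorme",
}
PROFILES = {lang: char_ngrams(sample) for lang, sample in TRAINING.items()}

if __name__ == "__main__":
    print(identify("the dog and the cat", PROFILES))
    print(identify("der hund und die katze", PROFILES))
```

Cosine similarity is only one way to compare profiles; rank-based "out-of-place" distances and probabilistic (e.g. naive Bayes) models over the same n-gram features are common alternatives.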

Original language: English
Title of host publication: Proceedings, 5th International Conference on Language Resources and Evaluation (LREC2006)
Publisher: European Language Resources Association (ELRA)
Pages: 485-488
Number of pages: 4
Publication status: Published, 2006
Externally published: Yes
Event: 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy
Duration: 22 May 2006 to 28 May 2006

Conference

Conference: 5th International Conference on Language Resources and Evaluation, LREC 2006
Country: Italy
City: Genoa
Period: 22/05/06 to 28/05/06

Citation

Hughes, B., Baldwin, T., Bird, S., Nicholson, J., & MacKinlay, A. (2006). Reconsidering language identification for written language resources. In Proceedings, 5th International Conference on Language Resources and Evaluation (LREC2006) (pp. 485-488). European Language Resources Association (ELRA).