Abstract
Most web content exists in a few dozen languages. Hundreds of other languages, the 'low-density languages', are represented only sparsely on the web. How can we locate, store and describe these low-density resources? In particular, how can we identify linguistically interesting resources, such as translation sets and multilingual documents? In this paper we describe ongoing research in which we integrate a number of discrete systems (a language data crawler, automated metadata generation tools, language data repositories and federated search services) to address the problems of identifying, retrieving, describing, storing and accessing low-density language materials from the web.
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th Australasian Web Conference |
Publisher | Southern Cross University |
Publication status | Published - Dec 2006 |
Externally published | Yes |
Event | 12th Australasian World Wide Web Conference, AusWeb 2006 - Noosa, QLD, Australia. Duration: 1 Jul 2006 → 5 Jul 2006 |
Conference
Conference | 12th Australasian World Wide Web Conference, AusWeb 2006 |
---|---|
Country/Territory | Australia |
City | Noosa, QLD |
Period | 1/07/06 → 5/07/06 |