Mining Language Resources from Institutional Repositories

Gary F Simons, Steven Bird, Christopher Hirt, Joshua Hou, Sven Pedersen

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings


Language resources are the bread and butter of language documentation and linguistic investigation. They include the primary objects of study such as texts and recordings, the outputs of research such as dictionaries and grammars, and the enabling technologies such as software tools and interchange standards. Increasingly, these resources are maintained in digital form and distributed via the web. However, searching the web for language resources is a hit-and-miss affair. One problem is that many online resources are hidden behind database interfaces, with the result that only a fraction of them is indexed by search engines (He et al. 2007). Even when resources are exposed to search engines, they may not be discoverable, since they are described in ad hoc ways that prevent searches from retrieving the desired results with high recall or precision.
Original language: English
Title of host publication: Proceedings of Digital Humanities 2011
Number of pages: 2
Publication status: Published - 2011
Externally published: Yes
Event: Digital Humanities 2011 - Stanford University
Duration: 19 Jun 2011 to 22 Jun 2011


Conference: Digital Humanities 2011


