TY - GEN
T1 - Fast query for large treebanks
AU - Ghodke, Sumukh
AU - Bird, Steven
PY - 2010/12/1
Y1 - 2010/12/1
N2 - A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is feasible to create very large databases of trees. However, existing approaches that use in-memory search, or relational or XML database technologies, do not scale up. We describe a method for storage, indexing, and query of treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks, and enables them to be used effectively in a variety of scientific and engineering tasks.
AB - A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is feasible to create very large databases of trees. However, existing approaches that use in-memory search, or relational or XML database technologies, do not scale up. We describe a method for storage, indexing, and query of treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks, and enables them to be used effectively in a variety of scientific and engineering tasks.
UR - http://www.scopus.com/inward/record.url?scp=80053274769&partnerID=8YFLogxK
M3 - Conference Paper published in Proceedings
AN - SCOPUS:80053274769
SN - 1932432655
SN - 9781932432657
T3 - NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
SP - 267
EP - 275
BT - NAACL HLT 2010 - Human Language Technologies
T2 - 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Y2 - 2 June 2010 through 4 June 2010
ER -