Abstract
The efficiency and robustness of statistical parsers have made it possible to create very large treebanks. These serve as the starting point for further work including enrichment, extraction, and curation: semantic annotations are added, syntactic features are mined, erroneous analyses are corrected. In many such cases manual processing is required, and this must operate efficiently on the largest scale. We report on an efficient web-based system for querying very large treebanks called Fangorn. It implements an XPath-like query language which is extended with a linguistic operator to capture proximity in the terminal sequence. Query results are displayed using scalable vector graphics and decorated with the original query, making it easy for queries to be modified and resubmitted. Fangorn is built on the Apache Lucene text search engine and is available under the Apache License.
Original language | English |
---|---|
Title of host publication | Proceedings of COLING 2012 |
Subtitle of host publication | Demonstration Papers |
Pages | 175-182 |
Number of pages | 8 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | International Conference on Computational Linguistics - Mumbai, India; Duration: 8 Dec 2012 → 15 Dec 2012; Conference number: 24th |
Conference
Conference | International Conference on Computational Linguistics |
---|---|
Abbreviated title | COLING 2012 |
Country/Territory | India |
City | Mumbai |
Period | 8/12/12 → 15/12/12 |