Fangorn: A system for querying very large treebanks

Sumukh Ghodke, Steven Bird

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper published in Proceedings > peer-review


The efficiency and robustness of statistical parsers have made it possible to create very large treebanks. These serve as the starting point for further work including enrichment, extraction, and curation: semantic annotations are added, syntactic features are mined, erroneous analyses are corrected. In many such cases manual processing is required, and this must operate efficiently on the largest scale. We report on Fangorn, an efficient web-based system for querying very large treebanks. It implements an XPath-like query language which is extended with a linguistic operator to capture proximity in the terminal sequence. Query results are displayed using scalable vector graphics and decorated with the original query, making it easy for queries to be modified and resubmitted. Fangorn is built on the Apache Lucene text search engine and is available under the Apache License.
Original language: English
Title of host publication: Proceedings of COLING 2012
Subtitle of host publication: Demonstration Papers
Number of pages: 8
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Linguistics - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012
Conference number: 24th


Conference: International Conference on Computational Linguistics
Abbreviated title: COLING 2012

