A breadth-first representation for tree matching in large scale forest-based translation

Sumukh Ghodke, Steven Bird, Rui Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper published in Proceedings

Abstract

Efficient data structures are necessary for searching large translation rule dictionaries in forest-based machine translation. We propose a breadth-first representation of tree structures that allows trees to be stored and accessed efficiently. We describe an algorithm for incremental search for trees in a forest and show that it is orders of magnitude faster than iterative search. The rule dictionaries are stored in a B-tree index. Prefix-compressed indexes with a large page size are found to balance fast search against disk space utilisation.
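As a rough illustration of what a breadth-first representation of a tree can look like, the sketch below flattens a tree level by level into a sequence of (label, child count) pairs and rebuilds it again. This is only a minimal sketch under stated assumptions: the nested-tuple tree format, the function names `bfs_encode` and `bfs_decode`, and the (label, child count) encoding are all hypothetical choices made for illustration, and the encoding used in the paper may differ.

```python
from collections import deque

# Hypothetical tree format: (label, [children]), with children in the same format.


def bfs_encode(tree):
    """Flatten a tree into (label, child_count) pairs in breadth-first order."""
    encoded = []
    queue = deque([tree])
    while queue:
        label, children = queue.popleft()
        encoded.append((label, len(children)))
        queue.extend(children)  # children are visited after all earlier nodes
    return encoded


def bfs_decode(encoded):
    """Rebuild the nested tree from its breadth-first (label, child_count) pairs."""
    items = iter(encoded)
    label, count = next(items)
    root = (label, [])
    queue = deque([(root, count)])
    while queue:
        node, count = queue.popleft()
        for _ in range(count):
            child_label, child_count = next(items)
            child = (child_label, [])
            node[1].append(child)
            queue.append((child, child_count))
    return root


# Illustrative tree fragment (not taken from the paper).
example = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [])])
flat = bfs_encode(example)
restored = bfs_decode(flat)
```

Because the child counts are stored alongside the labels, the flat sequence carries enough information to recover the original shape, so it could serve as a key in an ordered index such as a B-tree.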
Original language: English
Title of host publication: Proceedings of 5th International Joint Conference on Natural Language Processing
Pages: 785-793
Number of pages: 9
Publication status: Published - 2011
Externally published: Yes
Event: International Joint Conference on Natural Language Processing - Chiang Mai, Thailand
Duration: 8 Nov 2011 → 13 Nov 2011
Conference number: 5th

Conference

Conference: International Joint Conference on Natural Language Processing
Country: Thailand
City: Chiang Mai
Period: 8/11/11 → 13/11/11


Cite this

Ghodke, S., Bird, S., & Zhang, R. (2011). A breadth-first representation for tree matching in large scale forest-based translation. In Proceedings of 5th International Joint Conference on Natural Language Processing (pp. 785-793).