Abstract
Efficient data structures are necessary for searching large translation rule dictionaries in forest-based machine translation. We propose a breadth-first representation of tree structures that allows trees to be stored and accessed efficiently. We describe an algorithm that allows incremental search for trees in a forest and show that it is orders of magnitude faster than iterative search. A B-tree index is used to store the rule dictionaries. Prefix-compressed indexes with a large page size are found to provide a balance of fast search and disk space utilisation.
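As a rough illustration of what a breadth-first tree representation can look like, the sketch below serialises a tree level by level into a flat string that could serve as a key in an ordered index. The abstract does not specify the encoding used in the paper, so the `label/arity` token format and the names `Node`, `breadth_first_key` and `from_key` are assumptions made for this example only. Labels are assumed to contain no whitespace.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labelled tree node (hypothetical type for this sketch)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def breadth_first_key(root: Node) -> str:
    """Serialise a tree level by level as 'label/arity' tokens.

    The token format is an assumption, not the paper's encoding.
    """
    tokens = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        tokens.append(f"{node.label}/{len(node.children)}")
        queue.extend(node.children)  # children are visited after the current level
    return " ".join(tokens)


def from_key(key: str) -> Node:
    """Rebuild the tree from a key produced by breadth_first_key."""
    pairs = [token.rsplit("/", 1) for token in key.split()]
    nodes = [Node(label) for label, _ in pairs]
    next_child = 1  # index of the first node not yet claimed by a parent
    for node, (_, arity) in zip(nodes, pairs):
        count = int(arity)
        node.children = nodes[next_child:next_child + count]
        next_child += count
    return nodes[0]


tree = Node("S", [Node("NP", [Node("DT"), Node("NN")]), Node("VP")])
key = breadth_first_key(tree)
print(key)
```

Because each token records how many children its node has, the flat string can be decoded back into the same tree, which is what makes it usable as a lookup key. How such keys interact with prefix compression in a B-tree index depends on the actual encoding and is not shown here.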
Original language | English |
---|---|
Title of host publication | Proceedings of 5th International Joint Conference on Natural Language Processing |
Pages | 785-793 |
Number of pages | 9 |
Publication status | Published - 2011 |
Externally published | Yes |
Event | International Joint Conference on Natural Language Processing - Chiang Mai, Thailand; Duration: 8 Nov 2011 → 13 Nov 2011; Conference number: 5th |
Conference
Conference | International Joint Conference on Natural Language Processing |
---|---|
Country/Territory | Thailand |
City | Chiang Mai |
Period | 8/11/11 → 13/11/11 |