Managing fieldwork data with Toolbox and the Natural Language Toolkit

Stuart Robinson, Greg Aumann, Steven Bird

Research output: Contribution to journal › Article › peer-review

Abstract

This paper shows how fieldwork data can be managed using the program Toolbox together with the Natural Language Toolkit (NLTK) for the Python programming language. It provides background information about Toolbox and describes how it can be downloaded and installed. The basic functionality of the program for lexicons and texts is described, and its strengths and weaknesses are reviewed. Its underlying data format is briefly discussed, and the Toolbox processing capabilities of NLTK are introduced, showing ways in which NLTK can be used to extend the functionality of Toolbox. This is illustrated with a few simple scripts that demonstrate basic data management tasks relevant to language documentation, such as printing out the contents of a lexicon as HTML.
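
To give a flavour of the kind of task the abstract mentions, the following is a minimal sketch, not one of the paper's own scripts, of reading a Toolbox-style lexicon and printing it as an HTML table. It uses only the Python standard library rather than NLTK's Toolbox support. It assumes the usual backslash-marker layout with `\lx` opening each entry and `\ps` and `\ge` holding part of speech and gloss. The sample entries and the function names `parse_records` and `to_html` are invented for illustration.

```python
import html

# Invented sample data in backslash-marker form (illustration only).
SAMPLE = r"""
\lx alpha
\ps N
\ge first example

\lx beta
\ps V
\ge second <example>
"""


def parse_records(text, record_marker="lx"):
    """Split marker-prefixed lines into records, one list of (marker, value) per entry."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank lines and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            records.append([])  # a new entry starts at each record marker
        if records:  # fields before the first entry (file header) are skipped
            records[-1].append((marker, value.strip()))
    return records


def to_html(records):
    """Render headword, part of speech, and gloss as rows of an HTML table."""
    rows = []
    for fields in records:
        entry = dict(fields)  # if a marker repeats, the last value wins
        cells = (entry.get(m, "") for m in ("lx", "ps", "ge"))
        rows.append(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(to_html(parse_records(SAMPLE)))
```

Escaping each field with `html.escape` keeps characters such as `<` in a gloss from breaking the generated page. A real lexicon would also need handling for multi-line fields and repeated markers, which this sketch leaves out.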
Original language: English
Pages (from-to): 44-57
Number of pages: 14
Journal: Language Documentation and Conservation
Volume: 1
Issue number: 1
Publication status: Published - 2007
Externally published: Yes
