Abstract
Interlinear texts come in many forms and can be represented digitally in many ways, e.g. plain text with hard spacing, tables, special markup, and special-purpose data structures. There are various methods for linking to audio data and lexical entries, and for including footnotes and other marginalia. This diversity of form presents problems for general purpose software for searching, exchanging, displaying and enriching interlinear texts.
In this paper, we survey several existing tools, models and formats for interlinear text. We argue that a general-purpose abstract data model for interlinear text is necessary in order to abstract away from all the physical storage formats and display styles. We propose a new formal framework for interlinear text based on the annotation graph model [Bird and Liberman, 2001]. This model has several desirable properties for the development of practical tools. It scales well, so that extended, richly layered texts can be stored and manipulated efficiently. The model has a direct representation as a relational table, permitting efficient query. Incomplete information can be represented naturally, so partially analyzed texts are well-formed structures and are well-behaved with respect to query. For the same reason, essentially arbitrary parts of an interlinear text, such as particular layers, can be projected and remain well-formed, and this facilitates analysis by external programs. Finally, this model makes it trivial to construe interlinear text as an annotation of time-series data, so that tools can give access to the primary audio data while a text is being transcribed and annotated.
As a partial demonstration of these claims, we present a prototype interlinear text editor based on the annotation graph model. This tool is being developed in conjunction with a general architecture of tools for transcribing and annotating time-series data within the framework of annotation graphs. By using this architecture we can reuse and integrate interlinear text software components with other tools. The tool is available in open source form.
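To make the relational-table and projection claims concrete, the following is a minimal sketch of how an interlinear text might be stored as annotation-graph arcs in a relational table. It is not the schema used in the paper or its prototype editor: the table names (`node`, `arc`), the column names, the layer names, and the placeholder labels (`w1`, `g1`) are all assumptions made for illustration. Nodes carry an optional time offset, and each arc spans two nodes and carries a layer type and a label.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding one small, partially analyzed text.

    Hypothetical schema: `node` holds optional time offsets into the audio,
    `arc` holds one labeled annotation per row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, offset_sec REAL)")
    conn.execute(
        "CREATE TABLE arc (src INTEGER, dst INTEGER, layer TEXT, label TEXT)"
    )
    # Node 1 has no time offset: incomplete information is simply NULL.
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(0, 0.0), (1, None), (2, 1.4)],
    )
    conn.executemany(
        "INSERT INTO arc VALUES (?, ?, ?, ?)",
        [
            (0, 1, "word", "w1"),
            (1, 2, "word", "w2"),
            (0, 1, "gloss", "g1"),  # only the first word is glossed so far
            (0, 2, "translation", "free translation"),
        ],
    )
    return conn


def project(conn, layer):
    """Return a single layer as (src, dst, label) rows, ordered by position."""
    rows = conn.execute(
        "SELECT src, dst, label FROM arc WHERE layer = ? ORDER BY src, dst",
        (layer,),
    )
    return rows.fetchall()


def unglossed_words(conn):
    """Return labels of word arcs that have no gloss arc over the same span."""
    rows = conn.execute(
        """
        SELECT w.label
        FROM arc AS w
        LEFT JOIN arc AS g
          ON g.layer = 'gloss' AND g.src = w.src AND g.dst = w.dst
        WHERE w.layer = 'word' AND g.label IS NULL
        ORDER BY w.src
        """
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo()
    print(project(conn, "word"))
    print(unglossed_words(conn))
```

In this sketch, projecting a layer is a plain `WHERE` filter, and the result is still a well-formed set of arcs. The missing gloss and the missing time offset do not need special handling, which is the sense in which a partially analyzed text remains queryable.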
Original language | English |
---|---|
Number of pages | 19 |
Publication status | Published - 2000 |
Externally published | Yes |
Event | Workshop on Web-Based Language Documentation and Description - Philadelphia, United States. Duration: 12 Dec 2000 → 15 Dec 2000 |
Conference
Conference | Workshop on Web-Based Language Documentation and Description |
---|---|
Country/Territory | United States |
City | Philadelphia |
Period | 12/12/00 → 15/12/00 |