A Formal Framework for Interlinear Text

Kazuaki Maeda, Steven Bird

Research output: Contribution to conference › Conference paper presented at Conference (not in Proceedings)

Abstract

Interlinear texts come in many forms and can be represented digitally in many ways, e.g. plain text with hard spacing, tables, special markup, and special-purpose data structures. There are various methods for linking to audio data and lexical entries, and for including footnotes and other marginalia. This diversity of form presents problems for general purpose software for searching, exchanging, displaying and enriching interlinear texts.

In this paper, we survey several existing tools, models and formats for interlinear text. We argue that a general-purpose abstract data model for interlinear text is necessary in order to abstract away from all the physical storage formats and display styles. We propose a new formal framework for interlinear text based on the annotation graph model [Bird and Liberman, 2001]. This model has several desirable properties for the development of practical tools. It scales well, so that extended, richly layered texts can be stored and manipulated efficiently. The model has a direct representation as a relational table, permitting efficient query. Incomplete information can be represented naturally, so partially analyzed texts are well-formed structures and are well-behaved with respect to query. For the same reason, essentially arbitrary parts of an interlinear text, such as particular layers, can be projected and remain well-formed, and this facilitates analysis by external programs. Finally, this model makes it trivial to construe interlinear text as an annotation of time-series data, so that tools can give access to the primary audio data while a text is being transcribed and annotated.
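The properties listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Arc` type, the layer names, the `project` and `aligned` helpers, and the toy Swahili data are all assumptions made for illustration. Each arc is one row of a relational-style table, nodes may or may not carry a time offset, and a missing gloss simply means a missing row.

```python
from typing import NamedTuple, Optional

class Arc(NamedTuple):
    """One row of a hypothetical arc table: an annotation spanning two nodes."""
    src: int     # id of the start node
    dst: int     # id of the end node
    layer: str   # illustrative layer name, e.g. "word" or "gloss"
    label: str   # the annotation content

# Node id -> time offset in seconds into the audio; None marks a node
# that has not (yet) been anchored to the signal.
node_times: dict[int, Optional[float]] = {0: 0.0, 1: None, 2: 1.4}

# Toy data invented for this sketch. The gloss for the second word has
# not been entered yet, so the text is only partially analyzed.
arcs = [
    Arc(0, 1, "word", "ni-li-soma"),
    Arc(1, 2, "word", "kitabu"),
    Arc(0, 1, "gloss", "1SG-PST-read"),
    Arc(0, 2, "translation", "I read a book"),
]

def project(table: list[Arc], layer: str) -> list[Arc]:
    """Select one layer; the result is still a valid list of arcs."""
    return [a for a in table if a.layer == layer]

def aligned(table: list[Arc], layer_a: str, layer_b: str) -> list[tuple[str, Optional[str]]]:
    """Pair each label in layer_a with the layer_b label on the same span.

    Spans with no layer_b arc yield None instead of raising an error.
    """
    by_span = {(a.src, a.dst): a.label for a in project(table, layer_b)}
    return [(a.label, by_span.get((a.src, a.dst))) for a in project(table, layer_a)]

if __name__ == "__main__":
    for word, gloss in aligned(arcs, "word", "gloss"):
        print(word, gloss)
```

In this sketch, projecting a layer is a plain row filter, and the query over the partially glossed text still returns a result for every word, with `None` standing in for the gloss that is not yet available.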

As a partial demonstration of these claims, we present a prototype interlinear text editor based on the annotation graph model. This tool is being developed in conjunction with a general architecture of tools for transcribing and annotating time-series data within the framework of annotation graphs. By using this architecture we can reuse and integrate interlinear text software components with other tools. The tool is available in open source form.

Original language: English
Number of pages: 19
Publication status: Published - 2000
Externally published: Yes
Event: Workshop on Web-Based Language Documentation and Description - Philadelphia, United States
Duration: 12 Dec 2000 → 15 Dec 2000

Conference

Conference: Workshop on Web-Based Language Documentation and Description
Country/Territory: United States
City: Philadelphia
Period: 12/12/00 → 15/12/00

Bibliographical note

http://www.ldc.upenn.edu/exploration/expl2000/papers/
