TalkBank: Building an open unified multimodal database of communicative interaction

Brian MacWhinney, Steven Bird, Christopher Cieri, Craig Martell

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper published in Proceedings

Abstract

The goal of the TalkBank project (http://talkbank.org) is to support data-sharing and direct, community-wide access to naturalistic recordings and transcripts of human and animal communication. Toward this end, we have constructed a web-accessible database of transcripts linked to audio and video media within fields such as conversation analysis, classroom discourse, animal communication, gesture, meetings, second language acquisition, first language acquisition, bilingualism, tutoring, and legal oral argumentation. We discuss how we have taken discrepant databases from dozens of individual projects and merged them into a well-structured, uniform database in which transcripts can be opened online through browsers, allowing direct multimedia playback. To achieve translation across corpora, we have defined a general XML schema. The validity of this schema is checked by bidirectional conversion from alternative input formats to XML and back. The resulting transcripts are then linked to hinted media, and XSLT is used to format web-readable, browsable multimedia transcripts playable through SMIL. A parallel pathway is used to support collaborative commentary and publication of PDFs linked to media through special issues of journals in the relevant fields.

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
Editors: Maria Francisca Xavier, Rute Costa, Fatima Ferreira, Maria Teresa Lino, Raquel Silva
Publisher: European Language Resources Association (ELRA)
Pages: 525-528
Number of pages: 4
ISBN (Electronic): 2951740816, 9782951740815
Publication status: Published - 2004
Event: 4th International Conference on Language Resources and Evaluation, LREC 2004 - Lisbon, Portugal
Duration: 26 May 2004 - 28 May 2004

Publication series

Name: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004

Conference

Conference: 4th International Conference on Language Resources and Evaluation, LREC 2004
Country: Portugal
City: Lisbon
Period: 26/05/04 - 28/05/04

  • Cite this

    MacWhinney, B., Bird, S., Cieri, C., & Martell, C. (2004). TalkBank: Building an open unified multimodal database of communicative interaction. In M. F. Xavier, R. Costa, F. Ferreira, M. T. Lino, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004 (pp. 525-528). (Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004). European Language Resources Association (ELRA).