TY - GEN
T1 - TalkBank
T2 - 4th International Conference on Language Resources and Evaluation, LREC 2004
AU - MacWhinney, Brian
AU - Bird, Steven
AU - Cieri, Christopher
AU - Martell, Craig
PY - 2004
Y1 - 2004
N2 - The goal of the TalkBank project (http://talkbank.org) is to support data-sharing and direct, community-wide access to naturalistic recordings and transcripts of human and animal communication. Toward this end, we have constructed a web-accessible database of transcripts linked to audio and video media within fields such as conversation analysis, classroom discourse, animal communication, gesture, meetings, second language acquisition, first language acquisition, bilingualism, tutoring, and legal oral argumentation. We discuss how we have taken discrepant databases from dozens of individual projects and merged them into a well-structured, uniform database in which transcripts can be opened online through browsers, allowing direct multimedia playback. To achieve translation across corpora, we have defined a general XML schema. The validity of this schema is checked by bidirectional conversion from alternative input formats to XML and back. The resultant transcripts are then linked to hinted media, and XSLT is used to format web-readable, browsable multimedia transcripts playable through SMIL. A parallel pathway is used to support collaborative commentary and publication of PDF linked to media through special issues of journals in the relevant fields.
UR - http://www.scopus.com/inward/record.url?scp=43449128775&partnerID=8YFLogxK
M3 - Conference Paper published in Proceedings
AN - SCOPUS:43449128775
T3 - Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
SP - 525
EP - 528
BT - Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
A2 - Xavier, Maria Francisca
A2 - Costa, Rute
A2 - Ferreira, Fatima
A2 - Lino, Maria Teresa
A2 - Silva, Raquel
PB - European Language Resources Association (ELRA)
Y2 - 26 May 2004 through 28 May 2004
ER -