An attentional model for speech translation without transcription

Long Duong, Antonios Anastasopoulos, David Chiang, Steven Bird, Trevor Cohn

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper published in Proceedings › Research › peer-review

    Abstract

    For many low-resource languages, spoken language resources are more likely to be annotated with translations than with transcriptions. This bilingual speech data can be used for word-spotting, spoken document retrieval, and even for documentation of endangered languages. We experiment with a neural, attentional model applied to this data. On phone-to-word alignment and translation reranking tasks, we achieve large improvements relative to several baselines. On the more challenging speech-to-word alignment task, our model nearly matches GIZA++'s performance on gold transcriptions, but without recourse to transcriptions or to a lexicon.
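    As a rough illustration of the approach the abstract describes (soft attention over the source side doing double duty as an alignment model), here is a minimal sketch, not the authors' code: a PyTorch encoder-decoder whose attention weights are read off as phone-to-word alignments. The module names, dimensions, and the use of dot-product attention are all illustrative assumptions; the paper's actual architecture may differ.

    import torch
    import torch.nn as nn

    class AttentionalAligner(nn.Module):
        """Encoder-decoder whose attention weights act as soft source-target alignments."""

        def __init__(self, n_phones, n_words, hidden=128):
            super().__init__()
            self.src_emb = nn.Embedding(n_phones, hidden)  # source side: phone inventory
            self.tgt_emb = nn.Embedding(n_words, hidden)   # target side: translation vocabulary
            self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
            self.decoder = nn.GRU(hidden, 2 * hidden, batch_first=True)
            self.out = nn.Linear(4 * hidden, n_words)      # predicts the next target word

        def forward(self, phones, words):
            enc, _ = self.encoder(self.src_emb(phones))           # (B, S, 2H): encoded phones
            dec, _ = self.decoder(self.tgt_emb(words))            # (B, T, 2H): decoder states
            scores = torch.bmm(dec, enc.transpose(1, 2))          # (B, T, S): dot-product attention
            align = torch.softmax(scores, dim=-1)                 # each target word attends over phones
            context = torch.bmm(align, enc)                       # (B, T, 2H): attended source summary
            logits = self.out(torch.cat([dec, context], dim=-1))  # (B, T, n_words)
            return logits, align                                  # `align` doubles as the alignment matrix

    Training maximizes the likelihood of the translation given the source; the learned attention matrix `align` can then be read off as a phone-to-word alignment, e.g. by taking the argmax over source positions for each target word, which is the sense in which attention can stand in for a GIZA++-style aligner.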

    Original language: English
    Title of host publication: 2016 Conference of the North American Chapter of the Association for Computational Linguistics
    Subtitle of host publication: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference
    Place of publication: San Diego, United States
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 949-959
    Number of pages: 11
    ISBN (electronic): 9781941643914
    Publication status: Published - Jun 2016
    Event: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016, San Diego, United States
    Duration: 12 Jun 2016 - 17 Jun 2016

    Conference

    Conference: 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016
    Country: United States
    City: San Diego
    Period: 12/06/16 - 17/06/16

    Cite this

    Duong, L., Anastasopoulos, A., Chiang, D., Bird, S., & Cohn, T. (2016). An attentional model for speech translation without transcription. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 949-959). San Diego, United States: Association for Computational Linguistics (ACL).

    Links

    Scopus: http://www.scopus.com/inward/record.url?scp=84994162838&partnerID=8YFLogxK
    Proceedings: http://m-mitchell.com/NAACL-2016/NAACL-HLT2016/