Using Syntax in Large-Scale Audio Document Translation

Authors
Zheng, Jing [1]
Ayan, Necip Fazil [1]
Wang, Wen [1]
Burkett, David [2]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
[2] Univ Calif Berkeley, EECS Dept, Berkeley, CA 94720 USA
Keywords
syntax; machine translation; audio document
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, the use of syntax has very effectively improved machine translation (MT) quality in many text translation tasks. However, using syntax in speech translation poses additional challenges because of disfluencies and other spoken language phenomena, and of errors introduced by automatic speech recognition (ASR). In this paper, we investigate the effect of using syntax in a large-scale audio document translation task targeting broadcast news and broadcast conversations. We do so by comparing the performance of three synchronous context-free grammar based translation approaches: 1) hierarchical phrase-based translation, 2) syntax-augmented MT, and 3) string-to-dependency MT. The results show a positive effect of explicitly using syntax when translating broadcast news, but no benefit when translating broadcast conversations. The results indicate that improving the robustness of syntactic systems against conversational language style is important to their success and requires future effort.
Pages: 444 / +
Page count: 2