Insight into Multiple References in an MT Evaluation Metric

Cited by: 1
Authors
Qin, Ying [1]
Specia, Lucia [2]
Affiliations
[1] Beijing Foreign Studies University, Beijing 100089, People's Republic of China
[2] University of Sheffield, Sheffield S10 2TN, South Yorkshire, England
Keywords
Machine translation evaluation; METEOR metric; Multiple references
DOI
10.1007/978-3-319-25816-4_11
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current evaluation metrics in machine translation (MT) make poor use of multiple reference translations. In this paper we focus on the METEOR metric to gain in-depth insight into how multiple references can best be exploited. Results for five score selection strategies show that choosing the best reference (the one closest to the MT output) to generate the candidate score is not always the wisest choice. We also propose two weighting approaches that take into account the information recurring across references. The modified METEOR scores significantly improve correlation with human judgments of accuracy and fluency at the system level.
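The abstract does not spell out the five score selection strategies or the two weighting schemes. As a rough illustration of the general idea only, the Python sketch below scores one MT output against each reference separately and then reduces the per-reference scores with a chosen strategy. The strategies (max, min, mean, median), the function names `unigram_fmean` and `multi_ref_score`, and the example sentences are all hypothetical; `unigram_fmean` is a simplified exact-match stand-in for METEOR that omits its stemming, synonym matching and fragmentation penalty.

```python
from collections import Counter
from statistics import mean, median


def unigram_fmean(hyp: str, ref: str, alpha: float = 0.9) -> float:
    """Simplified METEOR-style Fmean over exact unigram matches (assumption).

    Fmean = P*R / (alpha*P + (1 - alpha)*R); alpha = 0.9 gives the
    recall-weighted 10PR / (R + 9P). No stemming, synonyms or
    fragmentation penalty, so this is NOT full METEOR.
    """
    h, r = Counter(hyp.split()), Counter(ref.split())
    matches = sum((h & r).values())  # clipped unigram matches
    if matches == 0:
        return 0.0
    p = matches / sum(h.values())    # precision against this reference
    rec = matches / sum(r.values())  # recall against this reference
    return p * rec / (alpha * p + (1 - alpha) * rec)


# Hypothetical selection strategies, chosen for illustration only;
# they are not claimed to be the five strategies studied in the paper.
STRATEGIES = {"max": max, "min": min, "mean": mean, "median": median}


def multi_ref_score(hyp: str, refs: list[str], strategy: str = "max") -> float:
    """Score hyp against each reference, then reduce with one strategy."""
    scores = [unigram_fmean(hyp, ref) for ref in refs]
    return STRATEGIES[strategy](scores)


hyp = "the cat sat on the mat"
refs = [
    "the cat sat on the mat",
    "a cat was sitting on the mat",
    "there is a cat on the mat",
]
for name in STRATEGIES:
    print(f"{name}: {multi_ref_score(hyp, refs, name):.3f}")
# max: 1.000, min: 0.580, mean: 0.720, median: 0.580
```

Here `max` corresponds to always using the reference closest to the MT output; since the abstract reports that this is not always the best choice, the sketch keeps the strategy as a parameter rather than hard-coding it.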
Pages: 131-140
Number of pages: 10