MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

Cited by: 0
Authors
Murugesan, Keerthiram [1]
Swaminathan, Sarathkrishna [1]
Dan, Soham [1]
Chaudhury, Subhajit [1]
Gunasekara, Chulaka [1]
Crouse, Maxwell [1]
Mahajan, Diwakar [1]
Abdelaziz, Ibrahim [1]
Fokoue, Achille [1]
Kapanipathi, Pavan [1]
Roukos, Salim [1]
Gray, Alexander [1]
Affiliation
[1] IBM Res, New York, NY 10598 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the growing interest in large language models, evaluating the quality of machine-generated text against reference (typically human-generated) text has become a focal point of research. Most recent work either proposes task-specific evaluation metrics or studies which properties of machine-generated text the existing metrics capture. In this work, we propose a new evaluation scheme that models human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by recent efforts toward fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine-generated text that uses these mismatch error types as auxiliary tasks and repurposes existing single-number evaluation metrics as additional scalar features, alongside textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between sentence pairs on held-out datasets from the 7 NLP tasks align well with human evaluation.
Pages: 4485-4503
Number of pages: 19