MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

Cited by: 0
Authors
Murugesan, Keerthiram [1]
Swaminathan, Sarathkrishna [1]
Dan, Soham [1]
Chaudhury, Subhajit [1]
Gunasekara, Chulaka [1]
Crouse, Maxwell [1]
Mahajan, Diwakar [1]
Abdelaziz, Ibrahim [1]
Fokoue, Achille [1]
Kapanipathi, Pavan [1]
Roukos, Salim [1]
Gray, Alexander [1]
Affiliations
[1] IBM Res, New York, NY 10598 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the growing interest in large language models, evaluating the quality of machine-generated text against reference (typically human-generated) text has become a focus of attention. Most recent work either focuses on task-specific evaluation metrics or studies the properties of machine-generated text captured by existing metrics. In this work, we propose a new evaluation scheme that models human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by recent efforts on fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and repurposes existing single-number evaluation metrics as additional scalar features, alongside textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
Pages: 4485-4503
Number of pages: 19