METER: Multimodal Hallucination Detection with Mixture of Experts via Tools Ensembling and Reasoning

Cited by: 0

Authors:

Zhang, Ruwen [1]

Chen, Jinglu [1]

Dai, Mingjie [1]

Jiang, Xinyi [1]

Hu, Yuxin [1]

Liu, Bo [1]

Cao, Jiuxin [1]

Affiliations:

[1] Southeast University, Nanjing, People's Republic of China

Source:

NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024 | 2025 / Vol. 15363

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Hallucination Detection; LVLMs; Tools Integration; CoT;

DOI:

10.1007/978-981-97-9443-0_24

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The development of Large Vision-Language Models (LVLMs) has been hindered by hallucinations. Existing methods often struggle to accurately infer relationships between objects and their attributes, and frequently overlook the challenges posed by semantic duality. In this work, we develop METER, a novel multimodal hallucination detection method that utilizes a mixture of experts through tool-supported reasoning and ensembling. Specifically, our model rethinks and infers based on the decomposed reasoning steps derived from the chain-of-thought prompts, which eliminates the need for additional manual templates and recognizes attributes of hallucination step by step. We also use topics discovered from image-text pairs to distinguish ambiguous text, mitigating semantic duality. Furthermore, we investigate the effects of incorporating external tools into hallucination detection, exploring the variations and efficacy of tool ensembling in mitigating hallucinations. Additionally, we successfully alleviate hallucinations by incorporating METER's explanation into the prompt. Extensive experiments demonstrate the effectiveness of our model. Our code is available at https://github.com/lambdarw/METER.

Pages: 274-286

Page count: 13
