METER: Multimodal Hallucination Detection with Mixture of Experts via Tools Ensembling and Reasoning

Times cited: 0
Authors
Zhang, Ruwen [1]
Chen, Jinglu [1]
Dai, Mingjie [1]
Jiang, Xinyi [1]
Hu, Yuxin [1]
Liu, Bo [1]
Cao, Jiuxin [1]
Affiliation
[1] Southeast University, Nanjing, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Hallucination Detection; LVLMs; Tools Integration; CoT
DOI
10.1007/978-981-97-9443-0_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The development of Large Vision-Language Models (LVLMs) has been hindered by hallucinations. Existing methods often struggle to accurately infer relationships between objects and their attributes, and frequently overlook the challenges posed by semantic duality. In this work, we develop METER, a novel multimodal hallucination detection method that uses a mixture of experts through tool-supported reasoning and ensembling. Specifically, our model re-examines and reasons over decomposed reasoning steps derived from chain-of-thought (CoT) prompts, which eliminates the need for additional manual templates and identifies the attributes of a hallucination step by step. We also use topics discovered from image-text pairs to disambiguate ambiguous text, mitigating semantic duality. Furthermore, we investigate the effects of incorporating external tools into hallucination detection, examining how different tool ensembles vary and how effectively they mitigate hallucinations. Finally, we show that hallucinations can be alleviated by incorporating METER's explanations into the prompt. Extensive experiments demonstrate the effectiveness of our model. Our code is available at https://github.com/lambdarw/METER.
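The abstract says METER ensembles the outputs of tool-backed experts but does not state the aggregation rule. Purely as an illustration of what "tool ensembling" can mean, and assuming each expert emits a label plus a confidence score, a confidence-weighted vote is one simple way to combine such verdicts. `ExpertVerdict`, `ensemble_verdicts`, the expert names, and the weighting scheme are all hypothetical; they are not taken from the METER paper or its repository.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExpertVerdict:
    """One tool-backed expert's judgment on a single claim (hypothetical schema)."""
    expert: str        # illustrative names such as "object_detector" or "ocr"
    label: str         # e.g. "hallucination" or "no_hallucination"
    confidence: float  # assumed to lie in [0, 1]


def ensemble_verdicts(verdicts: list[ExpertVerdict],
                      weights: dict[str, float] | None = None) -> tuple[str, float]:
    """Confidence-weighted vote over expert verdicts (a generic sketch, not METER's rule).

    Returns the winning label and its share of the total weighted score.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    weights = weights or {}
    scores: dict[str, float] = defaultdict(float)
    for v in verdicts:
        # Experts without an explicit weight default to 1.0.
        scores[v.label] += weights.get(v.expert, 1.0) * v.confidence
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    share = scores[label] / total if total > 0 else 0.0
    return label, share


if __name__ == "__main__":
    verdicts = [
        ExpertVerdict("object_detector", "hallucination", 0.9),
        ExpertVerdict("ocr", "no_hallucination", 0.6),
        ExpertVerdict("attribute_vqa", "hallucination", 0.7),
    ]
    label, share = ensemble_verdicts(verdicts)
    print(label, round(share, 3))  # hallucination 0.727
```

Per-expert weights let a trusted tool outweigh the others: with `weights={"ocr": 3.0}`, the same three verdicts flip to `no_hallucination` (1.8 vs 1.6). A learned gating network, as in a trained mixture of experts, would replace these fixed weights with input-dependent ones.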
Pages: 274-286
Number of pages: 13