Woodpecker: hallucination correction for multimodal large language models

Cited by: 0
Authors
Yin, Shukang [1 ]
Fu, Chaoyou [2 ,3 ]
Zhao, Sirui [1 ]
Xu, Tong [1 ]
Wang, Hao [1 ]
Sui, Dianbo [4 ]
Shen, Yunhang [5 ]
Li, Ke [5 ]
Sun, Xing [5 ]
Chen, Enhong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Artificial Intelligence & Data Sci, Hefei 230026, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[3] Nanjing Univ, Sch Intelligence Sci & Technol, Suzhou 215163, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[5] YouTu, Shanghai 200233, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multimodal learning; multimodal large language models; hallucination correction; large language models; vision and language;
DOI
10.1007/s11432-024-4251-x
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. To mitigate hallucinations, existing studies mainly resort to instruction tuning, which requires retraining the models with specific data. In this paper, we pave a different way by introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations in the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while remaining interpretable because the intermediate outputs of the five stages are accessible. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.
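The five-stage pipeline named in the abstract can be sketched as a chain of corrective steps. This is a minimal, hypothetical illustration only: the stage functions below are toy stand-ins for the MLLM, detector, and VQA calls the paper actually uses, and all names and signatures are invented for exposition (the real implementation is in the linked repository).

```python
# Toy sketch of the five Woodpecker stages, threading a context dict so every
# intermediate output stays inspectable (the source of the interpretability
# claimed in the abstract). Stage internals are placeholders, not the real system.

def key_concept_extraction(ctx):
    # Pick out the object concepts mentioned in the generated answer.
    vocab = {"dog", "cat", "frisbee"}  # toy open-vocabulary concept list
    words = [w.strip(".,") for w in ctx["text"].split()]
    ctx["concepts"] = [w for w in words if w in vocab]
    return ctx

def question_formulation(ctx):
    # Turn each extracted concept into a verification question.
    ctx["questions"] = [f"Is there a {c} in the image?" for c in ctx["concepts"]]
    return ctx

def visual_knowledge_validation(ctx):
    # Stand-in for an object detector / VQA model answering the questions.
    ctx["validated"] = {c: c in ctx["image"]["objects"] for c in ctx["concepts"]}
    return ctx

def visual_claim_generation(ctx):
    # Summarize the validation results as explicit visual claims.
    ctx["claims"] = [f"{c}: {'present' if ok else 'absent'}"
                     for c, ok in ctx["validated"].items()]
    return ctx

def hallucination_correction(ctx):
    # Drop sentences that mention objects the validator did not find.
    hallucinated = {c for c, ok in ctx["validated"].items() if not ok}
    kept = [s for s in ctx["text"].split(". ")
            if not hallucinated & set(s.split())]
    ctx["corrected_text"] = ". ".join(kept)
    return ctx

STAGES = [key_concept_extraction, question_formulation,
          visual_knowledge_validation, visual_claim_generation,
          hallucination_correction]

def woodpecker_correct(image, text):
    """Run the five stages in order over a shared context dict."""
    ctx = {"image": image, "text": text}
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx["corrected_text"], ctx

# Toy usage: the answer hallucinates a cat that is not in the image.
image = {"objects": {"dog", "frisbee"}}
answer = "A dog catches a frisbee. A cat watches nearby"
corrected, trace = woodpecker_correct(image, answer)
# corrected == "A dog catches a frisbee"
```

Because correction happens after generation (the "post-remedy manner"), a pipeline like this can wrap any MLLM's output without retraining it.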
Pages: 13