Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models

Cited by: 0
Authors
Wu, Kewei [1]
Wang, Yiran [1]
He, Xiaogang [1]
Yan, Jinyu [2]
Guo, Yang [2]
Jiang, Zhuqing [1]
Zhang, Xing [3]
Wang, Wei [3]
Xiong, Yongping [1]
Men, Aidong [1]
Xiao, Li [1]
Affiliations
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 10 Xitucheng Rd, Beijing, 100876, China
[2] Beijing Zhuoshizhitong Technology Co., Ltd., Beijing, 100096, China
[3] China Resources Digital Co., Ltd., Beijing, 518049, China
DOI
10.3390/bdcc8120175
Related papers
50 in total
  • [1] Visual Hallucinations of Multi-modal Large Language Models
    Huang, Wen
    Liu, Hongbin
    Guo, Minxin
    Gong, Neil Zhenqiang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 9614-9631
  • [2] Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection
    Dai, Yi
    Lang, Hao
    Zeng, Kaisheng
    Huang, Fei
    Li, Yongbin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 5292-5305
  • [3] Generative Multi-Modal Knowledge Retrieval with Large Language Models
    Long, Xinwei
    Zeng, Jiali
    Meng, Fandong
    Ma, Zhiyuan
    Zhang, Kaiyan
    Zhou, Bowen
    Zhou, Jie
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024: 18733-18741
  • [4] Multi-modal large language models in radiology: principles, applications, and potential
    Shen, Yiqiu
    Xu, Yanqi
    Ma, Jiajian
    Rui, Wushuang
    Zhao, Chen
    Heacock, Laura
    Huang, Chenchan
    ABDOMINAL RADIOLOGY, 2024
  • [5] Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
    Xu, Yifan
    Zhang, Mengdan
    Yang, Xiaoshan
    Xu, Changsheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 6253-6267
  • [6] Multi-modal Prompts with Feature Decoupling for Open-Vocabulary Object Detection
    Wang, Duorui
    Zhao, Xiaowei
    GENERALIZING FROM LIMITED RESOURCES IN THE OPEN WORLD, GLOW-IJCAI 2024, 2024, 2160: 180-194
  • [7] Multi-modal Queried Object Detection in the Wild
    Xu, Yifan
    Zhang, Mengdan
    Fu, Chaoyou
    Chen, Peixian
    Yang, Xiaoshan
    Li, Ke
    Xu, Changsheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [8] MOQAGPT: Zero-Shot Multi-modal Open-domain Question Answering with Large Language Models
    Zhang, Le
    Wu, Yihong
    Mo, Fengran
    Nie, Jian-Yun
    Agrawal, Aishwarya
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 1195-1210
  • [9] VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
    Yang, Bang
    He, Lixing
    Liu, Kaiwei
    Yan, Zhenyu
    PROCEEDINGS 2024 IEEE INTERNATIONAL WORKSHOP ON FOUNDATION MODELS FOR CYBER-PHYSICAL SYSTEMS & INTERNET OF THINGS, FMSYS 2024, 2024: 32-37
  • [10] LaMI: Large Language Models for Multi-Modal Human-Robot Interaction
    Wang, Chao
    Hasler, Stephan
    Tanneberg, Daniel
    Ocker, Felix
    Joublin, Frank
    Ceravola, Antonello
    Deigmoeller, Joerg
    Gienger, Michael
    EXTENDED ABSTRACTS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024