Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models

Authors
Wu, Kewei [1]
Wang, Yiran [1]
He, Xiaogang [1]
Yan, Jinyu [2]
Guo, Yang [2]
Jiang, Zhuqing [1]
Zhang, Xing [3]
Wang, Wei [3]
Xiong, Yongping [1]
Men, Aidong [1]
Xiao, Li [1]
Affiliations
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 10 Xitucheng Rd, Beijing, 100876, China
[2] Beijing Zhuoshizhitong Technology Co., Ltd., Beijing, 100096, China
[3] China Resources Digital Co., Ltd., Beijing, 518049, China
DOI
10.3390/bdcc8120175
Related papers
50 in total
  • [21] Class-Agnostic Object Detection with Multi-modal Transformer
    Maaz, Muhammad
    Rasheed, Hanoona
    Khan, Salman
    Khan, Fahad Shahbaz
    Anwer, Rao Muhammad
    Yang, Ming-Hsuan
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 512 - 531
  • [22] Human head detection using multi-modal object features
    Luo, Y
    Murphey, YL
    Khairallah, F
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003, VOLS 1-4, 2003, : 2134 - 2139
  • [23] Multi-Modal Streaming 3D Object Detection
    Abdelfattah, Mazen
    Yuan, Kaiwen
    Wang, Z. Jane
    Ward, Rabab
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6163 - 6170
  • [24] Hypergraph-Based Multi-Modal Representation for Open-Set 3D Object Retrieval
    Feng, Yifan
    Ji, Shuyi
    Liu, Yu-Shen
    Du, Shaoyi
    Dai, Qionghai
    Gao, Yue
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (04) : 2206 - 2223
  • [25] Open-set 3D model retrieval algorithm based on multi-modal fusion
    Mao, Fuxin
    Yang, Xu
    Cheng, Jiaqiang
    Peng, Tao
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (01) : 61 - 70
  • [26] Adaptive Open Set Recognition with Multi-modal Joint Metric Learning
    Fu, Yimin
    Liu, Zhunga
    Yang, Yanbo
    Xu, Linfeng
    Lan, Hua
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2022, 2022, 13534 : 631 - 644
  • [27] Demonstrating CAESURA: Language Models as Multi-Modal Query Planners
    Urban, Matthias
    Binnig, Carsten
    COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 472 - 475
  • [28] Multi-Modal Attribute Prompting for Vision-Language Models
    Liu, Xin
    Wu, Jiamin
    Yang, Wenfei
    Zhou, Xu
    Zhang, Tianzhu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11579 - 11591
  • [29] VirtuWander: Enhancing Multi-modal Interaction for Virtual Tour Guidance through Large Language Models
    Wang, Zhan
    Yuan, Lin-Ping
    Wang, Liangwei
    Jiang, Bingchuan
    Zeng, Wei
    PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,
  • [30] Multi-modal Language Models for Human-Robot Interaction
    Janssens, Ruben
    COMPANION OF THE 2024 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI 2024 COMPANION, 2024, : 109 - 111