Towards an Exhaustive Evaluation of Vision-Language Foundation Models

Cited by: 0
Authors
Salin, Emmanuelle [1]
Ayache, Stephane [1]
Favre, Benoit [1]
Affiliations
[1] Univ Toulon & Var, Aix Marseille Univ, CNRS, LIS, Marseille, France
DOI
10.1109/ICCVW60793.2023.00041
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-language foundation models have improved considerably in performance over the last few years. However, there is still a lack of comprehensive evaluation methods able to clearly explain their performance. We argue that a more systematic approach to foundation model evaluation would benefit their use in real-world applications. In particular, we think that these models should be evaluated on a broad range of precise capabilities, in order to bring awareness to the breadth of their scope and to their potential weaknesses. To that end, we propose a methodology to build a taxonomy of multimodal capabilities for vision-language foundation models. The proposed taxonomy is intended as a first step towards an exhaustive evaluation of vision-language foundation models.
Pages: 339 - 352
Page count: 14
Related papers
50 in total
  • [1] Equivariant Similarity for Vision-Language Foundation Models
    Wang, Tan
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Yang, Zhengyuan
    Zhang, Hanwang
    Liu, Zicheng
    Wang, Lijuan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 11964 - 11974
  • [2] Towards Better Vision-Inspired Vision-Language Models
    Cao, Yun-Hao
    Ji, Kaixiang
    Huang, Ziyuan
    Zheng, Chuanyang
    Liu, Jiajia
    Wang, Jian
    Chen, Jingdong
    Yang, Ming
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13537 - 13547
  • [3] Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
    Zhang, Xinsong
    Zeng, Yan
    Zhang, Jipeng
    Li, Hang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 551 - 568
  • [4] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
    Ma, Zixian
    Hong, Jerry
    Gul, Mustafa Omer
    Gandhi, Mona
    Gao, Irena
    Krishna, Ranjay
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10910 - 10921
  • [5] Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
    Yang, Yi
    Zhang, Qingwen
    Ikemura, Kei
    Batool, Nazre
    Folkesson, John
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 2405 - 2412
  • [6] Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
    Peng, Wenshuo
    Zhang, Kaipeng
    Yang, Yue
    Zhang, Hao
    Qiao, Yu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4506 - 4514
  • [7] Vision-language foundation model for echocardiogram interpretation
    Christensen, Matthew
    Vukadinovic, Milos
    Yuan, Neal
    Ouyang, David
    NATURE MEDICINE, 2024, 30 (05) : 1481 - +
  • [8] A vision-language foundation model for precision oncology
    Xiang, Jinxi
    Wang, Xiyue
    Zhang, Xiaoming
    Xi, Yinghua
    Eweje, Feyisope
    Chen, Yijiang
    Li, Yuchen
    Bergstrom, Colin
    Gopaulchan, Matthew
    Kim, Ted
    Yu, Kun-Hsing
    Willens, Sierra
    Olguin, Francesca Maria
    Nirschl, Jeffrey J.
    Neal, Joel
    Diehn, Maximilian
    Yang, Sen
    Li, Ruijiang
    NATURE, 2025, : 769 - 778
  • [9] A vision-language foundation model for clinical oncology
    Skourti, Eleni
    NATURE CANCER, 2025, 6 (02) : 226 - 226
  • [10] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644