How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Cited by: 0
Authors
Yao, Yiyang [1 ]
Liu, Peng [2 ]
Zhao, Tiancheng [3 ]
Zhang, Qianqian [2 ]
Liao, Jiajia [3 ]
Fang, Chunxin [3 ]
Lee, Kyusong [3 ]
Wang, Qing [1 ]
Affiliations
[1] Northwestern Polytech Univ, Xian, Peoples R China
[2] Linker Technol Res Co Ltd, Seogwipo, South Korea
[3] Zhejiang Univ, Binjiang Inst, Hangzhou, Zhejiang, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7 | 2024
Funding
National Key Research and Development Program of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weakness of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at https://github.com/om-ai-lab/OVDEval
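The abstract names NMS-AP but does not describe how it is computed. One plausible reading, offered here only as an assumption, is that predictions pass through class-agnostic non-maximum suppression before ordinary AP is calculated, so a model cannot score by attaching every fine-grained label to the same box. The minimal sketch below illustrates only that filtering step; the names `iou`, `class_agnostic_nms`, and the 0.5 threshold are hypothetical and are not taken from the OVDEval repository.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Prediction = Tuple[Box, float, str]      # (box, confidence score, label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(preds: List[Prediction],
                       iou_threshold: float = 0.5) -> List[Prediction]:
    """Keep the highest-scoring prediction per overlapping group,
    ignoring labels (assumed pre-processing step, not the paper's code)."""
    kept: List[Prediction] = []
    for pred in sorted(preds, key=lambda p: p[1], reverse=True):
        # A prediction survives only if it does not overlap a kept box.
        if all(iou(pred[0], k[0]) <= iou_threshold for k in kept):
            kept.append(pred)
    return kept


# A model that hedges by predicting the same box with two conflicting labels.
preds = [
    ((10.0, 10.0, 50.0, 50.0), 0.90, "red car"),
    ((10.0, 10.0, 50.0, 50.0), 0.85, "blue car"),
    ((100.0, 100.0, 150.0, 150.0), 0.60, "red car"),
]
kept = class_agnostic_nms(preds)
kept_labels = [label for _, _, label in kept]
```

Under this assumption, the duplicate "blue car" box is discarded before scoring, so only one label per object can count toward AP. With plain per-class AP, both labels on the same box would be evaluated independently.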
Pages: 6630-6638
Number of pages: 9