How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection


Authors:

Yao, Yiyang [1]

Liu, Peng [2]

Zhao, Tiancheng [3]

Zhang, Qianqian [2]

Liao, Jiajia [3]

Fang, Chunxin [3]

Lee, Kyusong [3]

Wang, Qing [1]

Affiliations:

[1] Northwestern Polytech Univ, Xian, Peoples R China

[2] Linker Technol Res Co Ltd, Seogwipo, South Korea

[3] Zhejiang Univ, Binjiang Inst, Hangzhou, Zhejiang, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7 | 2024

Funding:

National Key Research and Development Program of China;


DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weakness of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at https://github.com/om-ai-lab/OVDEval
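The abstract names the NMS-AP metric but does not spell out how it is computed. The sketch below illustrates one plausible reading, offered as an assumption rather than the authors' implementation: before computing AP, a class-agnostic non-maximum suppression pass keeps only the highest-scoring prediction among heavily overlapping boxes, so a model cannot earn credit by attaching every fine-grained label (for example "red car" and "blue car") to the same box. The function names, the prediction dictionary layout, and the 0.5 IoU threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(preds: List[Dict], iou_thr: float = 0.5) -> List[Dict]:
    """Greedy NMS that ignores labels: among overlapping boxes, only the
    highest-scoring prediction survives, whatever its label.
    The threshold value is an assumption, not taken from the paper."""
    kept: List[Dict] = []
    for p in sorted(preds, key=lambda p: p["score"], reverse=True):
        if all(iou(p["box"], k["box"]) <= iou_thr for k in kept):
            kept.append(p)
    return kept


# Toy predictions: the first two put different labels on nearly the same box.
preds = [
    {"box": (10, 10, 100, 100), "label": "red car", "score": 0.90},
    {"box": (12, 11, 101, 99), "label": "blue car", "score": 0.85},
    {"box": (200, 200, 260, 260), "label": "blue car", "score": 0.40},
]
kept = class_agnostic_nms(preds)
```

In this toy case the duplicate "blue car" box on top of the "red car" box is suppressed, while the separate "blue car" box elsewhere in the image survives. Under the stated assumption, the surviving predictions would then be passed to a standard AP computation.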


Pages: 6630-6638

Page count: 9
