DART: An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training

Cited by: 1
Authors
Xin, Chen [1 ,2 ]
Hartel, Andreas [2 ]
Kasneci, Enkelejda [1 ]
Affiliations
[1] Tech Univ Munich, Arcisstr 21, D-80333 Munich, Germany
[2] Liebherr Elect & Drives GmbH, Peter Dornier Str 11, D-88131 Lindau Bodensee, Germany
Keywords
Open-vocabulary object detection (OVD); Data diversification; Pseudo-label; Large multimodal model (LMM); Stable diffusion; YOLO;
DOI
10.1016/j.eswa.2024.125124
CLC classification number
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accurate real-time object detection is vital across numerous industrial applications, from safety monitoring to quality control. Traditional approaches, however, are hindered by arduous manual annotation and data collection, struggling to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an innovative automated end-to-end pipeline that revolutionizes object detection workflows from data collection to model evaluation. It eliminates the need for laborious human labeling and extensive data collection while achieving outstanding accuracy across diverse scenarios. DART encompasses four key stages: (1) Data Diversification using subject-driven image generation (DreamBooth with SDXL), (2) Annotation via open-vocabulary object detection (Grounding DINO) to generate bounding box and class labels, (3) Review of generated images and pseudo-labels by large multimodal models (InternVL-1.5 and GPT-4o) to guarantee credibility, and (4) Training of real-time object detectors (YOLOv8 and YOLOv10) using the verified data. We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current instantiation of DART significantly increases average precision (AP) from 0.064 to 0.832. Its modular design ensures easy exchangeability and extensibility, allowing for future algorithm upgrades, seamless integration of new object categories, and adaptability to customized environments without manual labeling and additional data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
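The four stages described in the abstract compose sequentially: generated images flow into annotation, reviewed pseudo-labels flow into training. The sketch below is a hypothetical orchestration skeleton only; the stage bodies (DreamBooth/SDXL generation, Grounding DINO annotation, InternVL-1.5/GPT-4o review, YOLO training) are replaced by stand-in functions, and all names and the score-threshold review rule are illustrative assumptions, not the paper's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    image_id: str
    # Each box: (x1, y1, x2, y2, class_name, confidence)
    boxes: list = field(default_factory=list)
    approved: bool = False

def diversify(seed_images):
    # Stage 1 (stand-in): subject-driven generation would expand each
    # seed image into synthetic variants; here we just tag two variants.
    return [Sample(f"{img}_var{i}") for img in seed_images for i in range(2)]

def annotate(samples, prompt_classes):
    # Stage 2 (stand-in): an open-vocabulary detector would return boxes
    # for the prompted class names; we attach one dummy box per sample.
    for s in samples:
        s.boxes = [(0, 0, 10, 10, prompt_classes[0], 0.9)]
    return samples

def review(samples, min_score=0.5):
    # Stage 3 (stand-in): an LMM would verify image quality and label
    # fidelity; here a simple confidence threshold plays that role.
    for s in samples:
        s.approved = any(b[5] >= min_score for b in s.boxes)
    return [s for s in samples if s.approved]

def train(approved_samples):
    # Stage 4 (stand-in): the verified set would train a real-time
    # detector; we only report the size of the resulting training set.
    return {"num_train_samples": len(approved_samples)}

def dart_pipeline(seed_images, prompt_classes):
    # End-to-end flow: diversify -> annotate -> review -> train.
    return train(review(annotate(diversify(seed_images), prompt_classes)))
```

The point of the skeleton is the modularity the abstract emphasizes: each stage is a function with a narrow interface, so swapping in a newer generator, detector, or reviewer changes one function without touching the rest of the flow.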
Pages: 30