Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces

Cited by: 0
Authors
Chen, Zhiling [1]
Chen, Hanning [2]
Imani, Mohsen [2]
Chen, Ruimin [1]
Imani, Farhad [1]
Affiliations
[1] Univ Connecticut, Sch Mech Aerosp & Mfg Engn, Storrs, CT 06269 USA
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA USA
Funding
National Science Foundation (USA)
Keywords
Personal protective equipment; Zero-shot object detection; Vision language model; Large language model; CONSTRUCTION; IDENTIFICATION;
DOI
10.1016/j.eswa.2024.125769
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Workplace accidents due to personal protective equipment (PPE) non-compliance raise serious safety concerns and lead to legal liabilities, financial penalties, and reputational damage. While object detection models can address this issue by identifying safety gear, most existing models, such as YOLO, Faster R-CNN, and SSD, are limited in verifying the fine-grained attributes of PPE across diverse workplace scenarios. Vision language models (VLMs) are gaining traction for detection tasks because they combine visual and textual information, offering a promising way around the limitations of traditional object detection in PPE recognition. Nonetheless, VLMs face challenges in consistently verifying PPE attributes: workplace environments are complex and variable, and the model must interpret context-specific language and visual cues simultaneously. We introduce Clip2Safety, an interpretable detection framework for diverse workplace safety compliance, which comprises four main modules: scene recognition, visual prompt, safety gear detection, and fine-grained verification. Scene recognition identifies the current scenario to determine the necessary safety gear. Visual prompt formulates the specific visual cues needed for the detection process. Safety gear detection identifies whether the required safety gear is being worn according to the specified scenario. Lastly, fine-grained verification assesses whether the worn safety equipment meets the fine-grained attribute requirements. We conduct real-world case studies across six different scenarios. The results show that Clip2Safety not only improves accuracy over state-of-the-art question-answering-based VLMs but also achieves inference times that are 21x faster.
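The four-module flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name, the `REQUIREMENTS` table, and the example scenes and attributes are hypothetical, and plain dictionaries stand in for the vision-language model's outputs. It only shows how the stages could hand results to one another.

```python
from dataclasses import dataclass

# Hypothetical table: scene -> required gear -> required attributes.
# The values are illustrative only and are not taken from the paper.
REQUIREMENTS = {
    "welding": {
        "helmet": {"type": "welding mask"},
        "gloves": {"material": "leather"},
    },
    "construction": {
        "helmet": {"type": "hard hat"},
        "vest": {"color": "high-visibility"},
    },
}


@dataclass
class ComplianceReport:
    scene: str
    missing: list      # required gear that was not detected
    mismatched: dict   # gear -> {attribute: (required, observed)}

    @property
    def compliant(self):
        return not self.missing and not self.mismatched


def recognize_scene(scene_scores):
    """Stage 1: pick the highest-scoring scene (scores are assumed inputs)."""
    return max(scene_scores, key=scene_scores.get)


def build_prompts(scene):
    """Stage 2 stand-in: one text cue per required gear item."""
    return {gear: f"a worker wearing a {gear}" for gear in REQUIREMENTS[scene]}


def detect_gear(prompts, observed):
    """Stage 3 stand-in: gear counts as worn if it appears in `observed`."""
    return [gear for gear in prompts if gear in observed]


def verify_attributes(scene, worn, observed):
    """Stage 4: compare observed attributes with the required ones."""
    mismatched = {}
    for gear in worn:
        for attr, required in REQUIREMENTS[scene][gear].items():
            actual = observed[gear].get(attr)
            if actual != required:
                mismatched.setdefault(gear, {})[attr] = (required, actual)
    return mismatched


def check_compliance(scene_scores, observed):
    """Run the four stages in order and collect an interpretable report."""
    scene = recognize_scene(scene_scores)
    prompts = build_prompts(scene)
    worn = detect_gear(prompts, observed)
    missing = [gear for gear in prompts if gear not in worn]
    return ComplianceReport(scene, missing, verify_attributes(scene, worn, observed))
```

For example, `check_compliance({"welding": 0.7, "construction": 0.3}, {"helmet": {"type": "hard hat"}})` reports the scene as `welding`, lists `gloves` as missing, and flags the helmet type as mismatched. Keeping missing gear and attribute mismatches in separate fields is one way to make the result easy to explain to a reviewer.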
Pages: 15