Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces

Authors
Chen, Zhiling [1]
Chen, Hanning [2]
Imani, Mohsen [2]
Chen, Ruimin [1]
Imani, Farhad [1]
Affiliations
[1] Univ Connecticut, Sch Mech Aerosp & Mfg Engn, Storrs, CT 06269 USA
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA USA
Funding
National Science Foundation (USA)
Keywords
Personal protective equipment; Zero-shot object detection; Vision language model; Large language model; CONSTRUCTION; IDENTIFICATION
DOI
10.1016/j.eswa.2024.125769
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Workplace accidents due to personal protective equipment (PPE) non-compliance raise serious safety concerns and lead to legal liabilities, financial penalties, and reputational damage. While object detection models have shown the capability to address this issue by identifying safety gear, most existing models, such as YOLO, Faster R-CNN, and SSD, are limited in verifying the fine-grained attributes of PPE across diverse workplace scenarios. Vision language models (VLMs) are gaining traction for detection tasks by leveraging the synergy between visual and textual information, offering a promising solution to traditional object detection limitations in PPE recognition. Nonetheless, VLMs face challenges in consistently verifying PPE attributes due to the complexity and variability of workplace environments, which require them to interpret context-specific language and visual cues simultaneously. We introduce Clip2Safety, an interpretable detection framework for diverse workplace safety compliance, which comprises four main modules: scene recognition, visual prompt, safety gear detection, and fine-grained verification. Scene recognition identifies the current scenario to determine the necessary safety gear. Visual prompt formulates the specific visual cues needed for the detection process. Safety gear detection identifies whether the required safety gear is being worn according to the specified scenario. Lastly, fine-grained verification assesses whether the worn safety equipment meets the fine-grained attribute requirements. We conduct real-world case studies across six different scenarios. The results show that Clip2Safety not only improves accuracy over state-of-the-art question-answering-based VLMs but also achieves inference times that are 21x faster.
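The four-module structure described in the abstract can be illustrated with a small control-flow sketch. This is not the authors' implementation: every name below (REQUIRED_GEAR, recognize_scene, build_prompts, check_compliance, ComplianceReport) is hypothetical, the scene-to-gear table contains made-up values, the "visual prompt" stage is reduced to plain text strings, and the vision language model outputs are replaced by ordinary dictionaries passed in by the caller.

```python
from dataclasses import dataclass, field

# Hypothetical scene -> required gear table with fine-grained attributes.
# The values are illustrative only and are not taken from the paper.
REQUIRED_GEAR = {
    "construction site": {"helmet": {"color": "yellow"}, "vest": {"reflective": True}},
    "laboratory": {"goggles": {}, "gloves": {"material": "nitrile"}},
}


@dataclass
class ComplianceReport:
    scene: str
    prompts: list
    missing: list = field(default_factory=list)
    attribute_failures: list = field(default_factory=list)

    @property
    def compliant(self):
        # Compliant only if nothing is missing and every attribute matched.
        return not self.missing and not self.attribute_failures


def recognize_scene(scene_scores):
    """Stage 1 stand-in: pick the scene label with the highest score."""
    return max(scene_scores, key=scene_scores.get)


def build_prompts(scene, gear):
    """Stage 2 stand-in: one textual cue per required item (a simplification)."""
    return [f"a person wearing a {item} in a {scene}" for item in sorted(gear)]


def check_compliance(scene_scores, detections):
    """Run the four stages on pre-computed (stubbed) model outputs.

    scene_scores: dict mapping scene label -> score (assumed model output).
    detections:   dict mapping detected item -> dict of observed attributes.
    """
    scene = recognize_scene(scene_scores)
    gear = REQUIRED_GEAR[scene]
    report = ComplianceReport(scene, build_prompts(scene, gear))
    for item in sorted(gear):
        # Stage 3: is the required item present at all?
        if item not in detections:
            report.missing.append(item)
            continue
        # Stage 4: do the observed attributes match the requirements?
        for attr, expected in gear[item].items():
            if detections[item].get(attr) != expected:
                report.attribute_failures.append((item, attr))
    return report


if __name__ == "__main__":
    result = check_compliance(
        {"construction site": 0.9, "laboratory": 0.1},
        {"helmet": {"color": "white"}},
    )
    print(result.scene, result.missing, result.attribute_failures, result.compliant)
```

Separating "item missing" from "item present but wrong attribute" mirrors the distinction the abstract draws between safety gear detection and fine-grained verification, and it keeps the report interpretable: each failure names the item and attribute responsible.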
Pages: 15