Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces

Cited: 0
Authors
Chen, Zhiling [1 ]
Chen, Hanning [2 ]
Imani, Mohsen [2 ]
Chen, Ruimin [1 ]
Imani, Farhad [1 ]
Affiliations
[1] Univ Connecticut, Sch Mech Aerosp & Mfg Engn, Storrs, CT 06269 USA
[2] Univ Calif Irvine, Dept Comp Sci, Irvine, CA USA
Funding
U.S. National Science Foundation
Keywords
Personal protective equipment; Zero-shot object detection; Vision language model; Large language model; CONSTRUCTION; IDENTIFICATION;
DOI
10.1016/j.eswa.2024.125769
CLC Classification
TP18 (Artificial intelligence theory)
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Workplace accidents due to personal protective equipment (PPE) non-compliance raise serious safety concerns and lead to legal liabilities, financial penalties, and reputational damage. While object detection models have shown the capability to address this issue by identifying safety gear, most existing models, such as YOLO, Faster R-CNN, and SSD, are limited in verifying the fine-grained attributes of PPE across diverse workplace scenarios. Vision language models (VLMs) are gaining traction for detection tasks by leveraging the synergy between visual and textual information, offering a promising solution to the limitations of traditional object detection in PPE recognition. Nonetheless, VLMs face challenges in consistently verifying PPE attributes due to the complexity and variability of workplace environments, which require them to interpret context-specific language and visual cues simultaneously. We introduce Clip2Safety, an interpretable detection framework for diverse workplace safety compliance, which comprises four main modules: scene recognition, visual prompt, safety gear detection, and fine-grained verification. Scene recognition identifies the current scenario to determine the necessary safety gear. Visual prompt formulates the specific visual cues needed for the detection process. Safety gear detection identifies whether the required safety gear is being worn according to the specified scenario. Lastly, fine-grained verification assesses whether the worn safety equipment meets the fine-grained attribute requirements. We conduct real-world case studies across six different scenarios. The results show that Clip2Safety not only demonstrates an accuracy improvement over state-of-the-art question-answering-based VLMs but also achieves inference times that are 21x faster.
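The four-module pipeline described in the abstract can be sketched as a simple sequence of stages. This is a hypothetical illustration, not the authors' implementation: the module names mirror the paper, but the scenario-to-gear mapping and the rule-based logic inside each stub are placeholders for the vision language model calls the actual system would make.

```python
from dataclasses import dataclass

# Illustrative mapping of workplace scenario to required gear (assumed, not from the paper).
SCENARIO_GEAR = {
    "construction": ["hard hat", "safety vest"],
    "laboratory": ["goggles", "gloves"],
}

@dataclass
class DetectionResult:
    scenario: str
    worn: dict          # gear item -> bool: detected on the worker
    attribute_ok: dict  # gear item -> bool: fine-grained attribute check passed

def recognize_scene(image_tags):
    """Module 1 (scene recognition): infer the scenario from coarse image cues."""
    return "construction" if "scaffolding" in image_tags else "laboratory"

def visual_prompts(scenario):
    """Module 2 (visual prompt): formulate a visual cue per required gear item."""
    return {g: f"a worker wearing a {g}" for g in SCENARIO_GEAR[scenario]}

def detect_gear(prompts, visible_items):
    """Module 3 (safety gear detection): is each required item being worn?"""
    return {g: g in visible_items for g in prompts}

def verify_attributes(worn, attributes):
    """Module 4 (fine-grained verification): do worn items meet attribute requirements?"""
    return {g: worn[g] and attributes.get(g, False) for g in worn}

def clip2safety(image_tags, visible_items, attributes):
    scenario = recognize_scene(image_tags)
    prompts = visual_prompts(scenario)
    worn = detect_gear(prompts, visible_items)
    ok = verify_attributes(worn, attributes)
    return DetectionResult(scenario, worn, ok)

result = clip2safety(
    image_tags={"scaffolding", "crane"},
    visible_items={"hard hat"},
    attributes={"hard hat": True},  # e.g. the hat's color matches site rules
)
print(result)
```

In the paper each stage is driven by vision language model inference rather than dictionary lookups; the sketch only shows how the stages compose, with each module's output narrowing the next module's task.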
Pages: 15