Probing vision and language models for construction waste material recognition

Cited by: 1
Authors
Sun, Ying [1, 2]
Gu, Zhaolin [1]
Yang, Sean Bin [2, 3]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Human Settlement & Civil Engn, Xian 710049, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[3] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
Keywords
Automatic sorting system; Vision and language models; Bidirectional contrastive training; Construction material recognition
DOI
10.1016/j.autcon.2024.105629
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
Motivated by the critical role of automatic sorting in construction waste management, recent work has leveraged deep learning's ability to capture powerful features within unimodality-based recognition approaches. However, existing methods remain limited because they rely solely on image-based datasets, which restricts feature expression. To address this, this paper introduces the VL-CSW dataset, which covers both image and text modalities. It then proposes ConCLIP, a vision-and-language model tailored for CSW recognition. ConCLIP incorporates a pre-feature interaction network for enhanced modality-specific feature learning and leverages a bidirectional contrastive training paradigm alongside supervised task training to optimize its performance across both modalities. Evaluation on the VL-CSW datasets demonstrates ConCLIP's superiority on the CSW material classification task, significantly outperforming strong baselines in most settings. Notably, ConCLIP achieves performance improvements of 1.83% and 3.41% over unimodality methods in the VL-Concrete and VL-Metal classification tasks, respectively, highlighting the efficacy of multi-modality in enhancing automatic sorting system performance.
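The abstract does not give the exact form of the bidirectional contrastive objective. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired image and text embeddings. The function names, the temperature value of 0.07, and the equal weighting of the two directions are assumptions for illustration and are not taken from the paper; the pre-feature interaction network and the supervised task loss are not modeled here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def bidirectional_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed stand-in, not the paper's exact loss).

    Row i of image_emb and row i of text_emb are treated as a matched pair;
    every other row in the batch serves as a negative.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature  # shape: (batch, batch)
    idx = np.arange(logits.shape[0])
    # Image-to-text direction: each image must pick out its own caption.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image direction: each caption must pick out its own image.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(8, 32))
    aligned_text = images + 0.05 * rng.normal(size=(8, 32))
    shuffled_text = aligned_text[::-1]
    print("aligned pairs: ", bidirectional_contrastive_loss(images, aligned_text))
    print("shuffled pairs:", bidirectional_contrastive_loss(images, shuffled_text))
```

In this sketch the loss is low when each image embedding is closest to its own text embedding and rises when the pairing is broken. In a full training setup it would be added to a supervised classification loss, as the abstract describes, with a weighting the abstract does not specify.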
Pages: 14