Probing vision and language models for construction waste material recognition

Cited by: 1

Authors:

Sun, Ying [1,2]

Gu, Zhaolin [1]

Yang, Sean Bin [2,3]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Human Settlement & Civil Engn, Xian 710049, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China

[3] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark

Source:

AUTOMATION IN CONSTRUCTION | 2024 / Vol. 166

Keywords:

Automatic sorting system; Vision and language models; Bidirectional contrastive training; Construction material recognition

DOI:

10.1016/j.autcon.2024.105629

Chinese Library Classification (CLC) number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

Motivated by the critical role of automatic sorting in construction waste management, recent advancements have leveraged deep learning's ability to capture powerful features within unimodal recognition approaches. However, existing methods remain limited because they rely solely on image-based datasets, which restricts feature expression. To address this, this paper introduces the VL-CSW dataset, which covers both image and text modalities. It then proposes ConCLIP, a vision-and-language model tailored for CSW recognition. ConCLIP incorporates a pre-feature interaction network for enhanced modality-specific feature learning and combines a bidirectional contrastive training paradigm with supervised task training to optimize its performance across both modalities. Evaluation on the VL-CSW datasets demonstrates ConCLIP's superiority on the CSW material classification task, significantly outperforming strong baselines in most settings. Notably, ConCLIP achieves performance improvements of 1.83% and 3.41% over unimodal methods on the VL-Concrete and VL-Metal classification tasks, respectively, highlighting the efficacy of multi-modality in enhancing automatic sorting system performance.


Pages: 14
