Plastic waste identification based on multimodal feature selection and cross-modal Swin Transformer

Cited: 0
Authors
Ji, Tianchen [1 ]
Fang, Huaiying [1 ]
Zhang, Rencheng [1 ]
Yang, Jianhong [1 ]
Wang, Zhifeng [2 ]
Wang, Xin [1 ]
Affiliations
[1] Huaqiao Univ, Coll Mech Engn & Automat, Xiamen, Fujian, Peoples R China
[2] Xiamen Luhai Proenvironm Inc, Xiamen, Fujian, Peoples R China
Keywords
Multimodal; Swin Transformer; Cross-modal fusion; Feature selection; Waste identification; CLASSIFICATION
DOI
10.1016/j.wasman.2024.11.027
Chinese Library Classification (CLC)
X [Environmental science; safety science]
Discipline classification code
08; 0830
Abstract
The classification and recycling of municipal solid waste (MSW) are strategies for resource conservation and pollution prevention, and plastic waste identification is an essential component of waste sorting. Multimodal detection of solid waste has increasingly replaced single-modal methods, which are constrained by limited informational capacity. However, existing hyperspectral feature selection algorithms and multimodal identification methods have yet to exploit cross-modal information fully. Therefore, two RGB-hyperspectral image (RGB-HSI) multimodal instance segmentation datasets were constructed to support research in plastic waste sorting. A feature band selection algorithm based on the Activation Weight function was proposed to automatically select influential hyperspectral bands from multimodal data, thereby reducing the burden of data acquisition, transmission, and inference. Furthermore, the multimodal Selective Feature Network (SFNet) was introduced to balance information across modalities and stages. Moreover, the Correlation Swin Transformer Block, designed specifically to fuse cross-modal mutual information, was proposed; it can be employed together with SFNet to further enhance multimodal recognition. Experimental results show that the Activation Weight band selection function selects the most effective feature bands, while the Correlation SFSwin Transformer achieved the highest F1-scores of 97.85% and 97.37% in the two plastic waste object detection experiments, respectively. The source code and final models are available at https://github.com/Bazenr/Correlation-SFSwin, and the dataset can be accessed at https://www.kaggle.com/datasets/bazenr/rgb-hsi-rgb-nirmunicipal-solid-waste.
Pages: 58-68
Page count: 11
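
To make the abstract's description of cross-modal fusion more concrete, below is a minimal, illustrative PyTorch sketch of a generic cross-modal attention block in which queries come from RGB tokens and keys/values come from hyperspectral (HSI) tokens. This is an assumption-laden simplification, not the authors' Correlation Swin Transformer Block: window partitioning, the Activation Weight band selection, and the SFNet balancing mechanism are omitted, and all names and shapes (CrossModalAttentionBlock, dim, num_heads, token counts) are hypothetical.

import torch
import torch.nn as nn

class CrossModalAttentionBlock(nn.Module):
    """Illustrative cross-attention fusion of RGB tokens with HSI tokens
    (hypothetical sketch, not the paper's exact block)."""
    def __init__(self, dim: int = 96, num_heads: int = 4):
        super().__init__()
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_hsi = nn.LayerNorm(dim)
        # Queries from the RGB stream attend to keys/values from the HSI stream.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb_tokens: torch.Tensor, hsi_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, hsi_tokens: (batch, num_tokens, dim)
        q = self.norm_rgb(rgb_tokens)
        kv = self.norm_hsi(hsi_tokens)
        fused, _ = self.cross_attn(q, kv, kv)   # cross-modal attention
        x = rgb_tokens + fused                  # residual connection
        return x + self.mlp(self.norm_mlp(x))   # feed-forward with residual

# Hypothetical usage: fuse 49 patch tokens per modality.
# rgb = torch.randn(2, 49, 96); hsi = torch.randn(2, 49, 96)
# out = CrossModalAttentionBlock()(rgb, hsi)   # -> torch.Size([2, 49, 96])

In the paper's architecture, such a fusion step would presumably operate within shifted windows (Swin-style) and be combined with the SFNet feature-balancing module; the sketch above only illustrates the generic cross-attention pattern.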