Plastic waste identification based on multimodal feature selection and cross-modal Swin Transformer

Cited: 0
Authors
Ji, Tianchen [1 ]
Fang, Huaiying [1 ]
Zhang, Rencheng [1 ]
Yang, Jianhong [1 ]
Wang, Zhifeng [2 ]
Wang, Xin [1 ]
Affiliations
[1] Huaqiao Univ, Coll Mech Engn & Automat, Xiamen, Fujian, Peoples R China
[2] Xiamen Luhai Proenvironm Inc, Xiamen, Fujian, Peoples R China
Keywords
Multimodal; Swin Transformer; Cross-modal fusion; Feature selection; Waste identification; CLASSIFICATION
DOI
10.1016/j.wasman.2024.11.027
Chinese Library Classification (CLC)
X [Environmental science; safety science]
Discipline classification code
08; 0830
Abstract
The classification and recycling of municipal solid waste (MSW) are strategies for resource conservation and pollution prevention, and plastic waste identification is an essential component of waste sorting. Multimodal detection of solid waste has increasingly replaced single-modal methods, which are constrained by limited informational capacity. However, existing hyperspectral feature selection algorithms and multimodal identification methods have yet to exploit cross-modal information fully. Therefore, two RGB-hyperspectral image (RGB-HSI) multimodal instance segmentation datasets were constructed to support research in plastic waste sorting. A feature band selection algorithm based on the Activation Weight function was proposed to automatically select influential hyperspectral bands from multimodal data, thereby reducing the burden of data acquisition, transmission, and inference. Furthermore, the multimodal Selective Feature Network (SFNet) was introduced to balance information across modalities and stages. Moreover, the Correlation Swin Transformer Block, designed specifically to fuse cross-modal mutual information, was proposed; it can be employed together with SFNet to further enhance multimodal recognition. Experimental results show that the Activation Weight band selection function selects the most effective feature bands, while the Correlation SFSwin Transformer achieved the highest F1-scores of 97.85% and 97.37% in the two plastic waste object detection experiments, respectively. The source code and final models are available at https://github.com/Bazenr/Correlation-SFSwin, and the dataset can be accessed at https://www.kaggle.com/datasets/bazenr/rgb-hsi-rgb-nirmunicipal-solid-waste.
Pages: 58-68
Page count: 11
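
To make the abstract's description of cross-modal fusion more concrete, below is a minimal, illustrative PyTorch sketch of a generic cross-modal attention block in which queries come from RGB tokens and keys/values come from hyperspectral (HSI) tokens. This is an assumption-laden simplification, not the authors' Correlation Swin Transformer Block: window partitioning, the Activation Weight band selection, and the SFNet balancing mechanism are omitted, and all names and shapes (CrossModalAttentionBlock, dim, num_heads, token counts) are hypothetical.

import torch
import torch.nn as nn

class CrossModalAttentionBlock(nn.Module):
    """Illustrative cross-attention fusion of RGB tokens with HSI tokens
    (hypothetical sketch, not the paper's exact block)."""
    def __init__(self, dim: int = 96, num_heads: int = 4):
        super().__init__()
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_hsi = nn.LayerNorm(dim)
        # Queries from the RGB stream attend to keys/values from the HSI stream.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, rgb_tokens: torch.Tensor, hsi_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, hsi_tokens: (batch, num_tokens, dim)
        q = self.norm_rgb(rgb_tokens)
        kv = self.norm_hsi(hsi_tokens)
        fused, _ = self.cross_attn(q, kv, kv)   # cross-modal attention
        x = rgb_tokens + fused                  # residual connection
        return x + self.mlp(self.norm_mlp(x))   # feed-forward with residual

# Hypothetical usage: fuse 49 patch tokens per modality.
# rgb = torch.randn(2, 49, 96); hsi = torch.randn(2, 49, 96)
# out = CrossModalAttentionBlock()(rgb, hsi)   # -> torch.Size([2, 49, 96])

In the paper's architecture, such a fusion step would presumably operate within shifted windows (Swin-style) and be combined with the SFNet feature-balancing module; the sketch above only illustrates the generic cross-attention pattern.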