Effective multi-species weed detection in complex wheat fields using multi-modal and multi-view image fusion

Cited by: 1
Authors
Xu, Ke [1, 2, 3, 4, 5]
Xie, Qi [1, 3, 4, 5]
Zhu, Yan [1, 3, 4, 5]
Cao, Weixing [1]
Ni, Jun [1, 3, 4, 5]
Affiliations
[1] Nanjing Agr Univ, Coll Agr, Nanjing 210095, Peoples R China
[2] Anhui Polytech Univ, Coll Integrated Circuits, Wuhu 241000, Peoples R China
[3] Natl Engn & Technol Ctr Informat Agr, Nanjing 210095, Peoples R China
[4] Minist Educ, Engn Res Ctr Smart Agr, Nanjing 210095, Peoples R China
[5] Collaborat Innovat Ctr Modern Crop Prod Cosponsore, Nanjing 210095, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal image; Multi-view image; Weed detection; Deep learning; Complex wheat fields; CLASSIFICATION; SEGMENTATION; CAMERA; IDENTIFICATION; VEGETATION; INDEXES;
DOI
10.1016/j.compag.2025.109924
Chinese Library Classification
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Rapid and accurate acquisition of weed information in wheat fields is a prerequisite for precision weeding. Weed detection in open and complex wheat fields faces two challenges: 1) detecting grass weeds that closely resemble wheat; and 2) detecting weeds under leaf occlusion. First, dual-modal information, namely red-green-blue (RGB) images and depth images, is introduced to overcome the limitations of single-modal image features in identifying grass weeds. Then, a dual-path Swin Transformer model was developed for multi-modal feature extraction, and an alignment and attention module (AAM) was designed to align and fuse information from the different modalities. Finally, multi-view images were introduced to overcome leaf occlusion in natural wheat fields by completing the feature space of objects, and a multi-view information fusion method based on the common underlying principle of view agreement was proposed. The experimental results demonstrate that depth information is a valuable complement to RGB imagery, facilitating the detection of grass weeds and significantly enhancing weed detection accuracy in wheat fields. Compared with the deep convolutional neural network (DCNN) model, the dual-path Swin Transformer model and the model containing the AAM improved weed detection accuracy by 6.6% and 11.03%, respectively. Additionally, incorporating multi-view information effectively addressed leaf occlusion in weed detection, resulting in an 85.14% increase in weed detection accuracy. Furthermore, the problem of view divergence in multi-view learning was resolved, reducing the false detection rate by 23.94% compared with the direct combination method.
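The abstract contrasts a view-agreement fusion with a "direct combination" of views but does not give the fusion rule itself. The toy Python sketch below is only an illustration of that contrast, not the authors' method. It assumes that detections from each view have already been registered to a shared grid, and it interprets "agreement" as a simple vote across views. The function names, the `(label, row, col)` detection format, and the `min_views` threshold are all hypothetical.

```python
from collections import Counter


def direct_combination(views):
    """Baseline (assumed): take the union of detections from every view."""
    merged = set()
    for detections in views:
        merged |= set(detections)
    return merged


def agreement_fusion(views, min_views=2):
    """Illustrative agreement rule (assumed): keep a detection only if at
    least `min_views` views report it at the same grid cell."""
    counts = Counter()
    for detections in views:
        # set() ensures one vote per view for each detection
        counts.update(set(detections))
    return {det for det, n in counts.items() if n >= min_views}


# Hypothetical detections as (label, row, col), already aligned across views.
views = [
    {("grass_weed", 3, 4), ("broadleaf_weed", 7, 1)},
    {("grass_weed", 3, 4), ("grass_weed", 9, 9)},  # (9, 9) seen by one view only
    {("grass_weed", 3, 4), ("broadleaf_weed", 7, 1)},
]

combined = direct_combination(views)
agreed = agreement_fusion(views, min_views=2)
print(sorted(combined))
print(sorted(agreed))
```

In this toy case the union keeps the detection that only one view reports, while the voting rule drops it. That is one simple way divergent views could inflate false detections under direct combination; the paper's actual fusion operates on learned features and may differ substantially.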
Pages: 17