Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images

Cited by: 16
Authors
Xu, Ke [1 ,2 ,3 ,4 ,5 ]
Zhu, Yan [1 ,2 ,3 ,4 ,5 ]
Cao, Weixing [1 ,2 ,3 ,4 ,5 ]
Jiang, Xiaoping [1 ,2 ,3 ,4 ,5 ]
Jiang, Zhijian [6 ]
Li, Shuailong [6 ]
Ni, Jun [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Nanjing Agr Univ, Coll Agr, Nanjing, Peoples R China
[2] Natl Engn & Technol Ctr Informat Agr, Nanjing, Peoples R China
[3] Minist Educ, Engn Res Ctr Smart Agr, Nanjing, Peoples R China
[4] Jiangsu Key Lab Informat Agr, Nanjing, Peoples R China
[5] Jiangsu Collaborat Innovat Ctr Technol & Applicat, Nanjing, Peoples R China
[6] Nanjing Agr Univ, Coll Artificial Intelligence, Nanjing, Peoples R China
Source
Frontiers in Plant Science
Funding
National Natural Science Foundation of China;
Keywords
weeds detection; RGB-D image; multi-modal deep learning; machine learning; three-channel network; CROP; VISION; GROWTH; IMPACT; YIELD;
DOI
10.3389/fpls.2021.732968
CLC Number
Q94 [Botany];
Subject Classification Code
071001;
Abstract
Single-modal images carry limited information for feature representation, and RGB images alone fail to detect grass weeds in wheat fields because the weeds resemble wheat in shape. We propose a framework based on multi-modal information fusion for accurate detection of weeds in wheat fields under natural conditions, overcoming the limitation of a single modality in weed detection. First, we recode the single-channel depth image into a new three-channel image structured like an RGB image, which suits feature extraction by a convolutional neural network (CNN). Second, multi-scale object detection is realized by fusing the feature maps output by different convolutional layers. Third, the three-channel network structure is designed to preserve the independence of the RGB and depth information while exploiting the complementarity of the multi-modal information, and integrated learning is carried out by weight allocation at the decision level to fuse the multi-modal predictions effectively. The experimental results show that, compared with weed detection based on RGB images alone, the accuracy of our method is significantly improved. With the weights of the RGB and depth images set to α = 0.4 and β = 0.3, integrated learning achieves a mean average precision (mAP) of 36.1% for grass weeds and 42.9% for broad-leaf weeds, and an overall detection precision, as indicated by intersection over ground truth (IoG), of 89.3%. These results suggest that our method can accurately detect the dominant weed species in wheat fields and that multi-modal fusion effectively improves object detection performance.
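The abstract names two concrete mechanisms: recoding a single-channel depth map into a three-channel, RGB-like image, and fusing branch outputs by weight allocation at the decision level (α = 0.4 for RGB, β = 0.3 for depth). The record does not spell out the depth encoding or the remaining weight, so the Python sketch below is a hedged illustration only: the gradient-based encoding, the third fused-branch weight γ = 0.3 (chosen so the weights sum to 1), and the function names are all assumptions, not the authors' implementation.

```python
import numpy as np

def _minmax(x):
    """Rescale an array to [0, 1] (guarded against a constant map)."""
    rng = float(x.max() - x.min())
    return (x - x.min()) / (rng + 1e-6)

def depth_to_three_channels(depth):
    """Recode a single-channel depth map into a three-channel, RGB-like image.

    The abstract only says depth is recoded into a three-channel image that
    suits CNN feature extraction; the encoding here (normalized depth plus
    horizontal/vertical depth gradients) is an assumption for illustration.
    """
    norm = _minmax(depth.astype(np.float32))   # channel 1: normalized depth
    gy, gx = np.gradient(norm)                 # channels 2-3: local surface change
    return np.stack([norm, _minmax(gx), _minmax(gy)], axis=-1)

def fuse_decisions(conf_rgb, conf_depth, conf_rgbd,
                   alpha=0.4, beta=0.3, gamma=0.3):
    """Decision-level fusion by weight allocation across three branches.

    alpha (RGB) and beta (depth) come from the abstract; the fused RGB-D
    branch and its weight gamma are assumptions chosen so the weights sum
    to 1. Inputs are per-class confidences for one matched detection.
    """
    return alpha * conf_rgb + beta * conf_depth + gamma * conf_rgbd

# Toy usage: per-class confidences [grass weed, broad-leaf weed].
fused = fuse_decisions(np.array([0.70, 0.20]),
                       np.array([0.55, 0.30]),
                       np.array([0.65, 0.25]))
print(fused)  # -> [0.64 0.245]
```

Fusing at the decision level, as the abstract describes, keeps each branch's detector independent, so a weak depth signal degrades the combined score gracefully rather than corrupting a shared feature space.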
Pages: 10