Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images

Cited by: 16
Authors
Xu, Ke [1 ,2 ,3 ,4 ,5 ]
Zhu, Yan [1 ,2 ,3 ,4 ,5 ]
Cao, Weixing [1 ,2 ,3 ,4 ,5 ]
Jiang, Xiaoping [1 ,2 ,3 ,4 ,5 ]
Jiang, Zhijian [6 ]
Li, Shuailong [6 ]
Ni, Jun [1 ,2 ,3 ,4 ,5 ]
Affiliations
[1] Nanjing Agr Univ, Coll Agr, Nanjing, Peoples R China
[2] Natl Engn & Technol Ctr Informat Agr, Nanjing, Peoples R China
[3] Minist Educ, Engn Res Ctr Smart Agr, Nanjing, Peoples R China
[4] Jiangsu Key Lab Informat Agr, Nanjing, Peoples R China
[5] Jiangsu Collaborat Innovat Ctr Technol & Applicat, Nanjing, Peoples R China
[6] Nanjing Agr Univ, Coll Artificial Intelligence, Nanjing, Peoples R China
Source
Frontiers in Plant Science
Funding
National Natural Science Foundation of China;
Keywords
weeds detection; RGB-D image; multi-modal deep learning; machine learning; three-channel network; CROP; VISION; GROWTH; IMPACT; YIELD;
DOI
10.3389/fpls.2021.732968
CLC Number
Q94 [Botany];
Subject Classification Code
071001;
Abstract
Single-modal images carry limited information for feature representation, and RGB images alone fail to detect grass weeds in wheat fields because the weeds resemble wheat in shape. We propose a framework based on multi-modal information fusion for accurate detection of weeds in wheat fields under natural conditions, overcoming the limitation of a single modality in weed detection. First, we recode the single-channel depth image into a new three-channel image structured like an RGB image, which suits feature extraction by a convolutional neural network (CNN). Second, multi-scale object detection is realized by fusing the feature maps output by different convolutional layers. Third, the three-channel network structure is designed to preserve the independence of the RGB and depth information while exploiting the complementarity of the multi-modal information, and integrated learning is carried out by weight allocation at the decision level to fuse the multi-modal predictions effectively. The experimental results show that, compared with weed detection based on RGB images alone, the accuracy of our method is significantly improved. With the weights of the RGB and depth images set to α = 0.4 and β = 0.3, integrated learning achieves a mean average precision (mAP) of 36.1% for grass weeds and 42.9% for broad-leaf weeds, and an overall detection precision, as indicated by intersection over ground truth (IoG), of 89.3%. These results suggest that our method can accurately detect the dominant weed species in wheat fields and that multi-modal fusion effectively improves object detection performance.
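The abstract names two concrete mechanisms: recoding a single-channel depth map into a three-channel, RGB-like image, and fusing branch outputs by weight allocation at the decision level (α = 0.4 for RGB, β = 0.3 for depth). The record does not spell out the depth encoding or the remaining weight, so the Python sketch below is a hedged illustration only: the gradient-based encoding, the third fused-branch weight γ = 0.3 (chosen so the weights sum to 1), and the function names are all assumptions, not the authors' implementation.

```python
import numpy as np

def _minmax(x):
    """Rescale an array to [0, 1] (guarded against a constant map)."""
    rng = float(x.max() - x.min())
    return (x - x.min()) / (rng + 1e-6)

def depth_to_three_channels(depth):
    """Recode a single-channel depth map into a three-channel, RGB-like image.

    The abstract only says depth is recoded into a three-channel image that
    suits CNN feature extraction; the encoding here (normalized depth plus
    horizontal/vertical depth gradients) is an assumption for illustration.
    """
    norm = _minmax(depth.astype(np.float32))   # channel 1: normalized depth
    gy, gx = np.gradient(norm)                 # channels 2-3: local surface change
    return np.stack([norm, _minmax(gx), _minmax(gy)], axis=-1)

def fuse_decisions(conf_rgb, conf_depth, conf_rgbd,
                   alpha=0.4, beta=0.3, gamma=0.3):
    """Decision-level fusion by weight allocation across three branches.

    alpha (RGB) and beta (depth) come from the abstract; the fused RGB-D
    branch and its weight gamma are assumptions chosen so the weights sum
    to 1. Inputs are per-class confidences for one matched detection.
    """
    return alpha * conf_rgb + beta * conf_depth + gamma * conf_rgbd

# Toy usage: per-class confidences [grass weed, broad-leaf weed].
fused = fuse_decisions(np.array([0.70, 0.20]),
                       np.array([0.55, 0.30]),
                       np.array([0.65, 0.25]))
print(fused)  # -> [0.64 0.245]
```

Fusing at the decision level, as the abstract describes, keeps each branch's detector independent, so a weak depth signal degrades the combined score gracefully rather than corrupting a shared feature space.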
Pages: 10