Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities

Cited by: 111

Authors:

Gene-Mola, Jordi [1]

Vilaplana, Veronica [2]

Rosell-Polo, Joan R. [1]

Morros, Josep-Ramon [2]

Ruiz-Hidalgo, Javier [2]

Gregorio, Eduard [1]

Affiliations:

[1] Univ Lleida UdL, Agrotecnio Ctr, Dept Agr & Forest Engn, Res Grp AgroICT & Precis Agr, Lleida, Catalonia, Spain

[2] Univ Politecn Cataluna, Dept Signal Theory & Commun, Barcelona, Catalonia, Spain

Source:

COMPUTERS AND ELECTRONICS IN AGRICULTURE | 2019 / Vol. 162

Keywords:

RGB-D; Multi-modal faster R-CNN; Convolutional neural networks; Fruit detection; Agricultural robotics; Fruit reflectance; TERRESTRIAL LASER SCANNER; FRUIT DETECTION; PRECISION AGRICULTURE; STRUCTURED LIGHT; ORCHARD; IMAGES; COLOR; LIDAR; TREE; SENSORS;

DOI:

10.1016/j.compag.2019.05.016

Chinese Library Classification (CLC) number:

S [Agricultural Sciences];

Discipline classification code:

09;

Abstract:

Fruit detection and localization will be essential for future agronomic management of fruit crops, with applications in yield prediction, yield mapping and automated harvesting. RGB-D cameras are promising sensors for fruit detection given that they provide geometrical information with color data. Some of these sensors work on the principle of time-of-flight (ToF) and, besides color and depth, provide the backscatter signal intensity. However, this radiometric capability has not been exploited for fruit detection applications. This work presents the KFuji RGB-DS database, composed of 967 multi-modal images containing a total of 12,839 Fuji apples. Compilation of the database allowed a study of the usefulness of fusing RGB-D and radiometric information obtained with Kinect v2 for fruit detection. To do so, the signal intensity was range corrected to overcome signal attenuation, obtaining an image that was proportional to the reflectance of the scene. A registration between RGB, depth and intensity images was then carried out. The Faster R-CNN model was adapted for use with five-channel input images: color (RGB), depth (D) and range-corrected intensity signal (S). Results show an improvement of 4.46% in F1-score when adding depth and range-corrected intensity channels, obtaining an F1-score of 0.898 and an AP of 94.8% when all channels are used. From our experimental results, it can be concluded that the radiometric capabilities of ToF sensors give valuable information for fruit detection.
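
The range correction and five-channel stacking described in the abstract could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not state the correction model, so the sketch assumes an inverse-square falloff of the backscattered signal with distance, and the names `range_correct` and `stack_rgbds` are hypothetical. It also assumes the RGB, depth and intensity channels are already registered to the same pixel grid.

```python
from typing import List

Image = List[List[float]]  # one channel stored as rows of pixel values


def range_correct(intensity: Image, depth_m: Image, exponent: float = 2.0) -> Image:
    """Scale backscattered intensity by depth**exponent.

    ASSUMPTION: inverse-square attenuation (exponent=2.0). The abstract does
    not give the correction model used in the paper.
    Pixels without a valid depth reading (depth <= 0) are set to 0.0.
    """
    return [
        [i * d ** exponent if d > 0 else 0.0 for i, d in zip(i_row, d_row)]
        for i_row, d_row in zip(intensity, depth_m)
    ]


def stack_rgbds(r: Image, g: Image, b: Image, depth_m: Image, s: Image) -> List[List[List[float]]]:
    """Combine registered channels into an H x W x 5 image ordered R, G, B, D, S."""
    return [
        [list(pixel) for pixel in zip(*rows)]
        for rows in zip(r, g, b, depth_m, s)
    ]


# Two pixels of the same surface seen at 1 m and 2 m: the raw intensity drops
# with distance, but the corrected values match under the assumed model.
raw_intensity = [[100.0, 25.0]]
depth = [[1.0, 2.0]]
corrected = range_correct(raw_intensity, depth)
```

Under the assumed model, both pixels in `corrected` end up with the same value, which is the sense in which the corrected image tracks scene reflectance rather than distance. A detector adapted to five-channel input would then consume the output of `stack_rgbds` in place of a three-channel RGB image.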

Pages: 689-698

Page count: 10

