Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities

Times cited: 111
Authors
Gene-Mola, Jordi [1]
Vilaplana, Veronica [2]
Rosell-Polo, Joan R. [1]
Morros, Josep-Ramon [2]
Ruiz-Hidalgo, Javier [2]
Gregorio, Eduard [1]
Affiliations
[1] Univ Lleida UdL, Agrotecnio Ctr, Dept Agr & Forest Engn, Res Grp AgroICT & Precis Agr, Lleida, Catalonia, Spain
[2] Univ Politecn Cataluna, Dept Signal Theory & Commun, Barcelona, Catalonia, Spain
Keywords
RGB-D; Multi-modal faster R-CNN; Convolutional neural networks; Fruit detection; Agricultural robotics; Fruit reflectance; TERRESTRIAL LASER SCANNER; FRUIT DETECTION; PRECISION AGRICULTURE; STRUCTURED LIGHT; ORCHARD; IMAGES; COLOR; LIDAR; TREE; SENSORS;
DOI
10.1016/j.compag.2019.05.016
Chinese Library Classification (CLC) number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Fruit detection and localization will be essential for the future agronomic management of fruit crops, with applications in yield prediction, yield mapping and automated harvesting. RGB-D cameras are promising sensors for fruit detection because they provide geometrical information along with color data. Some of these sensors work on the time-of-flight (ToF) principle and, besides color and depth, provide the backscattered signal intensity. However, this radiometric capability has not yet been exploited for fruit detection. This work presents the KFuji RGB-DS database, composed of 967 multi-modal images containing a total of 12,839 Fuji apples. Compiling the database made it possible to study the usefulness of fusing RGB-D and radiometric information obtained with a Kinect v2 for fruit detection. To this end, the signal intensity was range-corrected to compensate for signal attenuation, yielding an image proportional to the reflectance of the scene. The RGB, depth and intensity images were then registered to one another. The Faster R-CNN model was adapted to accept five-channel input images: color (RGB), depth (D) and range-corrected intensity signal (S). Adding the depth and range-corrected intensity channels improved the F1-score by 4.46%, and using all channels yielded an F1-score of 0.898 and an AP of 94.8%. These experimental results indicate that the radiometric capabilities of ToF sensors provide valuable information for fruit detection.
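The preprocessing described in the abstract (range-correcting the intensity and stacking RGB, D and S into one five-channel input) and the reported F1 metric can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the backscattered intensity decays with the inverse square of range, so the correction is S = I·(R/R_ref)²; it uses depth as a stand-in for radial range; and it picks a hypothetical 1 m reference range and a hypothetical 4.5 m depth ceiling for normalization. The function names `range_correct`, `stack_rgbds` and `f1_score` are illustrative only, and the images are assumed to be already registered to the same pixel grid.

```python
from typing import List, Tuple

Image = List[List[float]]                      # row-major grid of scalar values
RGBImage = List[List[Tuple[int, int, int]]]    # row-major grid of 8-bit RGB pixels


def range_correct(intensity: Image, depth_mm: Image,
                  ref_range_mm: float = 1000.0) -> Image:
    """Compensate assumed 1/R^2 attenuation: S = I * (R / R_ref)^2.

    Assumption: depth is used in place of radial range, and pixels with
    no depth reading (depth == 0) are set to 0.
    """
    return [
        [i * (d / ref_range_mm) ** 2 if d > 0 else 0.0
         for i, d in zip(i_row, d_row)]
        for i_row, d_row in zip(intensity, depth_mm)
    ]


def stack_rgbds(rgb: RGBImage, depth_mm: Image, signal: Image,
                max_depth_mm: float = 4500.0) -> List[List[Tuple[float, ...]]]:
    """Build per-pixel 5-vectors (R, G, B, D, S), each scaled to [0, 1].

    Assumption: RGB is divided by 255, depth by a hypothetical ceiling,
    and S by its maximum value in this image.
    """
    s_max = max((v for row in signal for v in row), default=0.0) or 1.0
    return [
        [(r / 255.0, g / 255.0, b / 255.0,
          min(d / max_depth_mm, 1.0), s / s_max)
         for (r, g, b), d, s in zip(rgb_row, d_row, s_row)]
        for rgb_row, d_row, s_row in zip(rgb, depth_mm, signal)
    ]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall over matched detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    s = range_correct([[100.0, 100.0]], [[1000.0, 2000.0]])
    print(s)                                   # farther pixel is boosted 4x
    print(stack_rgbds([[(255, 0, 51), (0, 0, 0)]], [[2250.0, 4500.0]], s))
    print(f1_score(tp=8, fp=2, fn=2))
```

Feeding such a tensor to Faster R-CNN also requires the first convolution of a backbone pretrained on three-channel images to accept five input channels; how the extra input weights are initialized is a design choice not described in this record.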
Pages: 689-698
Number of pages: 10