Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities

Cited by: 111
Authors
Gene-Mola, Jordi [1]
Vilaplana, Veronica [2]
Rosell-Polo, Joan R. [1]
Morros, Josep-Ramon [2]
Ruiz-Hidalgo, Javier [2]
Gregorio, Eduard [1]
Affiliations
[1] Univ Lleida UdL, Agrotecnio Ctr, Dept Agr & Forest Engn, Res Grp AgroICT & Precis Agr, Lleida, Catalonia, Spain
[2] Univ Politecn Cataluna, Dept Signal Theory & Commun, Barcelona, Catalonia, Spain
Keywords
RGB-D; Multi-modal faster R-CNN; Convolutional neural networks; Fruit detection; Agricultural robotics; Fruit reflectance; TERRESTRIAL LASER SCANNER; FRUIT DETECTION; PRECISION AGRICULTURE; STRUCTURED LIGHT; ORCHARD; IMAGES; COLOR; LIDAR; TREE; SENSORS
DOI
10.1016/j.compag.2019.05.016
Chinese Library Classification
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Fruit detection and localization will be essential for future agronomic management of fruit crops, with applications in yield prediction, yield mapping and automated harvesting. RGB-D cameras are promising sensors for fruit detection given that they provide geometrical information together with color data. Some of these sensors work on the time-of-flight (ToF) principle and, besides color and depth, provide the backscatter signal intensity. However, this radiometric capability has not been exploited for fruit detection applications. This work presents the KFuji RGB-DS database, composed of 967 multi-modal images containing a total of 12,839 Fuji apples. Compilation of the database allowed a study of the usefulness of fusing RGB-D and radiometric information obtained with Kinect v2 for fruit detection. To do so, the signal intensity was range corrected to overcome signal attenuation, obtaining an image that was proportional to the reflectance of the scene. A registration between RGB, depth and intensity images was then carried out. The Faster R-CNN model was adapted for use with five-channel input images: color (RGB), depth (D) and range-corrected intensity signal (S). Results show an improvement of 4.46% in F1-score when adding depth and range-corrected intensity channels, obtaining an F1-score of 0.898 and an AP of 94.8% when all channels are used. From our experimental results, it can be concluded that the radiometric capabilities of ToF sensors give valuable information for fruit detection.
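The range correction and five-channel stacking described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`range_correct`, `normalize`, `stack_rgbds`), the inverse-square attenuation exponent, and the min-max normalization are all assumptions made for illustration, and the inputs are assumed to be already registered to a common pixel grid.

```python
import numpy as np


def range_correct(intensity, depth_m, exponent=2.0):
    """Compensate backscattered intensity for range attenuation.

    Assumption: the signal decays roughly as 1 / depth**exponent, so
    multiplying by depth**exponent gives a value proportional to reflectance.
    The exponent of 2.0 is a placeholder, not a value taken from the paper.
    """
    intensity = np.asarray(intensity, dtype=np.float32)
    depth_m = np.asarray(depth_m, dtype=np.float32)
    corrected = intensity * depth_m ** exponent
    # Pixels with no valid depth reading (depth <= 0) are set to zero.
    return np.where(depth_m > 0, corrected, 0.0).astype(np.float32)


def normalize(channel):
    """Min-max scale a single channel to [0, 1] (hypothetical choice)."""
    channel = np.asarray(channel, dtype=np.float32)
    lo, hi = float(channel.min()), float(channel.max())
    if hi == lo:
        return np.zeros_like(channel)
    return (channel - lo) / (hi - lo)


def stack_rgbds(rgb, depth_m, intensity):
    """Build an H x W x 5 array: R, G, B, depth (D), range-corrected signal (S)."""
    rgb = np.asarray(rgb, dtype=np.float32) / 255.0
    d = normalize(depth_m)
    s = normalize(range_correct(intensity, depth_m))
    return np.dstack([rgb, d, s])
```

An array built this way could be fed to a detector whose first convolutional layer accepts five input channels instead of three; how the Faster R-CNN input layer was actually adapted is not specified in the abstract.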
Pages: 689-698
Number of pages: 10