Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics

Cited by: 2
Authors
Hou, Xueyu [1 ]
Guan, Yongjie [1 ]
Han, Tao [2 ]
Wang, Cong [2 ]
Affiliations
[1] Univ Maine, ECE Dept, Orono, ME 04469 USA
[2] New Jersey Inst Technol, ECE Dept, Newark, NJ USA
Keywords
Mobile robotics; Visual encoding; Embodied AI; Computer vision; Iconic memory
DOI
10.1007/s41315-024-00363-w
Chinese Library Classification
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Embodied artificial intelligence (AI) agents, which navigate and interact with their environment using sensors and actuators, are being deployed on mobile robotic platforms with limited computing power, such as autonomous vehicles, drones, and humanoid robots. These systems make decisions through environmental perception from deep neural network (DNN)-based visual encoders. However, the constrained computational resources and the large volume of visual data to be processed can create bottlenecks, such as nearly 300 milliseconds per decision on an embedded GPU board (Jetson Xavier). Existing DNN acceleration methods require model retraining and can still reduce accuracy. To address these challenges, this paper introduces a bionic visual encoder framework, Robye, to support the real-time requirements of embodied AI agents. The proposed framework complements existing DNN acceleration techniques. Specifically, it integrates motion data to identify overlapping areas between consecutive frames, which reduces the DNN workload by propagating encoding results. Processing is bifurcated into high resolution for task-critical areas and low resolution for less significant regions. This dual-resolution approach maintains task performance while lowering overall computational demands.
We evaluate Robye across three robotic scenarios: autonomous driving, vision-and-language navigation, and drone navigation, using various DNN models and mobile platforms. Robye outperforms baselines in speed (1.2-3.3×), task performance (+4% to +29%), and power consumption (-36% to -47%).
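The abstract's two core ideas, reusing encodings for the region that overlaps the previous frame and encoding task-critical areas at a finer resolution than the rest, can be illustrated with a minimal sketch. This is not the paper's implementation: the pure-translation motion model, the `toy_encode` block-mean "encoder", and all function names are assumptions introduced for illustration only.

```python
import numpy as np

def overlap_in_current(h, w, dy, dx):
    """Rows/cols of the current frame whose content was already visible in
    the previous frame, assuming a pure pixel translation (dy, dx), i.e.
    current[y, x] ~ prev[y + dy, x + dx]. This motion model is a
    simplifying assumption, not the paper's method."""
    ys = slice(max(0, -dy), min(h, h - dy))
    xs = slice(max(0, -dx), min(w, w - dx))
    return ys, xs

def toy_encode(patch, stride):
    """Stand-in for a DNN visual encoder: block-mean pooling. A smaller
    stride plays the role of 'high resolution' encoding."""
    h, w = patch.shape[:2]
    h2, w2 = h - h % stride, w - w % stride
    p = patch[:h2, :w2]
    return p.reshape(h2 // stride, stride, w2 // stride, stride).mean(axis=(1, 3))

def encode_frame(frame, motion, roi, hi_stride=2, lo_stride=8):
    """One frame of the sketched pipeline:
    1) use motion data to find the overlap with the previous frame, whose
       cached encodings could be propagated instead of recomputed
       (here we only report the reusable fraction of the frame);
    2) encode the task-critical ROI finely and the whole frame coarsely."""
    dy, dx = motion
    h, w = frame.shape[:2]
    ys, xs = overlap_in_current(h, w, dy, dx)
    reused = (ys.stop - ys.start) * (xs.stop - xs.start) / (h * w)
    y0, y1, x0, x1 = roi
    hi = toy_encode(frame[y0:y1, x0:x1], hi_stride)  # task-critical area
    lo = toy_encode(frame, lo_stride)                # less significant regions
    return {"hi": hi, "lo": lo, "reused_fraction": reused}

# Example: a 64x64 frame shifted 16 px horizontally leaves 75% of the
# frame overlapping, so roughly that share of encoding work is reusable.
frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
out = encode_frame(frame, motion=(0, 16), roi=(16, 48, 16, 48))
```

The ratio of `hi_stride` to `lo_stride` controls the compute saved on less significant regions, which is the trade-off the dual-resolution design exploits.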
Pages: 1038-1056 (19 pages)
Related papers (50 total)
  • [21] Mobile agent technologies applied to real-time control systems
    Yang, SH
    MEASUREMENT & CONTROL, 2005, 38 (10): 298
  • [22] A Concept of Dynamically Reconfigurable Real-time Vision System for Autonomous Mobile Robotics
    De Cabrol, Aymeric
    Garcia, Thibault
    Bonnin, Patrick
    Chetto, Maryline
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2008, 5 (02) : 174 - 184
  • [24] Impact of Real-time Visual Attention on Computer Vision Products and Cognitive Robotics
    Vikram, Tadmeri Narayan
    Tscherepanow, Marko
    Wrede, Britta
    PROCEEDINGS OF THE 2ND EUROPEAN FUTURE TECHNOLOGIES CONFERENCE AND EXHIBITION 2011 (FET 11), 2011, 7 : 332 - +
  • [25] Multiple mobile robots real-time visual search algorithm
    Yan, Caixia
    Zhan, Qiang
    INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND PATTERN RECOGNITION IN INDUSTRIAL ENGINEERING, 2010, 7820
  • [27] Real-time automated visual inspection using mobile robots
    Vieira Neto, Hugo
    Nehmzow, Ulrich
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2007, 49 (03) : 293 - 307
  • [28] Real-time control of a mobile robot by using visual stimuli
    Ancona, N
    Branca, A
    ICRA '99: IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-4, PROCEEDINGS, 1999, : 1665 - 1670
  • [29] Real-time multiple mobile robots visual detection system
    Yan, Caixia
    Zhan, Qiang
    SENSOR REVIEW, 2011, 31 (03) : 228 - 238
  • [30] Toward Real-time and Cooperative Mobile Visual Sensing and Sharing
    Chen, Huihui
    Guo, Bin
    Yu, Zhiwen
    Han, Qi
    IEEE INFOCOM 2016 - THE 35TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS, 2016,