Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics

Cited by: 2
Authors
Hou, Xueyu [1]
Guan, Yongjie [1]
Han, Tao [2]
Wang, Cong [2]
Affiliations
[1] Univ Maine, ECE Dept, Orono, ME 04469 USA
[2] New Jersey Inst Technol, ECE Dept, Newark, NJ USA
Keywords
Mobile robotics; Visual encoding; Embodied AI; Computer vision; ICONIC MEMORY;
DOI
10.1007/s41315-024-00363-w
Chinese Library Classification (CLC) number
TP24 [Robotics technology];
Discipline classification code
080202 ; 1405 ;
Abstract
Embodied artificial intelligence (AI) agents, which navigate and interact with their environment using sensors and actuators, are being applied for mobile robotic platforms with limited computing power, such as autonomous vehicles, drones, and humanoid robots. These systems make decisions through environmental perception from deep neural network (DNN)-based visual encoders. However, the constrained computational resources and the large amounts of visual data to be processed can create bottlenecks, such as taking almost 300 milliseconds per decision on an embedded GPU board (Jetson Xavier). Existing DNN acceleration methods need model retraining and can still reduce accuracy. To address these challenges, our paper introduces a bionic visual encoder framework, Robye, to support real-time requirements of embodied AI agents. The proposed framework complements existing DNN acceleration techniques. Specifically, we integrate motion data to identify overlapping areas between consecutive frames, which reduces DNN workload by propagating encoding results. We bifurcate processing into high-resolution for task-critical areas and low-resolution for less-significant regions. This dual-resolution approach allows us to maintain task performance while lowering the overall computational demands. We evaluate Robye across three robotic scenarios: autonomous driving, vision-and-language navigation, and drone navigation, using various DNN models and mobile platforms. Robye outperforms baselines in speed (1.2-3.3×), performance (+4% to +29%), and power consumption (-36% to -47%).
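The idea of propagating encoding results across overlapping frame regions can be illustrated with a toy sketch. This is not the Robye implementation: the grid-of-cells frame model, the integer cell shift standing in for motion data, and the names `propagate_encoding` and `toy_encoder` are all assumptions made for illustration. The dual-resolution step is not covered here.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[float]]


def propagate_encoding(
    prev_codes: Optional[Grid],
    frame: Grid,
    shift: Tuple[int, int],
    encode: Callable[[float], float],
) -> Tuple[Grid, int]:
    """Encode `frame` cell by cell, reusing cached codes where frames overlap.

    Assumption: shift = (dx, dy) means the cell at (row, col) in the new frame
    shows the same content as cell (row + dy, col + dx) in the previous frame.
    Returns the new code grid and the number of cells actually encoded.
    """
    rows, cols = len(frame), len(frame[0])
    dx, dy = shift
    codes: Grid = [[0.0] * cols for _ in range(rows)]
    encoded = 0
    for r in range(rows):
        for c in range(cols):
            pr, pc = r + dy, c + dx
            if prev_codes is not None and 0 <= pr < rows and 0 <= pc < cols:
                # Overlapping area: reuse the cached encoding result.
                codes[r][c] = prev_codes[pr][pc]
            else:
                # Newly visible area: run the (expensive) encoder.
                codes[r][c] = encode(frame[r][c])
                encoded += 1
    return codes, encoded


def toy_encoder(value: float) -> float:
    """Hypothetical stand-in for a DNN visual encoder."""
    return 2.0 * value


# First frame: nothing cached, so every cell is encoded.
frame0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
codes0, n0 = propagate_encoding(None, frame0, (0, 0), toy_encoder)

# The camera pans right by one cell: two columns overlap, one is new.
frame1 = [[2.0, 3.0, 7.0], [5.0, 6.0, 8.0]]
codes1, n1 = propagate_encoding(codes0, frame1, (1, 0), toy_encoder)
```

In this toy case only the newly visible column is passed to the encoder for the second frame, while the result matches encoding the whole frame from scratch.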
Pages: 1038-1056
Number of pages: 19