Dual-View Visual Contextualization for Web Navigation

Authors
Kil, Jihyung [1 ]
Song, Chan Hee [1 ]
Zheng, Boyuan [1 ]
Deng, Xiang [1 ]
Su, Yu [1 ]
Chao, Wei-Lun [1 ]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
Funding
US National Science Foundation;
DOI
10.1109/CVPR52733.2024.01369
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build upon the insight that web developers tend to arrange task-related elements nearby on webpages to enhance user experiences, and propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent to take action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all the scenarios, including cross-task, cross-website, and cross-domain ones.
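The neighbor-based contextualization described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Element` class, the center-distance neighbor rule, and the mean pooling of feature vectors are hypothetical stand-ins, since the abstract does not specify how neighbors are chosen or how textual and visual features are combined.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Element:
    """Hypothetical HTML element paired with its screenshot bounding box."""
    name: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    feature: list[float]  # stand-in for a fused textual + visual feature


def center(box):
    """Return the center point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def nearest_neighbors(target, elements, k):
    """Pick the k elements whose box centers lie closest to the target's."""
    cx, cy = center(target.box)
    others = [e for e in elements if e is not target]
    others.sort(
        key=lambda e: hypot(center(e.box)[0] - cx, center(e.box)[1] - cy)
    )
    return others[:k]


def contextualize(target, elements, k=2):
    """Average the target's feature with those of its k nearest neighbors.

    Mean pooling is an assumed simplification, not the paper's mechanism.
    """
    group = [target] + nearest_neighbors(target, elements, k)
    dim = len(target.feature)
    return [sum(e.feature[i] for e in group) / len(group) for i in range(dim)]


# Toy page: a search box, its adjacent button, and a distant footer link.
search_box = Element("search_box", (0, 0, 100, 20), [1.0, 0.0])
search_button = Element("search_button", (110, 0, 150, 20), [0.0, 1.0])
footer_link = Element("footer_link", (0, 900, 50, 920), [4.0, 4.0])
page = [search_box, search_button, footer_link]

print([e.name for e in nearest_neighbors(search_box, page, k=1)])
print(contextualize(search_box, page, k=1))
```

In this toy layout the search button sits next to the search box while the footer link is far away, so the box's contextualized representation draws on the button rather than the footer, which mirrors the intuition that task-related elements tend to be placed near each other.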
Pages: 14445-14454
Page count: 10