Dual-View Visual Contextualization for Web Navigation

Authors
Kil, Jihyung [1]
Song, Chan Hee [1]
Zheng, Boyuan [1]
Deng, Xiang [1]
Su, Yu [1]
Chao, Wei-Lun [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
DOI
10.1109/CVPR52733.2024.01369
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right action or sequence of actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has a corresponding bounding box and visual content in the screenshot. We build upon the insight that web developers tend to arrange task-related elements near one another on webpages to enhance the user experience, and we propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent when it takes action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all scenarios, including the cross-task, cross-website, and cross-domain settings.
Pages: 14445-14454
Number of pages: 10
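
The neighbor-based contextualization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Element` class, the center-distance neighbor rule, the value of `k`, and the text-only context string are all assumptions made for illustration. The abstract states that the method uses both textual and visual features; the visual part is omitted here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an HTML element and its screenshot box."""
    node_id: str
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in screenshot pixels


def center(e):
    """Return the center point of an element's bounding box."""
    x0, y0, x1, y1 = e.bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def nearest_neighbors(target, elements, k=2):
    """Pick the k other elements whose box centers are closest to the target.

    Assumption: Euclidean distance between centers defines "nearby".
    """
    cx, cy = center(target)
    others = [e for e in elements if e.node_id != target.node_id]
    others.sort(key=lambda e: hypot(center(e)[0] - cx, center(e)[1] - cy))
    return others[:k]


def contextualize(target, elements, k=2):
    """Append the text of spatial neighbors to the target's own text."""
    neighbors = nearest_neighbors(target, elements, k)
    context = "; ".join(n.text for n in neighbors)
    return f"{target.text} [context: {context}]"


# Toy page: three elements of a flight-search form and one unrelated footer item.
page = [
    Element("a", "Departure city", (10, 10, 110, 30)),
    Element("b", "Search flights", (10, 40, 110, 60)),
    Element("c", "Arrival city", (120, 10, 220, 30)),
    Element("d", "Newsletter signup", (10, 900, 110, 920)),
]

print(contextualize(page[0], page))
```

In this toy layout the form fields end up in each other's context while the distant footer element does not, which is the intuition the abstract attributes to how developers arrange task-related elements.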