Dual-View Visual Contextualization for Web Navigation

Authors
Kil, Jihyung [1]
Song, Chan Hee [1]
Zheng, Boyuan [1]
Deng, Xiang [1]
Su, Yu [1]
Chao, Wei-Lun [1]
Affiliation
[1] Ohio State Univ, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
DOI
10.1109/CVPR52733.2024.01369
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has a corresponding bounding box and visual content in the screenshot. We build upon the insight that web developers tend to place task-related elements near one another on webpages to enhance user experience, and propose to contextualize each element with its neighboring elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent when it chooses actions. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all scenarios, including cross-task, cross-website, and cross-domain ones.
Pages: 14445-14454
Page count: 10