Dual-View Visual Contextualization for Web Navigation

Cited by: 0
Authors
Kil, Jihyung [1]
Song, Chan Hee [1]
Zheng, Boyuan [1]
Deng, Xiang [1]
Su, Yu [1]
Chao, Wei-Lun [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
DOI
10.1109/CVPR52733.2024.01369
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build upon the insight that web developers tend to arrange task-related elements near one another on webpages to enhance the user experience, and we propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent when it selects an action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all scenarios, including the cross-task, cross-website, and cross-domain settings.
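The neighbor-based contextualization described in the abstract can be illustrated with a small, text-only sketch. Everything here is an assumption made for illustration: the `Element` record, the use of bounding-box center distance to define "nearby", the value of `k`, and the plain string concatenation are hypothetical and are not taken from the paper, which also uses visual features from the screenshot.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Element:
    """Hypothetical record: one HTML element and its screenshot bounding box."""
    element_id: str
    text: str
    box: tuple  # (left, top, right, bottom) in screenshot pixels


def center(box):
    """Return the (x, y) center of a bounding box."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def nearest_neighbors(target, elements, k=2):
    """Return the k elements whose box centers are closest to the target's.

    Center-to-center Euclidean distance is an assumed notion of "nearby".
    """
    tx, ty = center(target.box)
    others = [e for e in elements if e.element_id != target.element_id]
    others.sort(
        key=lambda e: hypot(center(e.box)[0] - tx, center(e.box)[1] - ty)
    )
    return others[:k]


def contextualize(target, elements, k=2):
    """Join the target's text with the text of its k nearest neighbors."""
    neighbors = nearest_neighbors(target, elements, k)
    return " | ".join([target.text] + [n.text for n in neighbors])


# Toy page: three flight-search controls close together, one unrelated footer link.
elements = [
    Element("e1", "Departure date", (10, 10, 110, 30)),
    Element("e2", "Search flights", (10, 40, 110, 60)),
    Element("e3", "Newsletter sign-up", (10, 900, 110, 920)),
    Element("e4", "Return date", (120, 10, 220, 30)),
]

print(contextualize(elements[1], elements))
# Search flights | Departure date | Return date
```

In this toy layout the "Search flights" button picks up the two date fields as context and ignores the distant footer link, which is the intuition behind arranging task-related elements near one another.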
Pages: 14445-14454
Number of pages: 10