A sampling method based on URL clustering for fast web accessibility evaluation

Cited by: 9
Authors
Zhang, Meng-ni [1]
Wang, Can [1]
Bu, Jia-jun [1]
Yu, Zhi [1]
Zhou, Yu [1]
Chen, Chun [1]
Affiliation
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Page sampling; URL clustering; Web accessibility evaluation
DOI
10.1631/FITEE.1400377
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
When evaluating the accessibility of a large website, we rely on sampling methods to reduce the cost of evaluation. Sampling may lead to a biased evaluation when the distribution of checkpoint violations across the website is skewed and the selected samples do not represent the entire website well. To improve sampling quality, stratified sampling methods first cluster the web pages of a site and then draw samples from each cluster. Existing stratified sampling methods, however, must analyze every page in a website to perform the clustering, which incurs huge I/O and computation costs. To address this issue, we propose URLSamp, a novel page sampling method for web accessibility evaluation based on URL clustering. Because it uses only URL information for stratified page sampling, URLSamp scales efficiently to large websites. Moreover, by exploiting similarities in URL patterns, URLSamp clusters pages by the scripts that generate them and can therefore effectively detect accessibility problems in web page templates. We validate our method on a data set of 45 websites. Experimental results show that URLSamp is both effective and efficient for web accessibility evaluation.
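The abstract outlines the general idea: group pages by URL pattern, as a proxy for the script that generated them, and then draw samples from every group. The Python sketch below illustrates that idea under stated assumptions; it is not the URLSamp algorithm from the paper, whose clustering details are not given here. The function names (url_pattern, cluster_urls, stratified_sample), the rule that any path segment containing a digit becomes a placeholder, and the proportional allocation with at least one page per cluster are all hypothetical choices made for illustration.

```python
import random
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def url_pattern(url: str) -> str:
    """Map a URL to a coarse pattern key (hypothetical heuristic, not URLSamp's).

    Path segments containing a digit become '<num>', and query values are
    dropped (only sorted parameter names are kept), so pages produced by the
    same script with different parameters tend to share one key.
    """
    parts = urlsplit(url)
    segments = [
        "<num>" if re.search(r"\d", seg) else seg
        for seg in parts.path.strip("/").split("/")
        if seg
    ]
    query_keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    return parts.netloc.lower() + "/" + "/".join(segments) + "?" + "&".join(query_keys)


def cluster_urls(urls):
    """Group URLs by pattern key; only URL strings are inspected, no page content."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    return dict(clusters)


def stratified_sample(urls, budget, seed=0):
    """Draw roughly `budget` URLs, proportional to cluster size, at least one per cluster."""
    rng = random.Random(seed)
    clusters = cluster_urls(urls)
    total = len(urls)
    sample = []
    for key in sorted(clusters):  # sorted for reproducible output
        members = clusters[key]
        k = max(1, round(budget * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


if __name__ == "__main__":
    site = [
        "http://example.org/news/2014/101.html",
        "http://example.org/news/2014/102.html",
        "http://example.org/news/2015/203.html",
        "http://example.org/about",
        "http://example.org/item.php?id=1",
        "http://example.org/item.php?id=2",
    ]
    for url in stratified_sample(site, budget=3):
        print(url)
```

With the example URLs, the three news pages fall into one cluster, the two item.php pages into another, and the about page into a third, so every cluster contributes at least one page to the evaluation sample. Guaranteeing one page per cluster is what lets rare templates be checked, at the cost of slightly exceeding the budget.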
Pages: 449-456
Number of pages: 8