Research on Web Page Classification Method Based on Query Log

Cited by: 1
Authors
Ye F. [1]
Ma Y. [1]
Affiliation
[1] School of Computer Engineering and Science, Shanghai University, Shanghai
Keywords
A; diesel; query log; TP391.1; Web page classification; word embedding
DOI
10.1007/s12204-017-1899-0
Abstract
Web page classification is an important application in many areas of Internet information retrieval, such as directory classification and vertical search. Query-log-based methods are a lightweight form of Web page classification: they avoid crawling Web content and are therefore relatively efficient, but the sparsity of user click data makes it difficult to use the logs directly to construct a classifier. To address this problem, we use word embedding to explore the semantic relations among different queries and propose three improved graph-structure classification algorithms. First, to reflect the semantic relevance between queries, we map each user query into a low-dimensional space according to its query vector. Then, we calculate a uniform resource locator (URL) vector from the relationship between queries and URLs. Finally, we use an improved label propagation algorithm (LPA) and a bipartite graph expansion algorithm to classify the unlabeled Web pages. Experiments show that our methods improve the F1-value by about 20% compared with other query-log-based Web page classification methods. © 2017, Shanghai Jiaotong University and Springer-Verlag GmbH Germany, part of Springer Nature.
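The abstract only outlines the pipeline (query vectors, then URL vectors, then graph-based propagation). The Python sketch below illustrates that outline under assumptions that do not come from the paper: query vectors are taken as given (for example, averaged word embeddings), a URL vector is the click-count-weighted mean of the vectors of the queries that led to clicks on it, two URLs are linked when their cosine similarity exceeds a hypothetical `threshold`, and classification uses plain label propagation with clamped seed labels, not the authors' improved LPA or their bipartite graph expansion algorithm. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def url_vectors(query_vecs, clicks):
    """Assumed scheme: each URL is the click-weighted mean of the vectors of
    the queries that led to clicks on it.

    query_vecs: dict query -> list[float]; clicks: list of (query, url, count).
    """
    sums = {}
    totals = defaultdict(float)
    for query, url, count in clicks:
        vec = query_vecs.get(query)
        if vec is None:  # no vector for this query: skip the click record
            continue
        acc = sums.setdefault(url, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += count * x
        totals[url] += count
    return {url: [x / totals[url] for x in acc] for url, acc in sums.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_propagation(vecs, seeds, labels, threshold=0.5, iters=50):
    """Plain (not the paper's improved) label propagation on a URL graph.

    vecs: dict url -> vector; seeds: dict url -> known label (kept fixed).
    """
    urls = sorted(vecs)
    # Undirected edge weighted by cosine similarity, kept only above threshold.
    graph = {u: {} for u in urls}
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            s = cosine(vecs[u], vecs[v])
            if s > threshold:
                graph[u][v] = graph[v][u] = s
    # Seeds start one-hot; unlabeled URLs start with a uniform distribution.
    dist = {}
    for u in urls:
        if u in seeds:
            dist[u] = {lab: float(lab == seeds[u]) for lab in labels}
        else:
            dist[u] = {lab: 1.0 / len(labels) for lab in labels}
    for _ in range(iters):
        new = {}
        for u in urls:
            total = sum(graph[u].values())
            if u in seeds or total == 0:
                new[u] = dist[u]  # clamp seeds; isolated nodes keep their prior
                continue
            # Weighted average of the neighbours' label distributions.
            new[u] = {lab: sum(w * dist[v][lab] for v, w in graph[u].items()) / total
                      for lab in labels}
        dist = new
    # Most probable label per URL; ties resolve to the first label in `labels`.
    return {u: max(labels, key=lambda lab: dist[u][lab]) for u in urls}


if __name__ == "__main__":
    # Toy 2-D "embeddings" and click records (hypothetical data).
    query_vecs = {"cheap flights": [1.0, 0.0], "airline tickets": [0.9, 0.1],
                  "football scores": [0.0, 1.0], "soccer results": [0.1, 0.9]}
    clicks = [("cheap flights", "a.com", 3), ("airline tickets", "a.com", 1),
              ("airline tickets", "b.com", 2), ("football scores", "c.com", 4),
              ("soccer results", "d.com", 1)]
    vecs = url_vectors(query_vecs, clicks)
    print(label_propagation(vecs, {"a.com": "travel", "c.com": "sports"},
                            ["travel", "sports"]))
    # → {'a.com': 'travel', 'b.com': 'travel', 'c.com': 'sports', 'd.com': 'sports'}
```

In the toy run, `b.com` has no seed label and is clicked only through a travel-like query, so it inherits "travel" from its similar neighbour `a.com`; this is the kind of sparsity that embedding-based URL vectors are meant to bridge. A real system would replace the toy vectors with learned embeddings and the dense pairwise loop with a nearest-neighbour index.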
Pages: 404-410
Page count: 6
Related papers
50 in total
  • [1] Research on Web Page Classification Method Based on Query Log
    叶飞跃
    马祎星
    Journal of Shanghai Jiaotong University (Science), 2018, 23 (03) : 404 - 410
  • [2] Web Page Classification Method Based on Semantics and Structure
    Li, Huaxin
    Zhang, Zhaoxin
    Xu, Yongdong
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 238 - 243
  • [3] A novel web page categorization algorithm based on block propagation using query-log information
    Dai, Wenyuan
    Yu, Yong
    Zhang, Cong-Le
    Han, Jie
    Xue, Gui-Rong
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 435 - 446
  • [4] Research on web page classification-based core characteristics and web structure
    Zengmin, Geng
    Jianxia, Du
    International Journal of Wireless and Mobile Computing, 2014, 7 (03) : 253 - 257
  • [5] Research on Query Results Cache Based on Log Analysis in Web Search Engines
    Ma, Hongyuan
    2013 3RD INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, COMMUNICATIONS AND NETWORKS (CECNET), 2013, : 551 - 554
  • [6] A Method of Web Page Classification Based on Feature Dimension Reduction
    Ren, Xun-yi
    Zhang, Dan
    2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL MODELING, SIMULATION AND APPLIED MATHEMATICS (CMSAM 2016), 2016, : 252 - 256
  • [7] Research on SVM-Based Automatic Classification of Chinese Web Page
    Song, Jie
    Liu, Yanque
    Li, Nana
    Gu, Junhua
    PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, 2008, : 160 - 164
  • [8] Research on web page automatic classification based on internet news corpus
    Cai, Wei
    Wang, Yong-Cheng
    Yin, Zhong-Hang
    Journal of Shanghai Jiaotong University (Science), 2007, 12 E (06) : 731 - 735
  • [9] The Research of Spam Web Page Detection Method Based on Web Page Differentiation and Concrete Cluster Centers
    Yu, Mei
    Zhang, Jie
    Wang, Jianrong
    Gao, Jie
    Xu, Tianyi
    Yu, Ruiguo
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2018), 2018, 10874 : 820 - 826
  • [10] Web page classification based on SVM
    Xue, Weimin
    Bao, Hong
    Huang, Weitong
    Lu, Yuchang
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 6111 - +