Analysis of Data Extraction and Data Cleaning in Web Usage Mining

Cited by: 5
Authors
Srivastava, Mitali [1]
Garg, Rakhi [2]
Mishra, P. K. [1]
Affiliations
[1] Banaras Hindu Univ, Fac Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] Banaras Hindu Univ, Mahila Maha Vidyalaya, Comp Sci Sect, Varanasi, Uttar Pradesh, India
Keywords
Web usage mining; Data preprocessing; Data extraction; Data cleaning
DOI
10.1145/2743065.2743078
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Data preprocessing is an important phase of Web usage mining because log data is unstructured, heterogeneous and noisy. Complete and effective data preprocessing ensures the efficiency and scalability of the algorithms used in the pattern discovery phase of Web usage mining. Data preprocessing generally includes the following steps: data fusion, data cleaning, user identification, session identification, path completion, etc. Data cleaning is an initial and important step of preprocessing that produces the cleaned data used by the later steps. When the analysis covers a specific time duration, e.g. one day, one week or one month, data extraction should be applied to the raw log data before data cleaning. In this paper we focus mainly on the data fusion, data extraction and data cleaning steps of preprocessing, and we propose a data extraction algorithm that extracts log data according to the time duration under analysis. The algorithm also sorts log entries by date and time, an ordering that is later used to predict users' browsing sequences. We then apply a data cleaning algorithm to the extracted real Web server log. Data cleaning handles almost all irrelevant files, irrelevant HTTP methods and erroneous HTTP status codes, and the experiment shows that the raw log data is reduced to almost 80%, which demonstrates the importance of the initial phases of data preprocessing.
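The extraction and cleaning steps described in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' algorithm: it assumes the server writes the Common Log Format (CLF), treats the analysis period as a half-open window `[start, end)`, and uses hypothetical cleaning rules (the `IRRELEVANT_EXTENSIONS` list, a GET-only method filter and a keep-only-2xx status rule) chosen purely for illustration. The helper names `parse_line`, `extract`, `is_relevant` and `clean` are likewise hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: entries follow the Common Log Format (CLF), e.g.
# 10.0.0.1 - - [02/Jan/2015:09:00:00 +0530] "GET /index.html HTTP/1.1" 200 512
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical cleaning rules chosen for illustration only.
IRRELEVANT_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".css", ".js", ".ico")
KEPT_METHODS = {"GET"}


def parse_line(line):
    """Parse one CLF line into a dict, or return None if it is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def extract(lines, start, end):
    """Data extraction: keep entries with start <= time < end, sorted by time."""
    entries = (parse_line(line) for line in lines)
    window = [e for e in entries if e is not None and start <= e["time"] < end]
    return sorted(window, key=lambda e: e["time"])


def is_relevant(entry):
    """Cleaning rule: drop embedded resources, non-GET requests and non-2xx codes."""
    path = entry["url"].split("?", 1)[0].lower()
    if path.endswith(IRRELEVANT_EXTENSIONS):
        return False
    if entry["method"] not in KEPT_METHODS:
        return False
    return 200 <= entry["status"] < 300


def clean(entries):
    """Data cleaning: keep only the entries that pass is_relevant."""
    return [e for e in entries if is_relevant(e)]


# Usage: extract one day (2 Jan 2015, IST) from a small unsorted log, then clean it.
IST = timezone(timedelta(hours=5, minutes=30))
log = [
    '10.0.0.2 - - [02/Jan/2015:10:05:00 +0530] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [02/Jan/2015:09:00:00 +0530] "GET /logo.png HTTP/1.1" 200 1024',
    '10.0.0.1 - - [02/Jan/2015:09:00:01 +0530] "POST /login HTTP/1.1" 302 0',
    '10.0.0.3 - - [03/Jan/2015:08:00:00 +0530] "GET /about.html HTTP/1.1" 404 0',
    '10.0.0.1 - - [02/Jan/2015:08:59:59 +0530] "GET /news.html HTTP/1.1" 200 2048',
]
start = datetime(2015, 1, 2, tzinfo=IST)
extracted = extract(log, start, start + timedelta(days=1))
cleaned = clean(extracted)
print([e["url"] for e in extracted])  # ['/news.html', '/logo.png', '/login', '/index.html']
print([e["url"] for e in cleaned])    # ['/news.html', '/index.html']
```

Extraction runs first so that cleaning only touches entries inside the analysis window, and the sort by timestamp preserves the request order needed later for reconstructing browsing sequences. A real log would also need handling for other formats (e.g. the Combined Log Format) and for whatever irrelevant-file and status-code rules the analysis actually requires.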
Pages: 6