Categorizing The Turkish Web Pages By Data Mining Techniques

Cited by: 0
Authors
Husem, Secil Sekerci [1]
Gulcu, Ayla [1]
Affiliation
[1] Fatih Sultan Mehmet Vakif Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Data Mining; Text Classification; Naive Bayes; Support Vector Machines;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Today, human effort alone cannot cope with the increasing amount of data. Automated methods are therefore needed to group similar documents together or to place documents in predefined categories according to certain rules, and automated classification techniques are becoming increasingly important. In this study, a database of 22 thousand samples was created to respond to the need for Turkish data, and various text classification methods from the literature were tested on it. The Multinomial Naive Bayes (M-NB) and Support Vector Machines (SVM) algorithms, which are frequently used for text classification, were compared by applying n-gram word vector selection and information gain ratio. In addition, the study examines how the number of categories, the content of the training data, and the completeness of that data affect classification success.
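As a rough illustration of one technique named in the abstract, the sketch below implements a multinomial Naive Bayes classifier over word n-gram counts in plain Python. It is not the authors' implementation: the class name `MultinomialNB`, the helper `word_ngrams`, the Laplace smoothing value, and the four toy Turkish sentences with their `spor`/`ekonomi` labels are all assumptions made for this example, and the information gain ratio step and the SVM comparison are left out.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max as space-joined strings."""
    # Note: str.lower() is not Turkish-aware (dotted/dotless I); the toy
    # data below is already lowercase, so this is enough for the sketch.
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha    # smoothing constant (assumed value)
        self.n_max = n_max    # longest n-gram used as a feature

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = word_ngrams(doc, self.n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for g in word_ngrams(doc, self.n_max):
                if g in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((self.feature_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy training data (invented for illustration, not from the 22k-sample set)
train_docs = [
    "takım maçı kazandı",
    "futbol maçı bu akşam",
    "borsa bugün yükseldi",
    "dolar ve borsa düştü",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict("maçı kim kazandı"))
print(clf.predict("borsa düştü"))
```

With `n_max=2` the features are unigrams and bigrams; raising it adds longer n-grams at the cost of a larger, sparser vocabulary, which is where a feature selection step such as information gain would normally come in.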
Pages: 255-260
Page count: 6