Categorizing The Turkish Web Pages By Data Mining Techniques

Cited by: 0
Authors
Husem, Secil Sekerci [1]
Gulcu, Ayla [1]
Affiliation
[1] Fatih Sultan Mehmet Vakif Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Data Mining; Text Classification; Naive Bayes; Support Vector Machines;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
Today, human effort alone cannot cope with the growing volume of data. Automated methods are therefore needed to group similar documents together or to place documents in predefined categories according to certain rules, and automated classification techniques are becoming increasingly important. In this study, a dataset of 22 thousand samples was created to address the need for Turkish data, and various text classification methods from the literature were tested on it. Multinomial Naive Bayes (M-NB) and Support Vector Machines (SVM), two algorithms frequently used for text classification, were compared using n-gram word vector selection and the information gain ratio. In addition, the study examines how the number of categories, the content of the data used to train the model, and the completeness of that data affect classification success.
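As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch of Multinomial Naive Bayes over word n-gram counts. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the `word_ngrams` and `MultinomialNB` names, and the four toy Turkish documents are all assumptions made for the example, and the information gain ratio step is omitted.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return word n-grams of length 1..n_max (assumed whitespace tokenizer)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class MultinomialNB:
    """Illustrative Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)          # documents per category
        self.feature_counts = defaultdict(Counter)   # n-gram counts per category
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = word_ngrams(doc, n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for gram in word_ngrams(doc, self.n_max):
                if gram in self.vocab:               # skip n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from the 22-thousand-sample dataset.
docs = [
    "futbol maç gol takım",
    "maç gol lig",
    "borsa dolar faiz",
    "dolar faiz ekonomi",
]
labels = ["spor", "spor", "ekonomi", "ekonomi"]

model = MultinomialNB().fit(docs, labels)
print(model.predict("gol maç"))
print(model.predict("dolar faiz"))
```

Scoring in log space avoids numeric underflow when a document contains many n-grams; a comparable SVM baseline would consume the same n-gram counts as its feature vector.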
Pages: 255-260
Page count: 6
Related papers
50 in total
  • [1] Mining web pages for data records
    Liu, B
    Grossman, R
    Zhai, YH
    IEEE INTELLIGENT SYSTEMS, 2004, 19 (06) : 49 - 55
  • [2] Categorizing Web pages on the subject of neural networks
    Vlajic, N
    Card, HC
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 1998, 21 (02) : 91 - 105
  • [3] Categorizing Web pages using modified ART
    Vlajic, N
    Card, HC
    UNIVERSITY AND INDUSTRY - PARTNERS IN SUCCESS, CONFERENCE PROCEEDINGS VOLS 1-2, 1998, : 313 - 316
  • [4] Categorizing Web pages on the subject of neural networks
    Internet Innovation Centre, Dept. of Elec. and Comp. Engineering, University of Manitoba, Winnipeg, Man. R3T 5V6, Canada
    Unknown
    Unknown
    J Network Comput Appl, 2 (91-105):
  • [5] Recommendation of Optimized Web Pages to Users Using Web Log Mining Techniques
    Bhushan, Ravi
    Nath, Rajender
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 1030 - 1033
  • [6] Web Data Mining Trends and Techniques
    Patil, Ujwala Manoj
    Patil, J. B.
    PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012, : 961 - 965
  • [7] Web Pages Classification: An Effective Approach Based on Text Mining Techniques
    Babapour, Seyed Moein
    Roostaee, Meysam
    2017 IEEE 4TH INTERNATIONAL CONFERENCE ON KNOWLEDGE-BASED ENGINEERING AND INNOVATION (KBEI), 2017, : 320 - 323
  • [8] Data Mining: Web Data Mining Techniques, Tools and Algorithms: An Overview
    Mughal, Muhammd Jawad Hamid
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (06) : 208 - 215
  • [9] The Integrating Between Web Usage Mining and Data Mining Techniques
    Nassar, Omer Adel
    Al Saiyd, Nedhal A.
    2013 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2013, : 243 - 247
  • [10] Implementation of data mining techniques in web of things
    Vihari, G.
    Prasad, N.
    Satyanarayan, K.
    INTERNATIONAL CONFERENCE ON COMPUTER VISION AND MACHINE LEARNING, 2019, 1228