Categorizing The Turkish Web Pages By Data Mining Techniques

Authors
Husem, Secil Sekerci [1]
Gulcu, Ayla [1]
Affiliation
[1] Fatih Sultan Mehmet Vakif Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Data Mining; Text Classification; Naive Bayes; Support Vector Machines
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Subject classification codes
081202; 0835;
Abstract
Human effort alone can no longer cope with the growing volume of data. Automated methods are therefore needed to group similar documents together or to place documents in predefined categories according to certain rules, and automated classification techniques are becoming increasingly important as a result. In this study, a database of 22 thousand samples was created to address the need for Turkish data, and various text classification methods from the literature were tested on it. The Multinomial Naive Bayes (M-NB) and Support Vector Machines (SVM) algorithms, which are frequently used for text classification, were compared using n-gram word vector selection and the information gain ratio. In addition, the study examines how the number of categories, the content of the training data, and the completeness of that data affect classification success.
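As a rough illustration of one of the techniques named in the abstract, the sketch below trains a multinomial Naive Bayes classifier on word n-gram counts (unigrams and bigrams) with Laplace smoothing. It is not the authors' implementation: the class name `MultinomialNB`, the helper `ngrams`, the smoothing parameter `alpha`, and the four-sentence Turkish toy corpus with its two categories are all assumptions made for this example. The SVM comparison and the information gain ratio step are left out.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.feat_counts = defaultdict(Counter)   # n-gram counts per class
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(ngrams(doc))
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.feat_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(count / n_docs)      # log prior
            for f in ngrams(doc):
                if f in self.vocab:               # ignore unseen n-grams
                    score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus (not from the paper's 22-thousand-sample database).
train_docs = [
    "takım maçı kazandı gol attı",
    "futbol ligi maç sonucu",
    "borsa dolar faiz yükseldi",
    "merkez bankası faiz kararı",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("maç gol"))
print(model.predict("faiz dolar"))
```

On a corpus this small the predictions only show the mechanics. A realistic comparison would need held-out test data and, for Turkish, probably some form of stemming or morphological normalization, which this sketch does not attempt.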
Pages: 255-260
Page count: 6
    2015 1ST IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2015, : 13 - 15