Research on Text Classification Algorithm of Largest Dispersion Based on Term Frequency

Times cited: 0
Authors
An Junxiu [1]
Jin Yuchang [2,3]
Affiliations
[1] CUIT, Sch Software Engn, Chengdu 610025, Peoples R China
[2] SW Univ, Sch Culture & Social Dev, Chongqing, Peoples R China
[3] Sichuan Normal Univ, Coll Informat Technol, Chengdu, Peoples R China
Source
2009 INTERNATIONAL FORUM ON COMPUTER SCIENCE-TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS | 2009
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
text classification algorithm; the largest dispersion algorithm; retrospect term frequency algorithm; the characteristics set
DOI
10.1109/IFCSTA.2009.103
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
To classify web-page documents automatically according to their content, this paper proposes a largest-dispersion text classification algorithm based on term frequency. The algorithm first applies a backward (retrospective) term-frequency algorithm to typical texts of each of n categories to determine a sound and effective characteristic set for each category. Using these n characteristic sets, it applies the largest-dispersion algorithm to obtain the classification values of a web-page document in each set, and compares the resulting dispersions to find the largest one. It then compares this largest dispersion value with the corresponding threshold: if the value exceeds the threshold, the document is assigned to the matching category; if it is below the threshold, the category judgement for the document is treated as invalid. The algorithm is robust and easy to use, and it is very effective for large-scale collections of small documents.
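The abstract does not give formulas for the backward term-frequency step or for the dispersion measure, so the Python sketch below only illustrates the overall pipeline under loudly stated assumptions. It assumes whitespace tokenization, builds each category's characteristic set from the `k` most frequent terms in that category's typical texts (a stand-in for the backward term-frequency algorithm), scores a document against each set, and defines a category's dispersion as its score minus the mean score over all n categories. The names `build_characteristic_sets` and `classify`, the parameter `k`, the dispersion formula, and the threshold value are all hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical type: a characteristic set maps a term to its relative frequency.
CharacteristicSet = Dict[str, float]


def term_frequencies(text: str) -> Counter:
    """Count terms using lower-cased whitespace tokenization (a simplifying assumption)."""
    return Counter(text.lower().split())


def build_characteristic_sets(
    typical_texts: Dict[str, List[str]], k: int = 5
) -> Dict[str, CharacteristicSet]:
    """ASSUMPTION: stand-in for the paper's backward term-frequency step.

    For each category, keep the k most frequent terms across its typical
    texts, weighted by their relative frequency within that category.
    """
    sets: Dict[str, CharacteristicSet] = {}
    for label, texts in typical_texts.items():
        counts: Counter = Counter()
        for text in texts:
            counts.update(term_frequencies(text))
        total = sum(counts.values()) or 1
        sets[label] = {term: c / total for term, c in counts.most_common(k)}
    return sets


def classify(
    document: str, sets: Dict[str, CharacteristicSet], threshold: float
) -> Optional[str]:
    """Return the category with the largest dispersion, or None if it is not above the threshold."""
    tf = term_frequencies(document)
    n_tokens = sum(tf.values()) or 1
    # Classification value: weighted hits on each category's characteristic set,
    # normalized by document length (Counter returns 0 for absent terms).
    values = {
        label: sum(w * tf[term] for term, w in feats.items()) / n_tokens
        for label, feats in sets.items()
    }
    # ASSUMPTION: dispersion = how far a category's value lies above the mean value.
    mean = sum(values.values()) / len(values)
    dispersions = {label: v - mean for label, v in values.items()}
    best = max(dispersions, key=dispersions.__getitem__)
    # Threshold check: at or below the threshold, the judgement is treated as invalid.
    return best if dispersions[best] > threshold else None


if __name__ == "__main__":
    typical = {
        "sports": ["football match goal team", "team wins match goal"],
        "finance": ["stock market price shares", "market shares price rise"],
    }
    sets = build_characteristic_sets(typical, k=3)
    print(classify("the team scored a late goal in the match", sets, threshold=0.01))  # sports
    print(classify("weather is sunny today", sets, threshold=0.01))  # None (rejected)
```

Returning `None` mirrors the abstract's rule that a largest dispersion below the threshold makes the category judgement invalid, rather than forcing the document into the nearest category.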
Pages: 400 / +
Page count: 2
Related papers (50 in total)
  • [1] Context-Based Term Frequency Assessment for Text Classification
    Liu, Rey-Long
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (02): 300 - 309
  • [2] A text classification method based on term frequency classifier ensemble
    National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
    Unknown
    Jisuanji Yanjiu yu Fazhan, 2006, 10 : 1681 - 1687
  • [3] Text Classification based on Word Subspace with Term-Frequency
    Shimomoto, Erica K.
    Souza, Lincon S.
    Gatto, Bernardo B.
    Fukui, Kazuhiro
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [4] Context-Based Term Frequency Assessment for Text Classification
    Liu, Rey-Long
    PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 1004 - 1009
  • [5] Research of text classification technology based on genetic annealing algorithm
    Zhu Zhen-fang
    Liu Pei-yu
    Lu Ran
    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008: 265 - 269
  • [6] Research on manufacturing text classification based on improved genetic algorithm
    Zhou Kaijun
    Tong Yifei
    BRAZILIAN ARCHIVES OF BIOLOGY AND TECHNOLOGY, 2016, 59
  • [7] Text Classification Research Based on Improved SoftMax Regression Algorithm
    She, Xiangyang
    Zhu, Yinglong
    2018 11TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2018: 273 - 276
  • [8] Feature selection based on term frequency deviation rate for text classification
    Hongfang Zhou
    Yiming Ma
    Xiang Li
    Applied Intelligence, 2021, 51 : 3255 - 3274
  • [9] Feature selection based on term frequency deviation rate for text classification
    Zhou, Hongfang
    Ma, Yiming
    Li, Xiang
    APPLIED INTELLIGENCE, 2021, 51 (06) : 3255 - 3274
  • [10] An improved term weighting method based on relevance frequency for text classification
    Chuanxiao Li
    Wenqiang Li
    Zhong Tang
    Song Li
    Hai Xiang
    Soft Computing, 2023, 27 : 3563 - 3579