Research on small files classification based on improved KNN algorithm and pretreatment strategy

Cited by: 0
Authors
Shi, Hengliang [1 ,2 ]
Bai, Xiaolei [1 ]
Zhen, Lintao [1 ]
Affiliations
[1] Information Engineering College, Henan University of Science and Technology, No. 263, Kaiyuan Road, Luoyang, China
[2] Noah (Suzhou) IT Solution Co., Ltd, Suzhou, China
Source
ICIC Express Letters | 2015 / Vol. 9 / No. 02
Keywords
Data handling - Learning algorithms - Information retrieval systems;
DOI
Not available
Abstract
This article combines the MapReduce model with mass data processing and proposes a classification and pretreatment strategy for small files in mass data. The described method makes it easier to exploit the parallel computing characteristics of the MapReduce architecture and saves a large amount of processing time. Experiments show that the classification method is efficient and reliable. The strategy can be widely applied in research on, and applications of, document classification and clustering. © 2015, ICIC International.
Pages: 603 - 608
Related papers
50 in total
  • [1] An Improved KNN Classification Algorithm based on Sampling
    Cheng, Zhiwei
    Chen, Caisen
    Qiu, Xuehuan
    Xie, Huan
    PROCEEDINGS OF THE ADVANCES IN MATERIALS, MACHINERY, ELECTRICAL ENGINEERING (AMMEE 2017), 2017, 114 : 220 - 225
  • [2] Research on KNN Algorithm in Malicious PDF Files Classification under Adversarial Environment
    Li, Kunming
    Gu, Yijun
    Zhang, Peijing
    An, Wang
    Li, Wenzheng
    ICBDC 2019: PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON BIG DATA AND COMPUTING, 2019, : 156 - 159
  • [3] An Improved KNN Text Classification Algorithm based on Simhash
    Liu, Jie
    Jin, Ting
    Pan, Kejia
    Yang, Yi
    Wu, Yan
    Wang, Xin
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 92 - 95
  • [4] A fast document classification algorithm based on improved KNN
    Guo, Ge
    Ping, Xijian
    Chen, Gang
    ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 3, PROCEEDINGS, 2006, : 186 - +
  • [5] AN IMPROVED KNN TEXT CLASSIFICATION ALGORITHM BASED ON DENSITY
    Shi, Kansheng
    Li, Lemin
    Liu, Haitao
    He, Jie
    Zhang, Naitong
    Song, Wentao
    2011 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS, 2011, : 113 - 117
  • [6] An Improved KNN Text Classification Algorithm Based on Clustering
    Zhou Yong
    Li Youwen
    Xia Shixiong
    JOURNAL OF COMPUTERS, 2009, 4 (03) : 230 - 237
  • [7] An Improved KNN Algorithm for Text Classification
    Li, Huijuan
    Jiang, He
    Wang, Dongyuan
    Han, Bing
    2018 EIGHTH INTERNATIONAL CONFERENCE ON INSTRUMENTATION AND MEASUREMENT, COMPUTER, COMMUNICATION AND CONTROL (IMCCC 2018), 2018, : 1081 - 1085
  • [8] An Improved KNN Algorithm in Text Classification
    Wang, Xiaoni
    Zhang, Zhenjiang
    Cao, Wei
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMPUTER APPLICATIONS (ICSA 2013), 2013, 92 : 263 - 268
  • [9] Classification Method of Teaching Resources Based on Improved KNN Algorithm
    An, Yingbo
    Xu, Meiling
    Shen, Chen
    INTERNATIONAL JOURNAL OF EMERGING TECHNOLOGIES IN LEARNING, 2019, 14 (04) : 73 - 88
  • [10] Classification for Unbalanced Dataset by an Improved KNN Algorithm Based on Weight
    Wang, Chao-Xue
    Dong, Li-Li
    Pan, Zheng-Mao
    Zhang, Tao
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (11B) : 4983 - 4988