A Novel Spam Categorization Algorithm Based on Active Learning Method and Negative Selection Algorithm

Cited by: 0
Authors
Hu X.-J. [1 ]
Liu L. [1 ]
Qiu N.-J. [2 ]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun, 130012, Jilin
[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130022, Jilin
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2018, Vol. 46, No. 01
Keywords
Active learning; Negative selection; Spam detection; Text categorization; Two-way user interest set
DOI
10.3969/j.issn.0372-2112.2018.01.028
Abstract
To address the proliferation of spam, a two-class text categorization method is proposed: the active learning negative selection text categorization (ALNSTC) algorithm, which combines the active learning (AL) method with the negative selection (NS) algorithm. A positive user interest set and a negative user interest set are established from a small number of labeled samples, and the sampling engine (SE) of the AL method is improved with the autologous (self) anomaly detection mechanism of the NS algorithm. The two-way user interest sets serve as detectors, and each new sample set serves as the self-set; the detectors and the self-set are matched using Hamming match rules. Classifying each sample set also updates the two user interest sets. The proposed algorithm is tested at full scale on six common spam corpora and compared with five other state-of-the-art spam classification methods: the quick online spam identification (QOSI) method, the semi-supervised collaboration classification algorithm with enhanced difference (DSCC), the dynamic web spam filtering (WSF2) method, the multilevel spam filtering algorithm based on artificial immunity (MSFA-AI), and the integrated multi-field learning (MFL) method. The evaluation metrics include precision, recall, ROC curve, categorization running time, and the number of labeled spam samples. The results show that the proposed method achieves better precision, recall, and classification accuracy, and reduces the number of spam samples that must be labeled manually. Converting user preferences into positive and negative user interest sets improves the classification capacity of the algorithm. In addition, obtaining the features of unknown categories through the anomaly detection mechanism reduces the number of samples the user has to label. © 2018, Chinese Institute of Electronics. All rights reserved.
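The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed-length binary-string encoding of messages, the distance threshold, the rule that a sample matched by neither set (or by both) is sent to the user for labeling, and all names (`hamming_distance`, `classify`, `process_batch`, `oracle`) are assumptions made for illustration only.

```python
from typing import Callable, List, Set, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def classify(sample: str, positive_set: Set[str], negative_set: Set[str],
             threshold: int) -> str:
    """Match one sample against both interest sets (used as detectors).

    Returns "positive" or "negative" when exactly one set matches,
    and "query" when the sample is ambiguous or unknown.
    """
    pos_hit = any(hamming_distance(d, sample) <= threshold for d in positive_set)
    neg_hit = any(hamming_distance(d, sample) <= threshold for d in negative_set)
    if pos_hit and not neg_hit:
        return "positive"
    if neg_hit and not pos_hit:
        return "negative"
    return "query"


def process_batch(samples: List[str], positive_set: Set[str],
                  negative_set: Set[str], threshold: int,
                  oracle: Callable[[str], str]) -> Tuple[List[str], int]:
    """Classify a batch, asking the (hypothetical) user oracle only for
    unmatched samples, and update the interest sets after every sample."""
    labels: List[str] = []
    queries = 0
    for sample in samples:
        label = classify(sample, positive_set, negative_set, threshold)
        if label == "query":
            label = oracle(sample)  # assumed to return "positive" or "negative"
            queries += 1
        # Assumption: every classified sample is added to the matching set.
        (positive_set if label == "positive" else negative_set).add(sample)
        labels.append(label)
    return labels, queries


if __name__ == "__main__":
    positive = {"1100"}
    negative = {"0011"}
    labels, queries = process_batch(
        ["1101", "0111", "1010"], positive, negative,
        threshold=1, oracle=lambda s: "negative")
    print(labels, queries)
```

In this sketch the user is consulted only for the sample that lies more than one bit away from every detector, which mirrors in a simplified way the abstract's claim that the anomaly detection mechanism lowers the manual labeling effort.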
Pages: 203-209
Page count: 6
Related papers
16 in total
  • [1] Guo H.-S., Wang W.-J., A pattern class mining model based on active learning, Journal of Computer Research and Development, 51, 10, pp. 2148-2159, (2014)
  • [2] Balcan M.F., Blum A., A discriminative model for semi-supervised learning, Journal of the ACM, 57, 3, pp. 1-46, (2010)
  • [3] Wang Y.-W., Liu Y.-N., Feng L.-Z., Zhu X.-D., A novel quick online spam identification method based on user interest set, Acta Electronica Sinica, 43, 10, pp. 1963-1970, (2015)
  • [4] Wu W.-N., Liu Y., Guo M.-Z., Liu X.-Y., Advances in active learning algorithm based on sampling strategy, Journal of Computer Research and Development, 49, 6, pp. 1162-1173, (2012)
  • [5] Liu W.Y., Wang T., Online active multi-field learning for efficient email spam filtering, Knowledge & Information Systems, 33, 1, pp. 117-136, (2012)
  • [6] Benevenuto F., Rodrigues T., Veloso A., Almeida J., Goncalves M., Almeida V., Practical detection of spammers and content promoters in online video sharing systems, IEEE Transactions on Systems Man & Cybernetics-Part B (Cybernetics), 42, 3, pp. 688-701, (2012)
  • [7] Feng L.Z., Wang Y.W., Zuo W.L., Quick online spam classification method based on active and incremental learning, Journal of Intelligent & Fuzzy Systems, 30, 1, pp. 17-27, (2016)
  • [8] Jin Z.-Z., Liao M.-H., Xiao G., Survey of negative selection algorithms, Journal on Communications, 34, 1, pp. 159-170, (2013)
  • [9] Idris I., Selamat A., Nguyen N.T., Omatu S., Krejcar O., Kuca K., Penhaker M., A combined negative selection algorithm-particle swarm optimization for an email spam detection system, Engineering Applications of Artificial Intelligence, 39, pp. 33-44, (2015)
  • [10] Idris I., Selamat A., Omatu S., Hybrid email spam detection model with negative selection algorithm and differential evolution, Engineering Applications of Artificial Intelligence, 28, pp. 97-110, (2014)