Dealing with Class Noise in Large Training Datasets for Malware Detection

Cited by: 4
Authors
Gavrilut, Dragos [1, 2]
Ciortuz, Liviu [1]
Affiliations
[1] Alexandru Ioan Cuza Univ, Fac Comp Sci, Iasi, Romania
[2] BitDefender Antivirus Res Lab, Iasi, Romania
Source
13TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2011) | 2012
Keywords
Malware detection; perceptrons; class noise; classification
DOI
10.1109/SYNASC.2011.39
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper presents the approaches we have explored so far for detecting and dealing with the class noise found in the large annotated datasets used to train the classifiers we previously designed for industrial-scale malware identification. First, we established a number of distance-based filtering rules that identify different "levels" of potential noise in the training data; second, we analysed how either removing or "cleaning" the potentially noisy records affects the performance of our simplest classifiers. We show that careful distance-based filtering can lead to noticeably better results in malware detection.
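The abstract does not spell out the filtering rules, so the following is only a rough illustration of the general idea and not the authors' method. It assumes that each file is a binary feature vector and that distance is Hamming distance. It also defines a hypothetical noise "level" as the distance to the nearest record that carries the opposite label, so level 0 means identical features with conflicting labels. The sketch then applies the two treatments the abstract names: removing flagged records, and "cleaning" them by relabeling with a strict-majority vote of nearby records. The function names, the distance measure, and the voting rule are all assumptions.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[int, ...]  # assumption: one binary feature vector per file


def hamming(a: Vector, b: Vector) -> int:
    """Count positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def noise_levels(X: Sequence[Vector], y: Sequence[int],
                 max_level: int = 2) -> List[Optional[int]]:
    """Hypothetical rule: level d = Hamming distance to the nearest record
    with the opposite label. Level 0 means an identical vector carries a
    conflicting label; None means no conflicting record within max_level."""
    levels: List[Optional[int]] = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        d = min((hamming(xi, xj) for j, (xj, yj) in enumerate(zip(X, y))
                 if j != i and yj != yi), default=None)
        levels.append(d if d is not None and d <= max_level else None)
    return levels


def remove_noisy(X, y, levels, level):
    """Removal: drop every record flagged at or below the given level."""
    keep = [i for i, lv in enumerate(levels) if lv is None or lv > level]
    return [X[i] for i in keep], [y[i] for i in keep]


def clean_noisy(X, y, levels, level):
    """Cleaning (assumed rule): relabel each flagged record by strict majority
    over all records, itself included, within Hamming distance `level`.
    Votes use the original labels; ties keep the current label."""
    new_y = list(y)
    for i, lv in enumerate(levels):
        if lv is None or lv > level:
            continue
        votes = Counter(y[j] for j in range(len(X)) if hamming(X[i], X[j]) <= level)
        top = votes.most_common(2)
        if len(top) == 1 or top[0][1] > top[1][1]:
            new_y[i] = top[0][0]
    return list(X), new_y


if __name__ == "__main__":
    # Toy data: 1 = malware, 0 = clean; record 2 duplicates records 0 and 1
    # but is labeled clean, so all three are flagged at level 0.
    X = [(1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 0)]
    y = [1, 1, 0, 0, 0]
    levels = noise_levels(X, y)
    print(levels)                            # [0, 0, 0, None, None]
    print(remove_noisy(X, y, levels, 0)[1])  # [0, 0]
    print(clean_noisy(X, y, levels, 0)[1])   # [1, 1, 1, 0, 0]
```

Removal discards both sides of a conflict, whereas cleaning keeps the records and resolves the label by vote. The pairwise scan is O(n²), which is fine for a toy example but not for industrial-scale data. A real pipeline would need an index or hashing over feature vectors, and that is again an assumption here.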
Pages: 401-407
Number of pages: 7