Dealing with Class Noise in Large Training Datasets for Malware Detection

Cited by: 4
Authors
Gavrilut, Dragos [1, 2]
Ciortuz, Liviu [1]
Affiliations
[1] Alexandru Ioan Cuza Univ, Fac Comp Sci, Iasi, Romania
[2] BitDefender Antivirus Res Lab, Iasi, Romania
Keywords
Malware detection; perceptrons; class noise; classification
DOI
10.1109/SYNASC.2011.39
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This paper presents the approaches we have explored so far for detecting and dealing with class noise in the large annotated datasets used to train the classifiers we previously designed for industrial-scale malware identification. First, we established a number of distance-based filtering rules that allow us to identify different "levels" of potential noise in the training data; second, we analysed the effects that either removing or "cleaning" the potentially noisy records has on the performance of our simplest classifiers. We show that careful distance-based filtering can lead to noticeably better results in malware detection.
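The abstract describes a two-stage pipeline: flag suspicious training records with distance-based rules, then either remove them or "clean" them before training a simple classifier. The sketch below illustrates that pipeline under loudly stated assumptions that are not taken from the paper: records are binary feature vectors compared by Hamming distance; the hypothetical rule `flag_noisy` flags a record when all of its nearest neighbours within `max_dist` carry the opposite label (with `max_dist` standing in for a noise "level"); "cleaning" is assumed to mean flipping the flagged label; and the classifier is a plain perceptron. All function names and the toy records are illustrative only.

```python
from typing import List, Tuple

# Hypothetical record format: (binary feature vector, label), label 1 = malware, 0 = clean.
Record = Tuple[Tuple[int, ...], int]


def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def flag_noisy(records: List[Record], max_dist: int) -> List[int]:
    """Hypothetical distance-based rule (not the paper's exact rules):
    flag record i if its nearest neighbours lie within max_dist and ALL of
    them carry the opposite label. max_dist acts as a noise "level":
    0 only catches identical feature vectors with conflicting labels."""
    flagged = []
    for i, (xi, yi) in enumerate(records):
        neighbours = [(hamming(xi, xj), yj)
                      for j, (xj, yj) in enumerate(records) if j != i]
        if not neighbours:
            continue
        d_min = min(d for d, _ in neighbours)
        if d_min > max_dist:
            continue  # no close neighbour, so no evidence of noise
        nearest_labels = {y for d, y in neighbours if d == d_min}
        if nearest_labels == {1 - yi}:
            flagged.append(i)
    return flagged


def remove_flagged(records: List[Record], flagged: List[int]) -> List[Record]:
    """Treatment 1: drop the suspected noisy records."""
    drop = set(flagged)
    return [r for i, r in enumerate(records) if i not in drop]


def relabel_flagged(records: List[Record], flagged: List[int]) -> List[Record]:
    """Treatment 2 (assumed meaning of "cleaning"): flip the suspected labels."""
    flip = set(flagged)
    return [(x, 1 - y) if i in flip else (x, y)
            for i, (x, y) in enumerate(records)]


def train_perceptron(records: List[Record], epochs: int = 20):
    """Plain perceptron with targets in {-1, +1}; returns (weights, bias)."""
    w = [0.0] * len(records[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in records:
            target = 1 if y == 1 else -1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * activation <= 0:  # misclassified or on the boundary
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return w, b


def predict(w: List[float], b: float, x: Tuple[int, ...]) -> int:
    """Return 1 (malware) if the perceptron score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy records for illustration only (not data from the paper).
data: List[Record] = [
    ((1, 1, 0, 0), 1),
    ((1, 1, 0, 0), 1),
    ((1, 1, 0, 0), 0),  # conflicts with the two identical malware records
    ((0, 0, 1, 1), 0),
    ((0, 0, 1, 1), 0),
    ((0, 0, 1, 0), 0),
]

if __name__ == "__main__":
    noisy = flag_noisy(data, max_dist=0)
    variants = {
        "raw": data,
        "removed": remove_flagged(data, noisy),
        "relabelled": relabel_flagged(data, noisy),
    }
    for name, train_set in variants.items():
        w, b = train_perceptron(train_set)
        print(name, predict(w, b, (1, 1, 0, 0)))
```

On these toy records only the third record is flagged, and the perceptron trained on the raw set keeps oscillating between the two conflicting updates and ends up scoring `(1, 1, 0, 0)` as clean, whereas both treated sets score it as malware. Because this rule looks only at nearest neighbours, two records that share a feature vector but disagree on label (with no third copy) would both be flagged; in that case removal is the safer treatment, since relabelling would simply swap the two labels.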
Pages: 401-407
Number of pages: 7