Testing of algorithms for anomaly detection in Big data using apache spark

Cited by: 0
Authors
Lighari, Sheeraz Niaz [1]
Hussain, Dil Muhammad Akbar [1]
Affiliations
[1] Aalborg Univ, Dept Energy Technol, Esbjerg, Denmark
Source
2017 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN) | 2017
Keywords
Big data; Security analytics; Machine learning;
DOI
10.1109/CICN.2017.23
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The steady growth in the size of networks and in the volume of data they produce has made data analysis very challenging, particularly once the data reaches big-data scale, where detecting intrusions becomes even more difficult. Experts currently have very few tools and methods for analyzing big data for security purposes. Either new tools must be devised, or existing tools must be used in novel ways to support big data security analysis. In this paper, we use Apache Spark, a big data tool, to analyze a large dataset for anomaly detection. Anomaly detection is performed with several machine learning algorithms: logistic regression, support vector machine, naive Bayes, decision trees, random forest, and k-means. All of these algorithms can, to varying degrees, detect anomalies in big data, but we need to know how efficiently each performs. The main objective of this investigation is to find the most efficient algorithm for anomaly detection. To that end, we compare their training time, prediction time, and accuracy. The analysis was carried out on the KDD Cup 99 dataset. Although this dataset is only megabytes in size, it serves our purpose here for big data security analytics.
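The comparison described in the abstract (training time, prediction time, and accuracy per algorithm) can be illustrated with a small timing harness. This is a minimal sketch in plain Python, not the authors' Spark code: the names `benchmark`, `fit_nearest_mean`, and `fit_majority` are hypothetical, the two toy models stand in for the paper's algorithms, and the handful of rows stands in for the KDD Cup 99 data.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# One labelled record: (features, label), where 1 = anomaly and 0 = normal.
Row = Tuple[Sequence[float], int]
Predictor = Callable[[Sequence[float]], int]


@dataclass
class Benchmark:
    name: str
    train_seconds: float
    predict_seconds: float
    accuracy: float


def benchmark(name: str, fit: Callable[[List[Row]], Predictor],
              train: List[Row], test: List[Row]) -> Benchmark:
    """Time the training and prediction phases separately, then score accuracy."""
    t0 = time.perf_counter()
    predict = fit(train)                      # training phase
    t1 = time.perf_counter()
    predictions = [predict(x) for x, _ in test]  # prediction phase
    t2 = time.perf_counter()
    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return Benchmark(name, t1 - t0, t2 - t1, correct / len(test))


def fit_nearest_mean(train: List[Row]) -> Predictor:
    """Toy stand-in model: label a row by the closer class mean of feature 0."""
    normal = [x[0] for x, y in train if y == 0]
    anomalous = [x[0] for x, y in train if y == 1]
    mean_normal = sum(normal) / len(normal)
    mean_anomalous = sum(anomalous) / len(anomalous)

    def predict(x: Sequence[float]) -> int:
        return 1 if abs(x[0] - mean_anomalous) < abs(x[0] - mean_normal) else 0

    return predict


def fit_majority(train: List[Row]) -> Predictor:
    """Toy baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = 1 if 2 * sum(labels) > len(labels) else 0
    return lambda x: majority


# Illustrative rows only; these are not taken from KDD Cup 99.
train_rows: List[Row] = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([5.0], 1), ([6.0], 1)]
test_rows: List[Row] = [([0.15], 0), ([5.5], 1), ([0.25], 0), ([4.8], 1)]

results = [
    benchmark("nearest-mean", fit_nearest_mean, train_rows, test_rows),
    benchmark("majority", fit_majority, train_rows, test_rows),
]

for r in results:
    print(f"{r.name}: train={r.train_seconds:.6f}s "
          f"predict={r.predict_seconds:.6f}s accuracy={r.accuracy:.2f}")
```

Keeping the training and prediction timers separate matters because an algorithm that is slow to train can still be fast to apply, and the abstract treats the two as distinct criteria. In a Spark setting the same pattern would presumably wrap each model's fit and transform steps, with an action forcing evaluation inside each timed region.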
Pages: 97-100
Page count: 4