Big data analytics for identifying electricity theft using machine learning approaches in microgrids for smart communities

Cited by: 20
Authors
Arif, Arooj [1]
Javaid, Nadeem [1]
Aldegheishem, Abdulaziz [2]
Alrajeh, Nabil [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad 44000, Pakistan
[2] King Saud Univ KSU, Coll Architecture & Planning, Urban Planning Dept, Riyadh, Saudi Arabia
[3] King Saud Univ KSU, Biomed Technol Dept, Coll Appl Med Sci, Riyadh, Saudi Arabia
Source
Keywords
big data; electricity theft detection; hyperactive optimization toolkit; machine learning; smart grids; urban planning; IMBALANCED DATA; OPTIMIZATION; SYSTEMS
DOI
10.1002/cpe.6316
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Electricity theft (ET) causes major revenue loss for power utilities. It reduces the quality of supply, raises production costs, forces legitimate consumers to pay higher costs, and harms the economy as a whole. In this article, we use the State Grid Corporation of China (SGCC) dataset, which contains 1035 days of electricity consumption data for two classes: normal and fraudulent. We propose an ET detection model that consists of four steps: interpolation, data balancing, feature extraction, and classification. First, missing values in the dataset are recovered by interpolation. Second, the data are resampled: ET consumers make up only 9% of the SGCC dataset, an imbalance that prevents the model from classifying both classes (normal and theft) correctly. To address it, a hybrid resampling technique is proposed that combines the synthetic minority oversampling technique with near miss. Third, a residual network extracts latent features from the SGCC dataset. Fourth, three tree-based classifiers, namely decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost), are trained on the encoded feature vectors for classification. Because the search for good hyperparameters is challenging, usually done manually, and time-consuming, a Bayesian optimizer is used to simplify the tuning of DT, RF, and AdaBoost. The results indicate that RF outperforms DT and AdaBoost.
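The first two steps of the pipeline described in the abstract (interpolation, then hybrid over- and undersampling) might be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `interpolate_missing`, `smote_like`, and `near_miss_like`, the use of linear interpolation, the neighbour count `k`, and the toy data are all assumptions made for the example. The abstract does not specify the interpolation method or the NearMiss variant.

```python
import math
import random


def interpolate_missing(series):
    """Fill None gaps linearly between the nearest observed neighbours.

    Gaps at either end copy the nearest observed value (an assumption;
    the abstract only says that interpolation is used).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points, SMOTE style.

    Each new point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def near_miss_like(majority, minority, n_keep, k=2):
    """Keep the n_keep majority points closest to the minority class.

    Closeness is the mean distance to the k nearest minority samples,
    which mirrors the NearMiss-1 heuristic.
    """
    def mean_distance(point):
        nearest = sorted(math.dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=mean_distance)[:n_keep]


if __name__ == "__main__":
    # Toy daily readings with missing days (hypothetical values).
    print(interpolate_missing([1.0, None, 3.0, None, None, 6.0, None]))

    theft = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    normal = [[5.0, 5.0], [0.5, 0.5], [9.0, 9.0], [1.0, 0.0], [7.0, 7.0], [8.0, 8.0]]
    theft_balanced = theft + smote_like(theft, n_new=1)
    normal_balanced = near_miss_like(normal, theft, n_keep=4)
    print(len(theft_balanced), len(normal_balanced))
```

Oversampling the minority class and undersampling the majority class together moves the two classes toward equal size without relying on either technique alone. In practice, a maintained library implementation would normally replace these hand-written helpers.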
Pages: 21