Big data analytics for identifying electricity theft using machine learning approaches in microgrids for smart communities

Cited by: 20
Authors
Arif, Arooj [1]
Javaid, Nadeem [1]
Aldegheishem, Abdulaziz [2]
Alrajeh, Nabil [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad 44000, Pakistan
[2] King Saud Univ KSU, Coll Architecture & Planning, Urban Planning Dept, Riyadh, Saudi Arabia
[3] King Saud Univ KSU, Biomed Technol Dept, Coll Appl Med Sci, Riyadh, Saudi Arabia
Keywords
big data; electricity theft detection; hyperactive optimization toolkit; machine learning; smart grids; urban planning; imbalanced data; optimization; systems
DOI
10.1002/cpe.6316
CLC number
TP31 [Computer software]
Discipline code
081202; 0835
Abstract
Electricity theft (ET) causes major revenue losses for power utilities. It degrades the quality of supply, raises production costs, forces legitimate consumers to pay higher prices, and harms the economy as a whole. In this article, we use the State Grid Corporation of China (SGCC) dataset, which contains 1035 days of electricity consumption data for two classes: normal and fraudulent. We propose an ET detection model that consists of four steps: interpolation, data balancing, feature extraction, and classification. First, missing values in the dataset are recovered using an interpolation method. Second, a resampling technique is applied. ET consumers make up only 9% of the SGCC dataset, which prevents the model from correctly classifying both classes (normal and theft). We therefore propose a hybrid resampling technique named synthetic minority oversampling technique with near miss, which combines SMOTE oversampling with NearMiss undersampling. Third, a residual network (ResNet) extracts latent features from the SGCC dataset. Fourth, three tree-based classifiers, namely decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost), are trained on the encoded feature vectors for classification. Moreover, finding good hyperparameters is a challenging task that is usually done manually and takes considerable time. To address this problem, a Bayesian optimizer is used to simplify the tuning of DT, RF, and AdaBoost. The results indicate that RF outperforms DT and AdaBoost.
Pages: 21
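The abstract does not say which interpolation method recovers the missing readings. A minimal sketch, assuming simple linear interpolation over one consumer's daily series, could look like the following. Missing readings are given as None or NaN, gaps at either end copy the nearest known reading, and an all-missing series is filled with zeros. The function name `interpolate_missing` and the sample readings are hypothetical, not taken from the paper.

```python
import bisect
import math
from typing import List, Optional


def _is_missing(value: Optional[float]) -> bool:
    """A reading counts as missing if it is None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Hypothetical helper (assumption: linear interpolation).

    Fills each missing daily reading from the nearest known readings on
    either side; gaps at the start or end copy the nearest known reading.
    """
    known = [i for i, v in enumerate(series) if not _is_missing(v)]
    if not known:
        # No observations at all: fall back to zeros (an assumption).
        return [0.0] * len(series)
    filled = []
    for i, value in enumerate(series):
        if not _is_missing(value):
            filled.append(float(value))
            continue
        pos = bisect.bisect_left(known, i)  # index of first known day after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            filled.append(float(series[right]))  # leading gap
        elif right is None:
            filled.append(float(series[left]))   # trailing gap
        else:
            t = (i - left) / (right - left)      # fractional position in the gap
            lo, hi = float(series[left]), float(series[right])
            filled.append(lo + t * (hi - lo))
    return filled


daily_kwh = [1.0, None, 3.0, float("nan"), float("nan"), 6.0, None]
print(interpolate_missing(daily_kwh))
```

On the SGCC data this would be applied row by row, one consumer's 1035-day series at a time.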
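The hybrid resampling step pairs two standard techniques. SMOTE synthesises new minority (theft) samples on the line segment between a theft sample and one of its nearest theft neighbours. NearMiss undersamples the majority (normal) class by keeping the normal samples closest to the theft class. The sketch below assumes the NearMiss-1 variant, labels 1 = theft and 0 = normal, and a single hypothetical target size per class. The abstract does not give the variant, the neighbour counts, or the final class ratio, so these are illustrative choices, not the authors' settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def smote(minority: List[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """SMOTE: create n_new synthetic samples, each on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def near_miss(majority: List[Vector], minority: List[Vector], n_keep: int, k: int = 3) -> List[Vector]:
    """NearMiss-1 (assumed variant): keep the majority samples whose mean
    distance to their k nearest minority samples is smallest."""
    def score(x: Sequence[float]) -> float:
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=score)[:n_keep]


def smote_near_miss(X: List[Vector], y: List[int], target_per_class: int,
                    k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical hybrid: oversample class 1 (theft) with SMOTE and
    undersample class 0 (normal) with NearMiss-1 toward target_per_class."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    n_new = max(0, target_per_class - len(minority))
    theft = minority + smote(minority, n_new, k, rng)
    normal = near_miss(majority, minority, target_per_class)
    return normal + theft, [0] * len(normal) + [1] * len(theft)


# Toy data: 20 normal consumers, 4 theft consumers, 2 features each.
normal = [[float(i), 0.0] for i in range(20)]
theft = [[10.0, 1.0], [11.0, 1.0], [12.0, 1.5], [10.5, 2.0]]
X_bal, y_bal = smote_near_miss(normal + theft, [0] * 20 + [1] * 4, target_per_class=10)
print(y_bal.count(0), y_bal.count(1))  # → 10 10
```

NearMiss here measures distances to the original theft samples only, so the two halves do not interact and could run in either order.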
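The abstract compares DT, RF, and AdaBoost on the ResNet-encoded features and reports RF as the best. As a rough illustration of what a random forest does, the sketch below bags small CART-style trees (Gini impurity, a random subset of about sqrt(d) features tried at each split, a common default) and takes a majority vote. It is not the authors' configuration: the tree count, depth, seed, and toy two-feature data are hypothetical, and the Bayesian hyperparameter tuning is not shown. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: Optional[int] = None  # set only on leaves
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_features, rng) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return Node(label=majority)
    best = None  # (weighted Gini, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):  # random feature subset per split
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:
                continue  # t is the maximum value, so nothing is split off
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return Node(label=majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(feature=f, threshold=t,
                left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_features, rng),
                right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_features, rng))


def predict_tree(node: Node, x) -> int:
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


class RandomForest:
    """Bagged CART trees combined by majority vote (hypothetical settings)."""

    def __init__(self, n_trees: int = 25, max_depth: int = 5, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees: List[Node] = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = max(1, int(math.sqrt(len(X[0]))))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_features, rng))
        return self

    def predict(self, X):
        return [Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0] for x in X]


# Toy stand-in for encoded feature vectors: class 1 for the upper half.
X = [[float(i), float(2 * i)] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
forest = RandomForest(n_trees=15, seed=1).fit(X, y)
print(forest.predict([[2.0, 4.0], [37.0, 74.0]]))
```

Averaging many decorrelated trees is what typically gives RF lower variance than a single DT, which is one plausible reason for the ranking reported in the abstract.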