Anomaly credit data detection based on enhanced Isolation Forest

Cited by: 5
Authors
Zhang, Xiaodong [1]
Yao, Yuan [1]
Lv, Congdong [1]
Wang, Tao [2]
Affiliations
[1] Nanjing Audit Univ, Sch Informat Engn, Nanjing 211815, Peoples R China
[2] JUSFOUN BIG DATA, Beijing 10000, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Credit evaluation; Anomaly detection; Class-imbalance; Cost-sensitive; EasyEnsemble; Isolation forest; SVM;
DOI
10.1007/s00170-022-09251-8
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Real-world credit data often contain false and erroneous records, which degrade the performance of credit evaluation models. To address this, we propose an outlier detection algorithm that accounts for two characteristics of credit data: class imbalance and cost sensitivity. The anomaly detection model, called EIF, is used to optimize credit evaluation models. EIF uses the EasyEnsemble algorithm to construct balanced datasets and trains Isolation Forest models for anomaly detection on these balanced datasets, which carry different perturbations. On the one hand, undersampling into balanced datasets addresses the class-imbalance problem; on the other hand, each sub-model learns from all of the minority-class samples, which addresses the cost-sensitive problem. Experiments were performed on the UCI German dataset, and a test set containing fake data was constructed by correlation. Compared with other anomaly detection algorithms applied to common credit evaluation models, the EIF-optimized model achieves a higher F1 score and a lower cost-sensitive error rate. In conclusion, the EIF model is effective in enhancing the performance of credit evaluation models on forged credit datasets.
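The EasyEnsemble-style sampling step described in the abstract can be illustrated with a minimal sketch. It shows only how balanced subsets might be built: each subset keeps every minority-class sample and adds an equally sized random draw from the majority class. The function name `balanced_subsets`, the toy records, the subset count, and the seed are all assumptions made for illustration; the abstract does not state how many subsets EIF uses or how the sub-models' outputs are combined, and the Isolation Forest training itself is not shown here.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build EasyEnsemble-style balanced subsets (illustrative helper).

    Every subset contains all minority samples plus a random
    undersample of the majority class of the same size.
    """
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    subsets = []
    for _ in range(n_subsets):
        # Undersample the majority class without replacement.
        sampled_majority = rng.sample(majority, k)
        # Keep the full minority class in every subset.
        subsets.append(sampled_majority + list(minority))
    return subsets


# Toy records (hypothetical): labels only, no real credit features.
majority = [("good", i) for i in range(700)]
minority = [("bad", i) for i in range(300)]

subsets = balanced_subsets(majority, minority, n_subsets=5)
for idx, subset in enumerate(subsets):
    n_bad = sum(1 for label, _ in subset if label == "bad")
    print(idx, len(subset), n_bad)
```

In a fuller pipeline, one anomaly detector would presumably be fitted per subset, so that every sub-model sees the whole minority class while the majority draw varies from subset to subset.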
Pages: 185 - 192
Number of pages: 8
Related papers
50 in total
  • [31] Bilateral-Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data
    Hannak, Gabor
    Horvath, Gabor
    Kadar, Attila
    Szalai, Mark Daniel
    STATISTICAL ANALYSIS AND DATA MINING, 2023, 16 (03) : 215 - 223
  • [32] Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
    Laskar, Md Tahmid Rahman
    Huang, Jimmy Xiangji
    Smetana, Vladan
    Stewart, Chris
    Pouw, Kees
    An, Aijun
    Chan, Stephen
    Liu, Lei
    ACM TRANSACTIONS ON CYBER-PHYSICAL SYSTEMS, 2021, 5 (04)
  • [33] An Anomaly Detection Method for Wireless Sensor Networks Based on the Improved Isolation Forest
    Chen, Junxiang
    Zhang, Jilin
    Qian, Ruixiang
    Yuan, Junfeng
    Ren, Yongjian
    APPLIED SCIENCES-BASEL, 2023, 13 (02):
  • [34] Magnetic Anomaly Detection Method Based on Feature Fusion and Isolation Forest Algorithm
    Zhang, Ning
    Liu, Yifei
    Xu, Lei
    Lin, Pengfei
    Zhao, Heda
    Chang, Ming
    IEEE ACCESS, 2022, 10 : 84444 - 84457
  • [35] Anomaly Detection in Semiconductor Cleanroom Using Isolation Forest
    Jahan, Israt
    Alam, Md Morshed
    Ahmed, Md Faisal
    Jang, Yeong Min
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 795 - 797
  • [36] Isolation Mondrian Forest for Batch and Online Anomaly Detection
    Ma, Haoran
    Ghojogh, Benyamin
    Samad, Maria N.
    Zheng, Dongyu
    Crowley, Mark
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 3051 - 3058
  • [37] Subspace analysis isolation forest for hyperspectral anomaly detection
    Huang Y.
    Xue Y.
    Li P.
    Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2021, 50 (03): : 416 - 425
  • [38] CADI: Contextual Anomaly Detection using an Isolation Forest
    Yepmo, Veronne
    Smits, Gregory
    Lesot, Marie-Jeanne
    Pivert, Olivier
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 935 - 944
  • [39] On the statistical properties of the isolation forest anomaly detection method
    Pelletier, Bruno
    ELECTRONIC JOURNAL OF STATISTICS, 2024, 18 (02): : 4322 - 4381
  • [40] Semi-Supervised Isolation Forest for Anomaly Detection
    Stradiotti, Luca
    Perini, Lorenzo
    Davis, Jesse
    PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 670 - 678