Exploiting Domain Knowledge to Address Class Imbalance in Meteorological Data Mining

Cited by: 1
Authors
Tsagalidis, Evangelos [1]
Evangelidis, Georgios [2]
Affiliations
[1] Hellen Agr Insurance Org, Meteorol Applicat Ctr, Int Airport Makedonia, Thessaloniki 55103, Greece
[2] Univ Macedonia, Sch Informat Sci, Dept Appl Informat, Thessaloniki 54636, Greece
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 23
Keywords
meteorological data mining and machine learning; class imbalance; classification; randomized undersampling; SMOTE oversampling; undersampling using temporal distances
DOI
10.3390/app122312402
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
We deal with the problem of class imbalance in data mining and machine learning classification algorithms. This is the case where some of the class labels are represented by a small number of examples in the training dataset compared to the rest of the class labels. Usually, those minority class labels are the most important ones, implying that classifiers should primarily perform well on predicting those labels. This is a well-studied problem, and various sampling-based strategies are used to balance the representation of the labels in the training dataset and improve classifier performance. We explore whether expert knowledge in the field of Meteorology can enhance the quality of the training dataset when treated by pre-processing sampling strategies. We propose four new sampling strategies based on our expertise in the data domain, and we compare their effectiveness against the established sampling strategies used in the literature. It turns out that our sampling strategies, which take advantage of expert knowledge from the data domain, achieve class balancing that improves the performance of most classifiers.
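To make the balancing idea in the abstract concrete, the following is a minimal sketch of randomized undersampling, one of the established baselines named in the keywords. It is not one of the four strategies the authors propose, which this record does not describe. The function name `random_undersample`, the toy feature values, and the labels `frost` and `no_frost` are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop examples until every class matches the rarest class in size."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The smallest class sets the number of examples kept per class.
    target = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement; the minority class is kept whole.
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 5 "frost" examples against 95 "no_frost" examples.
rows = [[float(i)] for i in range(100)]
labels = ["frost" if i % 20 == 0 else "no_frost" for i in range(100)]

balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

In this toy run each class ends up with 5 examples, so 90 majority-class examples are discarded. That loss of information is the usual cost of undersampling and is one reason alternative sampling strategies are studied.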
Pages: 12