Exploiting Domain Knowledge to Address Class Imbalance in Meteorological Data Mining

Cited by: 1
Authors
Tsagalidis, Evangelos [1]
Evangelidis, Georgios [2]
Affiliations
[1] Hellen Agr Insurance Org, Meteorol Applicat Ctr, Int Airport Makedonia, Thessaloniki 55103, Greece
[2] Univ Macedonia, Sch Informat Sci, Dept Appl Informat, Thessaloniki 54636, Greece
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 23
Keywords
meteorological data mining and machine learning; class imbalance; classification; randomized undersampling; SMOTE oversampling; undersampling using temporal distances;
DOI
10.3390/app122312402
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We deal with the problem of class imbalance in data mining and machine learning classification algorithms. This is the case where some of the class labels are represented by a small number of examples in the training dataset compared to the rest of the class labels. Usually, those minority class labels are the most important ones, implying that classifiers should primarily perform well on predicting those labels. This is a well-studied problem, and various sampling-based strategies are used to balance the representation of the labels in the training dataset and improve classifier performance. We explore whether expert knowledge in the field of Meteorology can enhance the quality of the training dataset when it is treated by pre-processing sampling strategies. We propose four new sampling strategies based on our expertise in the data domain, and we compare their effectiveness against the established sampling strategies used in the literature. It turns out that our sampling strategies, which take advantage of expert knowledge from the data domain, achieve class balancing that improves the performance of most classifiers.
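As a rough illustration of the kind of pre-processing sampling strategy the abstract refers to, the sketch below implements plain randomized undersampling, one of the established baselines named in the keywords. It is not one of the four strategies the authors propose, which this record does not describe. The function name `random_undersample`, the labels `"event"` and `"no_event"`, and the 95/5 split are all hypothetical and chosen only for the example.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop examples so each class keeps as many as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Size of the smallest (minority) class.
    n_min = min(len(idx) for idx in by_class.values())

    # Sample n_min indices without replacement from every class.
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original (e.g. chronological) order

    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy, made-up data: 95 majority examples and 5 minority examples.
labels = ["no_event"] * 95 + ["event"] * 5
rows = list(range(100))

X_bal, y_bal = random_undersample(rows, labels)
print(Counter(y_bal))
```

After the call, both classes have the same number of examples and every minority example is retained. The cost is that most majority examples are discarded at random, which is the kind of information loss that domain-informed selection would presumably try to reduce.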
Pages: 12