Exploiting Domain Knowledge to Address Class Imbalance in Meteorological Data Mining

Cited by: 1
Authors
Tsagalidis, Evangelos [1]
Evangelidis, Georgios [2]
Affiliations
[1] Hellen Agr Insurance Org, Meteorol Applicat Ctr, Int Airport Makedonia, Thessaloniki 55103, Greece
[2] Univ Macedonia, Sch Informat Sci, Dept Appl Informat, Thessaloniki 54636, Greece
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 23
Keywords
meteorological data mining and machine learning; class imbalance; classification; randomized undersampling; SMOTE oversampling; undersampling using temporal distances;
DOI
10.3390/app122312402
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
We address the problem of class imbalance in data mining and machine learning classification algorithms. Class imbalance arises when some of the class labels are represented by only a small number of examples in the training dataset compared with the rest of the class labels. These minority class labels are usually the most important ones, so classifiers should primarily perform well at predicting them. The problem is well studied, and various sampling-based strategies are used to balance the representation of the labels in the training dataset and improve classifier performance. We explore whether expert knowledge in the field of meteorology can enhance the quality of the training dataset when it is treated by pre-processing sampling strategies. We propose four new sampling strategies based on our expertise in the data domain and compare their effectiveness against the established sampling strategies used in the literature. Our sampling strategies, which take advantage of expert knowledge from the data domain, achieve class balancing that improves the performance of most classifiers.
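As an illustration of the kind of established baseline the abstract refers to, the following is a minimal sketch of randomized undersampling, one of the listed keywords. It is not the authors' implementation and does not reproduce any of their four proposed strategies; the function name `random_undersample`, the toy data, and the class names `event` and `no_event` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop examples so that every class keeps as many
    examples as the rarest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The size of the smallest class is the target size for all classes.
    target = min(len(xs) for xs in by_class.values())

    balanced_x, balanced_y = [], []
    for y, xs in by_class.items():
        # Sample without replacement from each class.
        for x in rng.sample(xs, target):
            balanced_x.append(x)
            balanced_y.append(y)
    return balanced_x, balanced_y


# Assumed toy data: 10 rare "event" examples among 100 observations.
samples = list(range(100))
labels = ["event" if i % 10 == 0 else "no_event" for i in range(100)]

bal_x, bal_y = random_undersample(samples, labels)
counts = Counter(bal_y)
print(counts)
```

After the call, both classes contain the same number of examples. The cost is that most majority-class examples are discarded at random, which is the kind of information loss that domain-informed sampling aims to reduce.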
Pages: 12
Related papers
50 in total
  • [41] Logical Aspects of Dealing with Domain Knowledge in Data Mining with Association Rules
    Rauch, Jan
    FUNDAMENTA INFORMATICAE, 2016, 148 (1-2) : 1 - 33
  • [42] Incorporating domain knowledge into data mining classifiers: An application in indirect lending
    Sinha, Atish R.
    Zhao, Huimin
    DECISION SUPPORT SYSTEMS, 2008, 46 (01) : 287 - 299
  • [43] Knowledge mining in earth observation data archives: A domain ontology perspective
    Durbha, SS
    King, RL
    IGARSS 2004: IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM PROCEEDINGS, VOLS 1-7: SCIENCE FOR SOCIETY: EXPLORING AND MANAGING A CHANGING PLANET, 2004, : 172 - +
  • [44] Incorporating Domain Knowledge into Data Mining Process: An Ontology Based Framework
    Pan, Ding
    Wuhan University Journal of Natural Sciences, 2006, (01) : 165 - 169
  • [45] Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem
    Huang, Yueh-Min
    Hung, Chun-Min
    Jiau, Hewijin Christine
    NONLINEAR ANALYSIS-REAL WORLD APPLICATIONS, 2006, 7 (04) : 720 - 747
  • [46] Data Transfer and Extension for Mining Big Meteorological Data
    Huang, Tianwen
    Jiao, Fei
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT I, 2017, 10361 : 57 - 66
  • [47] EXPLOITING DOMAIN KNOWLEDGE IN IC CELL LAYOUT
    KIM, JH
    MCDERMOTT, J
    SIEWIOREK, DP
    IEEE DESIGN & TEST OF COMPUTERS, 1984, 1 (03) : 52 - 64
  • [48] Exploiting Domain Knowledge in Making Delegation Decisions
    Emele, Chukwuemeka David
    Norman, Timothy J.
    Sensoy, Murat
    Parsons, Simon
    AGENTS AND DATA MINING INTERACTION, 2012, 7103 : 117 - +
  • [49] Exploiting Saliency Filters and Domain knowledge for Saliency
    Zeng, Jianqin
    Chen, Wei
    Zhang, Guangzheng
    Guo, Kai
    PROGRESS IN MECHATRONICS AND INFORMATION TECHNOLOGY, PTS 1 AND 2, 2014, 462-463 : 410 - 415
  • [50] Active domain adaptation with mining diverse knowledge: An updated class consensus dictionary approach
    Tian, Qing
    Zhou, Liangyu
    Zhu, Yanan
    Kang, Lulu
    INFORMATION SCIENCES, 2024, 667