Machine Learning with Imbalanced EEG Datasets using Outlier-based Sampling

Cited by: 0
Authors
Islah, Nizar [1]
Koerner, Jamie [1]
Genov, Roman [1]
Valiante, Taufik A. [1, 2]
O'Leary, Gerard [1]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 2E4, Canada
[2] Univ Toronto, Dept Surg Neurosurg, Toronto, ON M5T 2S8, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Epilepsy is a neurological disorder that causes seizures in over 65 million people worldwide. Recently developed implantable therapeutic devices aim to prevent symptoms by applying acute electrical stimulation to the seizure-generating brain region in response to activity detected by on-device machine learning hardware. Many training algorithms require an equal number of examples for each target class (e.g. normal activity and seizures), and performance can suffer if this condition is not satisfied. In the case of epilepsy, poor performance can cause seizures to be missed, or stimulation to be applied erroneously. Because normal (interictal) data is abundant in clinical EEG recordings while seizures are rare events (less than 1% of the dataset), the data available for training is severely imbalanced. Several conventional pre-processing methods address imbalanced class learning, such as down-sampling the majority class and up-sampling the minority class, but each has performance drawbacks. This paper presents an improved method that reduces the majority class to the most effective interictal outlier samples. Outliers are determined using Exponentially Decaying Memory Signal Energy (EDMSE) features with Isolation Forests and an ANOVA-based method, which compares a moving feature window to a baseline reference window. Outlier-based sampling is tested with two classifiers (KNN and Logistic Regression) and achieves higher accuracy (approximately 2% increase) and fewer false positives (approximately 38% decrease), along with a lower latency (approximately 3 seconds shorter), compared to conventional training set pre-processing methods.
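
The following is a minimal sketch of the ANOVA-based selection idea mentioned in the abstract. It is not the authors' implementation. The abstract gives no formulas, so the energy recursion in `edmse`, the decay value, the window lengths, the number of windows kept, and the synthetic signal are all assumptions made for illustration, and the function names are hypothetical. The Isolation Forest stage is not shown.

```python
import math
import random


def edmse(signal, decay=0.9):
    """Running signal energy with exponentially decaying memory.

    Assumed recursion (not taken from the paper):
        e[n] = decay * e[n-1] + (1 - decay) * x[n]**2
    """
    energy, out = 0.0, []
    for x in signal:
        energy = decay * energy + (1.0 - decay) * x * x
        out.append(energy)
    return out


def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups of scalar features."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((v - mean_a) ** 2 for v in group_a)
                 + sum((v - mean_b) ** 2 for v in group_b))
    df_within = n_a + n_b - 2  # df_between is 1 for two groups
    if ss_within == 0.0:
        return math.inf if ss_between > 0.0 else 0.0
    return ss_between / (ss_within / df_within)


def select_outlier_windows(features, baseline_len, window_len, keep):
    """Return start indices of the `keep` windows that differ most from the baseline.

    The baseline is the first `baseline_len` features; later non-overlapping
    windows are scored against it with the F statistic.
    """
    baseline = features[:baseline_len]
    scored = []
    for start in range(baseline_len, len(features) - window_len + 1, window_len):
        window = features[start:start + window_len]
        scored.append((anova_f(baseline, window), start))
    scored.sort(reverse=True)  # highest F first
    return sorted(start for _, start in scored[:keep])


# Synthetic "interictal" trace (assumption): unit-variance noise with one
# high-amplitude burst between samples 1200 and 1299.
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(2000)]
for i in range(1200, 1300):
    signal[i] *= 5.0

features = edmse(signal)
selected = select_outlier_windows(features, baseline_len=200, window_len=100, keep=3)
print("interictal windows kept for training:", selected)
```

In this sketch, only the windows returned in `selected` would be kept as majority-class training examples alongside the seizure examples, in place of randomly down-sampling the interictal data.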
Pages: 112-115
Page count: 4