Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models

Authors
Kashongwe, Olivier [1,2]
Kabelitz, Tina [1]
Ammon, Christian [1]
Minogue, Lukas [1]
Doherr, Markus [3]
Bolona, Pablo Silva [4]
Amon, Thomas [1,3]
Amon, Barbara [5,6]
Affiliations
[1] Leibniz Inst Agr Engn & Bioecon ATB, Dept Sensors & Modelling, Max Eyth Allee 100, D-14469 Potsdam, Germany
[2] Univ Osnabruck, Joint Lab Artificial Intelligence & Data Sci Agr, D-49069 Osnabruck, Germany
[3] Free Univ Berlin, Dept Vet Med, Robert von Ostertag Str 7-13, D-14163 Berlin, Germany
[4] Anim & Grassland Res & Innovat Ctr, Teagasc, Moorepark P61 C996, Co Cork, Ireland
[5] Leibniz Inst Agr Engn & Bioecon ATB, Dept Technol Assessment, Max Eyth Allee 100, D-14469 Potsdam, Germany
[6] Univ Zielona Gora, Fac Civil Engn Architecture & Environm Engn, PL-65046 Zielona Gora, Poland
Source
AGRIENGINEERING | 2024 / Vol. 6 / Issue 03
Keywords
oversampling; undersampling; missing-value imputation; dairy cows; performance metrics; resampling methods; performance
DOI
10.3390/agriengineering6030195
Chinese Library Classification (CLC) number
S2 [Agricultural Engineering]
Discipline classification code
0828
Abstract
Missing data and class imbalance hinder the accurate prediction of rare events such as mastitis in dairy cows. Imputation and resampling are commonly used to address these problems, yet they are often chosen arbitrarily, even though the changes they make to the data structure can strongly affect predictions. We hypothesize that the choice of these methods affects the performance of machine learning (ML) models fitted to automated milking system (AMS) data for mastitis prediction. We compare three imputation methods, namely simple imputer (SI), multiple imputation by chained equations (MICE) and linear interpolation (LI), and three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVMSMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEEN). The classifiers were logistic regression (LR), a multilayer perceptron (MLP), a decision tree (DT) and a random forest (RF). We evaluated the models with several performance metrics and compared them using the kappa score. With complete case analysis, the RF performed better (kappa score of 0.78) than the other models, which in turn performed best with SI. The DT, RF and MLP performed better with SVMSMOTE. Overall, the RF, DT and MLP performed best, with imputation or resampling (SMOTE and SVMSMOTE) contributing to their performance. We recommend selecting resampling and imputation techniques carefully and comparing them against complete cases before deciding on the preprocessing approach for analysing AMS data with ML models.
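The abstract names the preprocessing steps without showing how they work. The following is a minimal pure-Python sketch, not the authors' implementation, of four of them on toy data: complete case analysis, a simple imputer, linear interpolation, basic SMOTE oversampling, plus Cohen's kappa for comparing predictions. The function names, the mean strategy for the simple imputer, k = 3 neighbours for SMOTE and all toy inputs are assumptions made for illustration. MICE, SVMSMOTE and SMOTEEN are not shown.

```python
import random
from statistics import mean

# Illustrative sketch only (hypothetical helpers, not the study's code).
# Missing values are represented by None; data are plain Python lists.

def complete_cases(rows):
    """Complete case analysis: keep only rows without missing values."""
    return [row for row in rows if all(v is not None for v in row)]

def simple_impute_mean(column):
    """Simple imputer (SI), assumed mean strategy: fill None with the column mean."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]

def linear_interpolate(series):
    """Linear interpolation (LI) of interior gaps in a time-ordered series.
    Leading and trailing gaps lack a neighbour on one side and stay None."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)  # relative position of i between a and b
            out[i] = out[a] + t * (out[b] - out[a])
    return out

def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE: each synthetic sample lies on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.
    Undefined (division by zero) when chance agreement p_e equals 1."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))
    return (p_o - p_e) / (1 - p_e)

rows = [[1.0, 2.0], [None, 3.0], [4.0, None], [5.0, 6.0]]
print(complete_cases(rows))                    # [[1.0, 2.0], [5.0, 6.0]]
print(simple_impute_mean([1.0, None, 3.0]))    # [1.0, 2.0, 3.0]
print(linear_interpolate([10.0, None, 14.0]))  # [10.0, 12.0, 14.0]
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote(minority, n_new=5)))           # 5
print(round(cohen_kappa([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]), 3))  # 0.333
```

In this sketch, complete case analysis shrinks the data set, imputation keeps every row but invents values, and SMOTE adds synthetic minority rows; these are the kinds of change to the data structure that the study compares.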
Pages: 3427-3442
Number of pages: 16