Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models

Authors
Kashongwe, Olivier [1, 2]
Kabelitz, Tina [1]
Ammon, Christian [1]
Minogue, Lukas [1]
Doherr, Markus [3]
Bolona, Pablo Silva [4]
Amon, Thomas [1, 3]
Amon, Barbara [5, 6]
Affiliations
[1] Leibniz Inst Agr Engn & Bioecon ATB, Dept Sensors & Modelling, Max Eyth Allee 100, D-14469 Potsdam, Germany
[2] Univ Osnabruck, Joint Lab Artificial Intelligence & Data Sci Agr, D-49069 Osnabruck, Germany
[3] Free Univ Berlin, Dept Vet Med, Robert von Ostertag Str 7-13, D-14163 Berlin, Germany
[4] Anim & Grassland Res & Innovat Ctr, Teagasc, Moorepark P61 C996, Co Cork, Ireland
[5] Leibniz Inst Agr Engn & Bioecon ATB, Dept Technol Assessment, Max Eyth Allee 100, D-14469 Potsdam, Germany
[6] Univ Zielona Gora, Fac Civil Engn Architecture & Environm Engn, PL-65046 Zielona Gora, Poland
Source
AgriEngineering | 2024 / Volume 6 / Issue 3
Keywords
oversampling; undersampling; missing-value imputation; dairy cows; performance metrics; resampling methods; performance
DOI
10.3390/agriengineering6030195
Chinese Library Classification
S2 [Agricultural Engineering]
Discipline classification code
0828
Abstract
Missing data and class imbalance hinder the accurate prediction of rare events such as dairy mastitis. Resampling and imputation are employed to handle these problems. These methods are often applied arbitrarily, despite their profound impact on prediction through the changes they cause to the data structure. We hypothesize that their use affects the performance of machine learning (ML) models fitted to automated milking system (AMS) data for mastitis prediction. We compare three imputation methods: simple imputer (SI), multiple imputer (MICE) and linear interpolation (LI); and three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVMSMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEEN). The classifiers were logistic regression (LR), multilayer perceptron (MLP), decision tree (DT) and random forest (RF). We evaluated them with various metrics and compared models using the kappa score. A complete case analysis fitted the RF (0.78) better than the other models, for which SI performed best. The DT, RF and MLP performed better with SVMSMOTE. The RF, DT and MLP had the best overall performance, aided by imputation or resampling (SMOTE and SVMSMOTE). We recommend carefully selecting resampling and imputation techniques and comparing them with complete cases before deciding on the preprocessing approach used to test AMS data with ML models.
Pages: 3427-3442
Number of pages: 16
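
The abstract names three building blocks: linear interpolation of missing values, SMOTE-style oversampling of the minority class, and model comparison with the kappa score. The following is a minimal standard-library sketch of each idea, not the authors' implementation. The function names (`linear_interpolate`, `smote_like`, `cohen_kappa`), the parameter defaults and the toy data are all assumptions made for illustration; a real study would more likely rely on established libraries.

```python
import random
from collections import Counter


def linear_interpolate(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Leading or trailing gaps stay None because one neighbour is missing.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE idea).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Squared Euclidean distance is enough for ranking neighbours.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1:
        # Kappa is undefined here; returning 0.0 is a choice of this sketch.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical, for illustration only).
print(linear_interpolate([1.0, None, None, 4.0, None]))  # [1.0, 2.0, 3.0, 4.0, None]
print(len(smote_like([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)], n_new=5)))  # 5
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Because kappa corrects for chance agreement, it is less flattering than plain accuracy when one class is rare, which is one reason it suits imbalanced problems such as mastitis prediction. Resampling should be applied to training data only, so that evaluation reflects the original class distribution.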