Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models

Cited by: 0
Authors
Kashongwe, Olivier [1,2]
Kabelitz, Tina [1 ]
Ammon, Christian [1 ]
Minogue, Lukas [1 ]
Doherr, Markus [3 ]
Bolona, Pablo Silva [4 ]
Amon, Thomas [1,3]
Amon, Barbara [5,6]
Affiliations
[1] Leibniz Inst Agr Engn & Bioecon ATB, Dept Sensors & Modelling, Max Eyth Allee 100, D-14469 Potsdam, Germany
[2] Univ Osnabruck, Joint Lab Artificial Intelligence & Data Sci Agr, D-49069 Osnabruck, Germany
[3] Free Univ Berlin, Dept Vet Med, Robert von Ostertag Str 7-13, D-14163 Berlin, Germany
[4] Anim & Grassland Res & Innovat Ctr, Teagasc, Moorepark P61 C996, Co Cork, Ireland
[5] Leibniz Inst Agr Engn & Bioecon ATB, Dept Technol Assessment, Max Eyth Allee 100, D-14469 Potsdam, Germany
[6] Univ Zielona Gora, Fac Civil Engn Architecture & Environm Engn, PL-65046 Zielona Gora, Poland
Source
AGRIENGINEERING | 2024 / Vol. 6 / Issue 03
Keywords
oversampling; undersampling; missing-value imputation; dairy cows; performance metrics; resampling methods; performance
DOI
10.3390/agriengineering6030195
Chinese Library Classification number
S2 [Agricultural Engineering]
Discipline classification code
0828
Abstract
Missing data and class imbalance hinder the accurate prediction of rare events such as dairy mastitis. Imputation and resampling are used to handle these problems, but they are often applied arbitrarily despite their profound impact on prediction, which arises because they change the structure of the data. We hypothesize that their use affects the performance of machine learning (ML) models fitted to automated milking system (AMS) data for mastitis prediction. We compare three imputation methods, namely simple imputer (SI), multiple imputation by chained equations (MICE) and linear interpolation (LI), and three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVMSMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEEN). The classifiers were logistic regression (LR), multilayer perceptron (MLP), decision tree (DT) and random forest (RF). We evaluated them with various metrics and compared models using the kappa score. Complete case analysis suited the RF (0.78) better than the other models, for which SI performed best. The DT, RF and MLP performed better with SVMSMOTE. Overall, the RF, DT and MLP performed best, helped by imputation or resampling (SMOTE and SVMSMOTE). We recommend selecting resampling and imputation techniques carefully and comparing them with complete case analysis before deciding on the preprocessing approach for fitting ML models to AMS data.
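To make the preprocessing and evaluation terms in the abstract concrete, the following is a minimal sketch in plain Python of two of the imputation strategies (SI as mean replacement, LI as straight-line gap filling) and of the kappa score used to compare models. The function names (`mean_impute`, `linear_interpolate`, `cohen_kappa`) and all numbers are hypothetical illustrations, not the authors' code or data; the study presumably relied on library implementations.

```python
from statistics import mean


def mean_impute(values):
    """Simple imputer (SI), assumed here to use the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def linear_interpolate(values):
    """Linear interpolation (LI): fill each gap on the straight line between
    its nearest observed neighbours; edge gaps copy the nearest observation."""
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected
    for the agreement expected by chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    if p_chance == 1:
        # Degenerate case (one class only); returning 0.0 is a choice made here.
        return 0.0
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up sensor series with two missing readings (None).
series = [4.0, None, None, 7.0, 5.0]
si_filled = mean_impute(series)         # both gaps get the same value
li_filled = linear_interpolate(series)  # gaps follow the local trend

# Made-up imbalanced labels: 8 healthy (0), 2 mastitis (1).
y_true = [0] * 8 + [1] * 2
always_healthy = [0] * 10
kappa_trivial = cohen_kappa(y_true, always_healthy)
```

In this toy case a classifier that always predicts "healthy" is 80% accurate yet has a kappa of zero, which illustrates why a chance-corrected score is a reasonable choice for comparing models on a rare event.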
Pages: 3427-3442
Number of pages: 16