Data mining approach for dry bean seeds classification

Cited by: 8
Authors
Macuacua, Jaime Carlos [1]
Centeno, Jorge Antonio Silva [1]
Amisse, Caisse [2]
Affiliations
[1] Univ Fed Parana, Geomat Dept, Postgrad Program Geodet Sci, Curitiba, Brazil
[2] Rovuma Univ, Nampula, Mozambique
Source
Keywords
Data mining; Machine learning; Hyperparameter optimization; SMOTE technique; Dry bean seeds
DOI
10.1016/j.atech.2023.100240
Chinese Library Classification number
S2 [Agricultural Engineering]
Discipline classification code
0828
Abstract
Product quality certification is an important process in agricultural production and productivity. Traditional methods for seed quality classification have limitations such as complex procedures, low precision, and slow inspection of large production volumes. Automatic classification techniques based on machine learning and computer vision offer fast, high-throughput solutions. Despite major advances in state-of-the-art automatic classification models, these models can still be improved by incorporating other techniques. In this article, we developed a computer vision system for the automatic classification of different seed varieties based on machine learning models combined with data mining techniques, using a set of features related to the geometry of bean seeds extracted from binary images. Three machine learning techniques were compared: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The data mining steps applied were Principal Component Analysis (PCA), hyperparameter tuning of the machine learning algorithms, and dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE). The results showed that these data mining processes help to improve the quality of classification results. The KNN classifier performed best, with around 95% accuracy and 96% precision and recall. The best results were obtained by applying hyperparameter tuning and the SMOTE technique in the preprocessing step, which yielded an increase of around 2.6%. The results showed that combining data mining in the preprocessing step with machine learning classification methods can effectively and efficiently increase classification accuracy and support automatic bean seed selection based on digital images. This can help small farmers and agricultural managers make seed selection decisions that increase production.
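To make two of the preprocessing ideas named in the abstract concrete, the following is a minimal, standard-library-only Python sketch of SMOTE-style oversampling followed by a KNN classifier whose `k` is chosen by leave-one-out accuracy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`smote`, `knn_predict`, `tune_k`), the two-feature toy data, and the class labels "A" and "B" are all hypothetical, the PCA step is omitted, and leave-one-out selection is only a small stand-in for whatever tuning procedure the paper used.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Return the k points in `pool` closest to `point` (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbour = rng.choice(nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k nearest training samples."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(point, train_X[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def tune_k(X, y, candidates):
    """Pick k by leave-one-out accuracy (a tiny stand-in for grid search)."""
    def loo_accuracy(k):
        hits = 0
        for i in range(len(X)):
            rest_X = X[:i] + X[i + 1:]
            rest_y = y[:i] + y[i + 1:]
            hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
        return hits / len(X)
    return max(candidates, key=loo_accuracy)


# Hypothetical toy data: two geometric features, two imbalanced classes.
rng = random.Random(1)
majority = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(12)]
minority = [(rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)) for _ in range(4)]

# Oversample the minority class until both classes have the same size.
extra = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + extra
y = ["A"] * len(majority) + ["B"] * (len(minority) + len(extra))

best_k = tune_k(X, y, candidates=[1, 3, 5])
print("class counts after balancing:", Counter(y))
print("selected k:", best_k)
```

Each synthetic point is a convex combination of two existing minority samples, so oversampling fills in the minority region rather than duplicating points. In practice the oversampling would be applied to the training split only, so that synthetic samples do not leak into the evaluation data.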
Pages: 12