Visual large language model for wheat disease diagnosis in the wild

Authors
Zhang, Kunpeng [1, 2]
Ma, Li [1]
Cui, Beibei [1]
Li, Xin [1]
Zhang, Boqiang [3]
Xie, Na [4]
Affiliations
[1] Henan Univ Technol, Coll Elect Engn, Zhengzhou 450001, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Henan Univ Technol, Coll Mech Engn, Zhengzhou 450001, Peoples R China
[4] Cent Univ Finance & Econ, Sch Management Sci & Engn, Beijing 100081, Peoples R China
Keywords
Plant disease; Wheat disease diagnosis; Wheat disease classification; Large language model; Explainable AI;
DOI
10.1016/j.compag.2024.109587
Chinese Library Classification number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Early detection of symptoms in wheat plants is crucial for mitigating disease effects and preventing their spread. Prompt phytosanitary treatment minimizes yield losses and enhances treatment efficacy. In recent years, numerous image analysis-based methodologies for automatic disease identification have been developed, with Convolutional Neural Networks (CNNs) achieving notable success in visual classification tasks. However, existing methods often lack the intelligence and reasoning needed for real-world applications. This study introduces an advanced wheat disease diagnosis approach using a Visual Language Model (VLM), named the Wheat Disease Language Model (WDLM). The WDLM first leverages a modified Segment Anything Model (SAM) to isolate key wheat features from complex wild environments. To enhance its logical reasoning abilities, the WDLM integrates a reasoning chain to generate clear, reasoned explanations for its diagnoses. By employing dedicated prompt engineering, this study establishes the Wheat Disease Semantic Dataset (WDSD) to fine-tune the VLM. The WDSD, which includes a diverse set of wheat images from various sources, bridges the gap between advanced VLM technology and wheat pathology. Fine-tuned on this task-specific data, the WDLM provides accurate classification of wheat diseases and suggests potential treatment options. Compared to CNN-based models, Transformer-based models, and other VLMs, the WDLM shows improved performance in various scenarios. Integrated with mobile applications, the WDLM approach is readily applicable in the field, representing a promising advancement in the intelligent diagnosis of wheat diseases.
Pages: 14