VEG-MMKG: Multimodal knowledge graph construction for vegetables based on pre-trained model extraction

Times cited: 0
Authors
Lv, Bowen [1 ,2 ,3 ,4 ]
Wu, Huarui [1 ,3 ,4 ]
Chen, Wenbai [2 ]
Chen, Cheng [1 ]
Miao, Yisheng [1 ,3 ,4 ]
Zhao, Chunjiang [1 ]
Affiliations
[1] Natl Engn Res Ctr Informat Technol Agr, Beijing 100097, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Sch Automat, Beijing 100192, Peoples R China
[3] Beijing Acad Agr & Forestry Sci, Informat Technol Res Ctr, Beijing 100097, Peoples R China
[4] Minist Agr & Rural Affairs, Key Lab Digital Village Technol, Beijing 100097, Peoples R China
Keywords
Knowledge graph; Multimodal fusion; Image-text pairs; Pre-trained model
DOI
10.1016/j.compag.2024.109398
CLC number
S [Agricultural Sciences]
Subject classification code
09
Abstract
Knowledge graph technology is of great significance to modern agricultural information management and data-driven decision support. However, agricultural knowledge spans many types, and knowledge graph databases built from text alone do not give users an intuitive or comprehensive view of that knowledge. This paper therefore proposes a solution that uses a pre-trained language model to extract knowledge and construct an agricultural multimodal knowledge graph, taking two crops, cabbage and corn, as the research objects. First, a text-image collaborative representation learning method with a two-stream structure combines the image and text modalities of vegetables, exploiting the correlation and complementarity between the two to achieve entity alignment. In addition, to address the high similarity among vegetable entities within fine-grained subcategories, a cross-modal fine-grained contrastive learning method is introduced; contrasting individual words with small image regions compensates for the weak semantic association between modalities. Finally, a visual multimodal knowledge graph user interface is built from the image-text matching results. Experimental results show that the fine-tuned pre-trained model achieves an image-text matching rate of 76.7% on the vegetable dataset and can match appropriate images to text entities. The resulting visual multimodal knowledge graph database lets users query and filter knowledge according to their needs, supporting subsequent research on domain-specific applications such as multimodal agricultural question answering, crop pest and disease identification, and agricultural product recommendation.
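The abstract describes, but does not detail, the cross-modal fine-grained contrastive step that pairs individual words with small image regions. The sketch below is a rough illustration only, not the authors' implementation: it assumes a two-stream setup in which a text encoder yields token embeddings and a vision encoder yields region embeddings, and scores each image-text pair by a token-to-region max similarity fed into a symmetric InfoNCE loss. The function name `token_region_contrastive_loss`, the temperature `tau`, and the aggregation scheme are illustrative assumptions.

```python
# Minimal sketch of fine-grained cross-modal contrastive learning between
# word tokens and image regions. Assumption-based illustration, not the
# VEG-MMKG code: names, temperature, and aggregation are hypothetical.
import torch
import torch.nn.functional as F


def token_region_contrastive_loss(text_tokens, image_regions, tau=0.07):
    """text_tokens: (B, T, D) token embeddings; image_regions: (B, R, D)
    region/patch embeddings. Returns a symmetric InfoNCE loss over the batch."""
    text_tokens = F.normalize(text_tokens, dim=-1)
    image_regions = F.normalize(image_regions, dim=-1)

    # Token-region similarities for every text/image pair in the batch:
    # sim[b, i, t, r] = <token t of text b, region r of image i>.
    sim = torch.einsum("btd,ird->bitr", text_tokens, image_regions)

    # Fine-grained pair score: each token keeps its best-matching region
    # (and each region its best-matching token), then average.
    t2i = sim.max(dim=-1).values.mean(dim=-1)  # (B_text, B_image)
    i2t = sim.max(dim=-2).values.mean(dim=-1)  # (B_text, B_image)

    targets = torch.arange(text_tokens.size(0), device=text_tokens.device)
    loss_t2i = F.cross_entropy(t2i / tau, targets)       # text -> image
    loss_i2t = F.cross_entropy(i2t.t() / tau, targets)   # image -> text
    return 0.5 * (loss_t2i + loss_i2t)


if __name__ == "__main__":
    # Toy usage: 4 image-text pairs, 12 tokens, 49 regions, 256-dim features.
    text = torch.randn(4, 12, 256)
    image = torch.randn(4, 49, 256)
    print(token_region_contrastive_loss(text, image).item())
```

Pulling matched pairs together at the word/region level, rather than only at the whole-sentence/whole-image level, is the kind of objective the abstract points to for separating visually similar vegetable subcategories.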
Pages: 13