Development of a Large-scale Korean Language Model in the Field of Geosciences

Author
Lee, Sang-ho [1]
Affiliation
[1] Korea Inst Geosci & Mineral Resources, Mineral Resources Div, Daejeon 34132, South Korea
Source
ECONOMIC AND ENVIRONMENTAL GEOLOGY | 2024 / Vol. 57 / No. 5
Keywords
large language model; generative model; natural language processing; artificial intelligence; geoscience
DOI
10.9719/EEG.2024.57.5.539
Chinese Library Classification (CLC) number
P5 [Geology]
Discipline classification codes
0709; 081803
Abstract
With the rapid development and commercialization of large-scale generative language models, concerns have emerged regarding the appropriateness of model outputs, domain expertise, and data security. In particular, Korean generative language models specialized in geoscience have not yet been studied, owing to difficulties in data processing and preprocessing and a lack of prior development cases. This study carried out the entire development process for a Korean language model specialized in geoscience and evaluated its applicability in related fields. To this end, geoscience-related academic data were collected and preprocessed to build a dataset suitable for training a language model, and this dataset was used to train the Llama2 model. The trained model was evaluated quantitatively on 19 evaluation datasets from various fields. The results showed improved capabilities in scientific question answering and Korean text interpretation compared with the original model. The language model developed in this study could enhance research productivity in geoscience, for example by supporting idea generation. These outcomes are expected to stimulate further research on, and use of, generative language models in geoscience.
Pages: 539-550
Number of pages: 12