Implementation of a large-scale language model adaptation in a cloud environment

Authors
Kwang-Ho Kim
Dae-Young Jung
Donghyun Lee
Hyuk-Jun Lee
Sung-Yong Park
Myoung-Wan Koo
Ji-Hwan Kim
Jeong-sik Park
Hyung-Bae Jeon
Yun-Keun Lee
Affiliations
[1] Sogang University, Department of Computer Science and Engineering
[2] Mokwon University, Department of Intelligent Robot Engineering
[3] Electronics and Telecommunications Research Institute
Keywords
Language model adaptation; Large-scale; MapReduce; Cloud
DOI
Not available
Abstract
This paper presents a system for large-scale language model (LM) adaptation on large, daily-generated text corpora, using MapReduce in a cloud environment. Our trigram language model, consisting of 800 million trigram counts, was implemented with a new approach built on a representative cloud service (Amazon EC2) and a representative distributed processing framework (Hadoop). The goal of the research is to find the optimal number of Amazon EC2 instances for LM adaptation under the constraint that each day's Twitter texts must be processed within one day. Trigram count extraction and model update were performed on 200 million daily-generated Twitter texts. Trigram count extraction takes fewer than 3 h with six instances, and the model update takes fewer than 20 h with 10 instances. Language model adaptation for 200 million daily-generated Twitter texts can therefore be completed within 24 h using at least 10 Amazon EC2 instances.
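The two stages named in the abstract, trigram count extraction and model update, can be illustrated with a minimal single-process sketch of the MapReduce pattern. This is not the authors' Hadoop implementation: the function names, the whitespace tokenization, the lowercasing, the `<s>`/`</s>` padding symbols, and the treatment of the model update as a simple merge of counts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def map_trigrams(line):
    """Map step: emit (trigram, 1) for every trigram in one text line.

    Assumption: whitespace tokenization, lowercasing, and sentence
    padding symbols; the paper's actual preprocessing is not stated here.
    """
    tokens = ["<s>", "<s>"] + line.lower().split() + ["</s>"]
    for i in range(len(tokens) - 2):
        yield tuple(tokens[i:i + 3]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each trigram key."""
    counts = Counter()
    for trigram, n in pairs:
        counts[trigram] += n
    return counts


def update_model(base_counts, daily_counts):
    """Hypothetical model update: add the day's counts to the base counts.

    Returns a new Counter and leaves both inputs unchanged.
    """
    merged = Counter(base_counts)
    merged.update(daily_counts)
    return merged


# Usage: count trigrams in one day's texts, then fold them into the model.
daily_texts = ["a b c", "a b c d"]
daily = reduce_counts(chain.from_iterable(map_trigrams(t) for t in daily_texts))
base = Counter({("a", "b", "c"): 5})
model = update_model(base, daily)
```

In a real Hadoop job the map and reduce functions would run on separate workers, with the framework grouping pairs by key between the two steps; the sketch only shows the data flow.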
Pages: 5029 - 5045
Page count: 16
Source
Multimedia Tools and Applications, 2016, 75 (09) : 5029 - 5045