Implementation of a large-scale language model adaptation in a cloud environment

Authors
Kwang-Ho Kim
Dae-Young Jung
Donghyun Lee
Hyuk-Jun Lee
Sung-Yong Park
Myoung-Wan Koo
Ji-Hwan Kim
Jeong-sik Park
Hyung-Bae Jeon
Yun-Keun Lee
Affiliations
[1] Sogang University, Department of Computer Science and Engineering
[2] Mokwon University, Department of Intelligent Robot Engineering
[3] Electronics and Telecommunications Research Institute
Keywords
Language model adaptation; Large-scale; MapReduce; Cloud
Abstract
This paper presents a system for large-scale language model (LM) adaptation on large, daily generated text corpora using MapReduce in a cloud environment. Our large-scale trigram language model, consisting of 800 million trigram counts, was implemented with a new approach that uses a representative cloud service (Amazon EC2) and a representative distributed processing framework (Hadoop). The ultimate goal of our research is to find the optimal number of Amazon EC2 instances for LM adaptation under the time constraint that each day's Twitter texts must be processed within 1 day. Trigram count extraction and model update were performed for 200 million daily generated Twitter texts. For trigram count extraction, fewer than 3 h are required to process the daily generated Twitter texts with six instances. For model update, fewer than 20 h are required with 10 instances. Therefore, language model adaptation for 200 million daily generated Twitter texts can be completed within 24 h using at least 10 Amazon EC2 instances.
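The two stages named in the abstract, trigram count extraction and model update, can be illustrated with a minimal in-memory sketch of the map and reduce steps. This is not the authors' Hadoop implementation: the function names (`map_trigrams`, `reduce_counts`, `update_model`), the whitespace tokenization, the absence of sentence-boundary markers, and the treatment of the model update as a simple merge of counts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def map_trigrams(text):
    """Map step (hypothetical): emit a (trigram, 1) pair for each trigram in one text.

    Assumes plain whitespace tokenization with no sentence-boundary padding.
    """
    tokens = text.split()
    for i in range(len(tokens) - 2):
        yield tuple(tokens[i:i + 3]), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted counts for each trigram key."""
    counts = Counter()
    for trigram, n in pairs:
        counts[trigram] += n
    return counts


def update_model(base_counts, daily_counts):
    """Model update (assumed here to be a merge): add daily counts to existing counts."""
    merged = Counter(base_counts)
    merged.update(daily_counts)
    return merged


# Toy stand-ins for one day's texts and the existing model counts.
daily_texts = ["a b c d", "a b c"]
daily_counts = reduce_counts(
    chain.from_iterable(map_trigrams(t) for t in daily_texts)
)
base_counts = Counter({("a", "b", "c"): 5})
adapted_counts = update_model(base_counts, daily_counts)
```

In a real MapReduce job the map outputs would be partitioned by key across reducers rather than summed in a single process, which is what lets the work be spread over several instances.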
Pages: 5029-5045
Number of pages: 16