Implementation of a large-scale language model adaptation in a cloud environment

Authors
Kwang-Ho Kim
Dae-Young Jung
Donghyun Lee
Hyuk-Jun Lee
Sung-Yong Park
Myoung-Wan Koo
Ji-Hwan Kim
Jeong-sik Park
Hyung-Bae Jeon
Yun-Keun Lee
Affiliations
[1] Sogang University, Department of Computer Science and Engineering
[2] Mokwon University, Department of Intelligent Robot Engineering
[3] Electronics and Telecommunications Research Institute
Keywords
Language model adaptation; Large-scale; MapReduce; Cloud
DOI
Not available
Abstract
This paper presents a system for large-scale language model (LM) adaptation on large, daily generated text corpora using MapReduce in a cloud environment. Our large-scale trigram language model, consisting of 800 million trigram counts, was implemented with a new approach on a representative cloud service (Amazon EC2) and a representative distributed processing framework (Hadoop). The ultimate goal of our research is to find the optimal number of Amazon EC2 instances for LM adaptation under the time constraint that the Twitter texts generated each day must be processed within 1 day. Trigram count extraction and model update were performed for 200 million daily generated Twitter texts. Trigram count extraction takes fewer than 3 h with six instances, and the model update takes fewer than 20 h with 10 instances. Therefore, language model adaptation for 200 million daily generated Twitter texts can be completed within 24 h using at least 10 instances in Amazon EC2.
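The two stages named in the abstract, trigram count extraction and model update, can be illustrated with a small single-machine sketch of the MapReduce pattern. This is not the authors' Hadoop implementation, which the abstract does not show. The function names (`map_trigrams`, `reduce_counts`, `update_model`), the whitespace tokenization, and the choice to update the model by adding daily counts to existing counts are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def map_trigrams(text):
    """Map step (hypothetical): emit (trigram, 1) for each trigram in one text.

    Assumes plain whitespace tokenization, which the abstract does not specify.
    """
    tokens = text.split()
    return [
        ((tokens[i], tokens[i + 1], tokens[i + 2]), 1)
        for i in range(len(tokens) - 2)
    ]


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted values per trigram key."""
    counts = Counter()
    for trigram, n in pairs:
        counts[trigram] += n
    return counts


def update_model(base_counts, daily_counts):
    """Model update (assumed additive): merge daily counts into existing counts."""
    merged = Counter(base_counts)
    merged.update(daily_counts)
    return merged


# Toy stand-in for one day of texts.
daily_texts = ["the cat sat on the mat", "the cat sat down"]
daily_counts = reduce_counts(
    chain.from_iterable(map_trigrams(t) for t in daily_texts)
)

# Toy stand-in for the existing trigram counts.
base_counts = Counter({("the", "cat", "sat"): 5})
model_counts = update_model(base_counts, daily_counts)
```

In a real Hadoop job the map and reduce functions would run on separate workers, with the framework grouping pairs by trigram key between the two phases; here that shuffle is folded into the single `Counter` loop.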
Pages: 5029-5045
Number of pages: 16