Design and Implementation of Word2Vec Parallel Algorithm Based on HPC

Times cited: 0
Authors
Yi, Xianyong [1]
Zheng, Rongge [1]
Wang, Aoyu [1]
Qin, Hao [1]
Chen, Yufeng [1]
Affiliation
[1] Shandong Univ, Sch Mech Elect & Informat Engn, Weihai, Weihai, Peoples R China
Keywords
HPC; Word2Vec; Parallel Algorithm; Natural Language Processing
DOI
Not available
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Word2Vec (Word to Vector) processes natural language by representing words as vectors and measuring their similarity with the cosine similarity. However, with the explosive growth of information, the original serial Word2Vec algorithm can no longer meet the demands of training on corpus texts, and its comparatively low processing efficiency has become a bottleneck. High Performance Computing (HPC) is dedicated to improving computational efficiency, so parallelizing the Word2Vec algorithm can greatly improve the efficiency of training on corpus texts. After analyzing the characteristics of the Word2Vec algorithm in detail, we design and implement a parallel Word2Vec algorithm and use it to train on corpus texts on HPC. Furthermore, we collect corpus texts of different sizes, train on them with both the serial and the parallel Word2Vec algorithms, and calculate the speed-up ratio from the two runs. The experimental results show that the parallel Word2Vec algorithm running on HPC achieves a high speed-up ratio.
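The abstract relies on two quantities: the cosine similarity between word vectors and the speed-up ratio between serial and parallel training runs. The following is a minimal Python sketch of both, assuming the common definitions cos(u, v) = u·v / (‖u‖‖v‖) and S = T_serial / T_parallel. The function names, toy vectors, and timings are hypothetical illustrations, not the paper's implementation or measurements.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def speedup_ratio(t_serial, t_parallel):
    """Speed-up ratio S = T_serial / T_parallel (both times in the same unit)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("running times must be positive")
    return t_serial / t_parallel


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors, not trained Word2Vec embeddings.
    king = [0.8, 0.3, 0.1]
    queen = [0.7, 0.4, 0.1]
    print(round(cosine_similarity(king, queen), 3))

    # Hypothetical timings in seconds, not results reported in the paper.
    print(speedup_ratio(120.0, 30.0))  # prints 4.0
```

A speed-up ratio above 1 means the parallel run finished faster than the serial one on the same corpus; comparing it across corpora of different sizes is one way to read the experiment the abstract describes.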
Pages: 585-590
Number of pages: 6
Related papers (50 in total)
  • [31] Clustering of banned food additives based on Word2vec
    Zhang, Yipeng
    Li, Xiaoli
    Wang, Kang
    Li, Yang
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 3467 - 3471
  • [32] Word2vec and Clustering based Twitter Sentiment Analysis
    Coban, Onder
    Ozyer, Gulsah Tumuklu
    2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [33] Duplicate Short Text Detection Based on Word2vec
    Gao, Jin
    He, Yahao
    Zhang, Xiaoyan
    Xia, Yamei
    PROCEEDINGS OF 2017 8TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2017), 2017, : 33 - 37
  • [34] Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model
    Tang, Huanling
    Zhu, Hui
    Wei, Hongmin
    Zheng, Han
    Mao, Xueli
    Lu, Mingyu
    Guo, Jin
    CHINESE JOURNAL OF ELECTRONICS, 2023, 32 (03) : 647 - 654
  • [36] ExMrec2vec: Explainable Movie Recommender System based on Word2vec
    Samih, Amina
    Ghadi, Abderrahim
    Fennan, Abdelhadi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 653 - 660
  • [37] Scaling Word2Vec on Big Corpus
    Li, Bofang
    Drozd, Aleksandr
    Guo, Yuhe
    Liu, Tao
    Matsuoka, Satoshi
    Du, Xiaoyong
    DATA SCIENCE AND ENGINEERING, 2019, 4 (02) : 157 - 175
  • [39] Application of Word2vec in Phoneme Recognition
    Feng, Xin
    Wang, Lei
    ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2020, : 495 - 499
  • [40] Considerations about learning Word2Vec
    Di Gennaro, Giovanni
    Buonanno, Amedeo
    Palmieri, Francesco A. N.
    JOURNAL OF SUPERCOMPUTING, 2021, 77 : 12320 - 12335