Variational Autoencoder Model Combining Deep Learning and Probability Statistics and Its Application in Large-scale Data Analysis

Authors
Zou, Lingguo [1]
Zhang, Meihua [2]
Affiliations
[1] School of Public Education, Xiamen Ocean Vocational College, Xiamen 361009, China
[2] College of General Education, Xiamen Huatian International Vocation Institute, Xiamen 361102, China
Source
Informatica (Slovenia) | 2024 / Vol. 48 / No. 22
Keywords
Large datasets
DOI
10.31449/inf.v48i22.6921
Abstract
A multi-layer generative model is proposed to improve the accuracy of large-scale data analysis. It addresses two weaknesses of existing topic models: limited feature-extraction capability and weak association with label information. The model consists of three main modules: text encoding, autoencoder inference, and layer-by-layer learning. It combines a hierarchical Bayesian model with a deterministic-upward, stochastic-downward network structure, using a Poisson Gamma Belief Network as the decoder to capture hierarchical latent features in text data. Stochastic gradient Monte Carlo sampling is used for posterior inference to improve efficiency. In addition, the Fisher information matrix is used to adaptively adjust the learning rates for the topic parameters at different layers, and a layer-by-layer learning strategy is introduced to construct the learning network. On this basis, text data and label information are combined for feature extraction. On the 20News, RCV1, and IMDB datasets, the model achieved test error rates of 16.52%, 18.72%, and 11.67%, respectively, the lowest among the compared models, together with the shortest testing times of 0.020 s, 0.017 s, and 0.015 s, respectively, indicating high accuracy and efficiency. Its perplexities on the 20News, RCV1, and Wiki datasets were 590.23, 953.12, and 982.67, respectively, significantly lower than those of the comparison models. These results indicate that the model offers strong data-analysis and interpretation capability with relatively high computational efficiency, providing a scientific tool for the accurate batch analysis of large-scale data. © 2024 Slovene Society Informatika. All rights reserved.
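The abstract names a Poisson Gamma Belief Network decoder and reports perplexity, but gives no equations. The following is a minimal pure-Python sketch, assuming the standard PGBN generative process (gamma-distributed hidden units at each layer, Poisson word counts at the bottom) with one shared scale 1/c for simplicity, and the usual per-word perplexity used for topic models. All function names (`pgbn_generate`, `perplexity`, `random_simplex_columns`, etc.) and parameter values are hypothetical; this is not the authors' implementation, and it omits the encoder, the stochastic gradient Monte Carlo updates, and the Fisher-information learning-rate adaptation.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def mat_vec(phi, theta):
    """Return phi @ theta, where phi is a list of rows and theta a list."""
    return [sum(p * t for p, t in zip(row, theta)) for row in phi]


def random_simplex_columns(n_rows, n_cols, rng):
    """Random n_rows x n_cols matrix whose columns are Dirichlet(1, ..., 1) draws."""
    cols = []
    for _ in range(n_cols):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n_rows)]
        total = sum(g)
        cols.append([x / total for x in g])
    # Transpose the list of columns into a list of rows.
    return [[col[v] for col in cols] for v in range(n_rows)]


def pgbn_generate(phis, r, c, rng):
    """Top-down pass of a toy PGBN decoder (one shared scale 1/c is an assumption).

    phis[0] is vocab x K1, phis[l] is K_l x K_{l+1} for l >= 1; r has length K_L.
      theta_L ~ Gamma(r, scale=1/c)
      theta_l ~ Gamma(phis[l] @ theta_{l+1}, scale=1/c)
      x       ~ Poisson(phis[0] @ theta_1)
    """
    theta = [rng.gammavariate(r_k, 1.0 / c) for r_k in r]
    for phi in reversed(phis[1:]):
        theta = [rng.gammavariate(a, 1.0 / c) for a in mat_vec(phi, theta)]
    rates = mat_vec(phis[0], theta)
    return [sample_poisson(lam, rng) for lam in rates], rates


def perplexity(docs_counts, docs_rates):
    """Per-word perplexity exp(-sum x log p / sum x), with p = rates / sum(rates) per document."""
    log_lik, n_words = 0.0, 0
    for counts, rates in zip(docs_counts, docs_rates):
        total = sum(rates)
        for x, lam in zip(counts, rates):
            if x > 0:
                log_lik += x * math.log(lam / total)
                n_words += x
    if n_words == 0:
        raise ValueError("perplexity is undefined with no observed words")
    return math.exp(-log_lik / n_words)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, k1, k2 = 8, 4, 2
    phis = [random_simplex_columns(vocab_size, k1, rng),
            random_simplex_columns(k1, k2, rng)]
    counts, rates = pgbn_generate(phis, r=[5.0] * k2, c=0.2, rng=rng)
    print("word counts:", counts)
    print("perplexity under the generating rates:", perplexity([counts], [rates]))
```

In a real evaluation the rates would come from the trained model applied to held-out documents rather than from the generating parameters. A uniform prediction over V words yields a perplexity of V, which is a convenient sanity check; lower values mean the model assigns higher probability to the observed words.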
Pages: 31-46