Scalable disk-based topic modeling for memory limited devices

Cited by: 2
Authors
Kim, Byungju [1]
Lee, Dongha [1]
Oh, Jinoh [2]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
Funding
National Research Foundation of Singapore
Keywords
Latent Dirichlet Allocation; Parallel algorithm; Disk-based algorithm; ALGORITHM
DOI
10.1016/j.ins.2019.12.058
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Disk-based algorithms can process large-scale data that do not fit into memory, so they provide good scalability on mobile devices with limited memory. Because disk I/O is generally much slower than memory access, the total amount of disk I/O is the most crucial factor in the efficiency of a disk-based algorithm. This paper proposes BlockLDA, an efficient disk-based Latent Dirichlet Allocation (LDA) inference algorithm that can infer an LDA model even when neither the data nor the model fits into memory. BlockLDA manages the data and the model as a set of small blocks, which enables efficient disk I/O and lets LDA inference proceed block by block. In addition, it uses two techniques to minimize the amount of disk I/O: 1) a space reduction algorithm that dynamically manages the block-wise model as its sparsity changes, and 2) a local scheduling algorithm that carefully selects the next data blocks so that the number of page faults is minimized. Our experimental results demonstrate that BlockLDA achieves better scalability and efficiency than its disk-based and in-memory competitors in memory-limited environments. (C) 2019 Elsevier Inc. All rights reserved.
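The abstract does not spell out how BlockLDA's local scheduler chooses the next data block. As an illustration only, the following Python sketch assumes a hypothetical layout in which the corpus is split into a grid of data blocks (d, w), each needing document-topic block d and word-topic block w, and memory holds a fixed number of model blocks under LRU eviction. The greedy rule (always process the unprocessed data block with the fewest non-resident model blocks) and every name in the code (`LRUBuffer`, `process`, `greedy_local_schedule`, and so on) are assumptions made for this example, not the paper's algorithm.

```python
# Illustrative sketch only: the grid layout, LRU eviction, and greedy rule are
# assumptions made for this example, not BlockLDA's published method.
from collections import OrderedDict
from itertools import product


class LRUBuffer:
    """Fixed-capacity in-memory buffer of model blocks with LRU eviction."""

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("need room for the two model blocks a data block uses")
        self.capacity = capacity
        self.blocks = OrderedDict()

    def __contains__(self, key):
        return key in self.blocks

    def access(self, key):
        """Touch a model block; return True if it had to be read from disk (page fault)."""
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return False
        self.blocks[key] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return True


def required_blocks(data_block):
    """Hypothetical layout: data block (d, w) needs doc-topic block d and word-topic block w."""
    d, w = data_block
    return [("doc", d), ("word", w)]


def process(buf, data_block):
    """Load the model blocks for one data block; return the number of page faults."""
    keys = required_blocks(data_block)
    # Touch resident blocks first so that loading a missing block cannot evict them.
    keys.sort(key=lambda k: k not in buf)
    return sum(buf.access(k) for k in keys)


def count_page_faults(order, capacity):
    """Replay a processing order through an empty buffer and count page faults."""
    buf = LRUBuffer(capacity)
    return sum(process(buf, b) for b in order)


def greedy_local_schedule(n_doc_parts, n_word_parts, capacity):
    """Always pick the unprocessed data block with the fewest non-resident model blocks."""
    remaining = set(product(range(n_doc_parts), range(n_word_parts)))
    buf = LRUBuffer(capacity)
    order = []
    while remaining:
        # Ties are broken by block index so the schedule is deterministic.
        nxt = min(remaining,
                  key=lambda b: (sum(k not in buf for k in required_blocks(b)), b))
        remaining.remove(nxt)
        process(buf, nxt)
        order.append(nxt)
    return order


row_major = list(product(range(4), range(4)))
greedy = greedy_local_schedule(4, 4, capacity=3)
print(count_page_faults(row_major, 3), count_page_faults(greedy, 3))  # 20 14
```

On this toy 4 × 4 grid with room for three model blocks, the greedy order starts each new document partition at a column whose word-topic block is still resident, which cuts page faults from 20 (row-major order) to 14. The numbers describe only this simulated setting and say nothing about BlockLDA's measured performance.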
Pages: 353-369
Page count: 17
Related papers
50 in total
  • [21] DISK-BASED MULTIPLE TERMINAL DISPLAY SYSTEM
    HLADY, AM
    SID JOURNAL, 1973, 10 (06): 9 - &
  • [22] A cache optimized multidimensional index in disk-based environments
    Park, M
    Lee, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (08): 1932 - 1939
  • [23] DISK-BASED GC-MS-COMPUTER SYSTEMS
    KARASEK, FW
    RESEARCH-DEVELOPMENT, 1974, 25 (09): 42 - &
  • [24] Efficient layout transformation for disk-based multidimensional arrays
    Krishnamoorthy, S
    Baumgartner, G
    Lam, CC
    Nieplocha, J
    Sadayappan, P
    HIGH PERFORMANCE COMPUTING - HIPC 2004, 2004, 3296: 386 - 398
  • [25] Disk-based k-mer counting on a PC
    Deorowicz, Sebastian
    Debudaj-Grabysz, Agnieszka
    Grabowski, Szymon
    BMC BIOINFORMATICS, 2013, 14
  • [26] Modeling and experimentation for novel aerofoil embedded mesh disk-based partially submerged rotating reactor
    Saini, Sunil K.
    CHEMICAL ENGINEERING JOURNAL ADVANCES, 2022, 12
  • [27] DISK-BASED PROGRAM SWAPPING IN 8080-BASED MICROCOMPUTERS
    NEUMANN, PG
    BEHAVIOR RESEARCH METHODS & INSTRUMENTATION, 1979, 11 (05): 512 - 518
  • [28] The application of disk-based video servers in news and sports
    Shaw, J
    SMPTE JOURNAL, 1999, 108 (02): 109 - 112
  • [29] A fast data structure for disk-based audio editing
    Mazzoni, D
    Dannenberg, RB
    COMPUTER MUSIC JOURNAL, 2002, 26 (02): 62 - 76
  • [30] A DISK-BASED STORAGE ARCHITECTURE FOR MOVIE ON DEMAND SERVERS
    OZDEN, B
    BILIRIS, A
    RASTOGI, R
    SILBERSCHATZ, A
    INFORMATION SYSTEMS, 1995, 20 (06): 465 - 482