Scalable disk-based topic modeling for memory limited devices

Cited by: 2
Authors
Kim, Byungju [1]
Lee, Dongha [1]
Oh, Jinoh [2]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
Funding
National Research Foundation, Singapore;
Keywords
Latent Dirichlet Allocation; Parallel algorithm; Disk-based algorithm; ALGORITHM;
DOI
10.1016/j.ins.2019.12.058
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Disk-based algorithms can process large-scale data that do not fit into memory, so they provide good scalability on mobile devices with limited memory resources. Because disk I/O is generally much slower than memory access, the total amount of disk I/O is the most crucial factor determining the efficiency of disk-based algorithms. This paper proposes BlockLDA, a disk-based Latent Dirichlet Allocation (LDA) inference algorithm that can efficiently infer an LDA model when neither the data nor the model fits into memory. BlockLDA manages the data and the model as a set of small blocks so that it can support efficient disk I/O and process the LDA inference in a block-wise manner. In addition, it uses advanced techniques that help minimize the amount of disk I/O, including 1) a space reduction algorithm that dynamically manages the block-wise model according to its changing sparsity and 2) a local scheduling algorithm that carefully selects the next data blocks so that the number of page faults is minimized. Our experimental results demonstrate that BlockLDA shows better scalability and efficiency than its disk-based and in-memory competitors in a memory-limited environment. © 2019 Elsevier Inc. All rights reserved.
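The abstract does not spell out the local scheduling algorithm, so the following is only an illustrative sketch of the general idea of choosing the next data block to reduce model-block loads. It is not BlockLDA's actual method. It assumes a simple greedy rule (visit the data block with the fewest missing model blocks next), and the names `load_and_evict`, `greedy_schedule`, `fixed_order_loads`, `needs`, and `capacity` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def load_and_evict(cached: Set[int], required: Set[int], capacity: int) -> int:
    """Bring `required` model blocks into `cached`; return how many were loaded."""
    missing = required - cached
    # Only blocks the current data block does not need may be evicted.
    evictable = sorted(cached - required)
    while len(cached) + len(missing) > capacity:
        cached.remove(evictable.pop())
    cached |= missing
    return len(missing)


def check(needs: Dict[str, Set[int]], capacity: int) -> None:
    if any(len(req) > capacity for req in needs.values()):
        raise ValueError("a data block needs more model blocks than fit in memory")


def greedy_schedule(needs: Dict[str, Set[int]], capacity: int) -> Tuple[List[str], int]:
    """Assumed greedy rule: always visit the data block with the fewest misses.

    `needs` maps a hypothetical data-block id to the model-block ids it touches;
    `capacity` is the number of model blocks that fit in memory at once.
    Returns the visiting order and the total number of model-block loads.
    """
    check(needs, capacity)
    cached: Set[int] = set()
    pending = sorted(needs)  # sorted so ties are broken deterministically
    order: List[str] = []
    loads = 0
    while pending:
        best = min(pending, key=lambda b: len(needs[b] - cached))
        loads += load_and_evict(cached, needs[best], capacity)
        order.append(best)
        pending.remove(best)
    return order, loads


def fixed_order_loads(order: List[str], needs: Dict[str, Set[int]], capacity: int) -> int:
    """Baseline: count model-block loads when data blocks are visited as given."""
    check(needs, capacity)
    cached: Set[int] = set()
    return sum(load_and_evict(cached, needs[b], capacity) for b in order)


if __name__ == "__main__":
    # Toy input: four data blocks, two of which share each pair of model blocks.
    needs = {"d0": {0, 1}, "d1": {2, 3}, "d2": {0, 1}, "d3": {2, 3}}
    print(greedy_schedule(needs, capacity=2))
    print(fixed_order_loads(["d0", "d1", "d2", "d3"], needs, capacity=2))
```

On this toy input the greedy order groups data blocks that share model blocks, so fewer model blocks are reloaded than when visiting the blocks in their listed order. A real disk-based scheduler would also have to account for block sizes, write-back of updated model blocks, and changing sparsity, none of which is modelled here.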
Pages: 353-369
Number of pages: 17
Related papers
50 in total
  • [1] Disk-based Matrix Completion for Memory Limited Devices
    Lee, Dongha
    Oh, Jinoh
    Faloutsos, Christos
    Kim, Byungju
    Yu, Hwanjo
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1093 - 1102
  • [2] Disk-based storage for scalable video
    Chang, E
    Zakhor, A
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 1997, 7 (05) : 758 - 770
  • [3] Disk-based storage for scalable video
    Chang, E
    Zakhor, A
    MULTIMEDIA COMPUTING AND NETWORKING 1997, 1997, 3020 : 156 - 168
  • [4] SSDMiner: A Scalable and Fast Disk-Based Frequent Pattern Miner
    Chon, Kang-Wook
    Kim, Min-Soo
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON EMERGING DATABASES: TECHNOLOGIES, APPLICATIONS, AND THEORY, 2018, 461 : 99 - 110
  • [5] RepMaestro: scalable repeat detection on disk-based genome sequences
    Askitis, Nikolas
    Sinha, Ranjan
    BIOINFORMATICS, 2010, 26 (19) : 2368 - 2374
  • [6] THE DISK-BASED BIBLE
    EDWARDS, J
    POPULAR COMPUTING, 1983, 2 (04): : 128 - &
  • [7] A Disk-Based Index for Trajectories with an In-Memory Compressed Cache
    Campos, Daniela
    Gomez-Brandon, Adrian
    Navarro, Gonzalo
    2021 DATA COMPRESSION CONFERENCE (DCC 2021), 2021, : 340 - 340
  • [8] Disk-based VLBI recording
    Whitney, A
    Future Directions in High Resolution Astronomy: The 10th Anniversary of the VLBA, 2005, 340 : 588 - 594
  • [9] DISK-BASED DATABASES - TIME FOR REFLECTION
    JOYCE, J
    DATA PROCESSING, 1979, 21 (04): : 32 - 34
  • [10] Disk-based recording applications for the VLBA
    Romney, JD
    NEW TECHNOLOGIES IN VLBI, 2003, 306 : 153 - 160