Disk-based Matrix Completion for Memory Limited Devices

Cited by: 4
Authors
Lee, Dongha [1]
Oh, Jinoh [2]
Faloutsos, Christos [3]
Kim, Byungju [1]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Matrix completion; Stochastic gradient descent; Data management
DOI
10.1145/3269206.3271685
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More and more data need to be processed or analyzed on mobile devices for efficiency or privacy reasons, but performing machine learning tasks on large data within these devices is challenging because of their limited memory. For this reason, disk-based machine learning methods, which use storage without holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data updates (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; existing methods support neither capability. First, D-MC2 builds a two-layered index to support incremental data updates efficiently; there is a trade-off between the model learning cost and the data update cost, and the two-layered index optimizes both costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to support this dual spilling efficiently; a huge amount of disk I/O is incurred when the model is larger than memory, and the new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in limited-memory environments. In terms of the co-optimization, D-MC2 outperforms baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler makes training 12.4x faster than a naive scheduler.
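For readers unfamiliar with the underlying task, the following is a minimal sketch of plain in-memory matrix completion by SGD on a low-rank factorization. It is not the D-MC2 algorithm: the two-layered index, the spilling of data and model to disk, and the window-based scheduler described in the abstract are not shown. The function names, the hyperparameters (`rank`, `lr`, `reg`, `epochs`), and the toy ratings are all assumptions made for illustration.

```python
import random


def sgd_matrix_completion(ratings, n_rows, n_cols, rank=2, lr=0.02,
                          reg=0.01, epochs=500, seed=0):
    """Fit factors P (n_rows x rank) and Q (n_cols x rank) so that
    dot(P[i], Q[j]) approximates each observed entry (i, j, value).

    Generic SGD matrix factorization; hypothetical helper, not D-MC2.
    """
    rng = random.Random(seed)
    # Small random initialization of both factor matrices.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed entries in random order
        for i, j, value in order:
            p, q = P[i], Q[j]
            err = value - sum(a * b for a, b in zip(p, q))
            for k in range(rank):
                pk, qk = p[k], q[k]
                # Gradient step on squared error with L2 regularization.
                p[k] += lr * (err * qk - reg * pk)
                q[k] += lr * (err * pk - reg * qk)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the given (i, j, value) entries."""
    total = 0.0
    for i, j, value in ratings:
        pred = sum(a * b for a, b in zip(P[i], Q[j]))
        total += (value - pred) ** 2
    return (total / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Toy 4 x 4 matrix with 12 observed entries (made-up values).
    observed = [
        (0, 0, 1.0), (0, 1, 1.5), (0, 2, 2.0),
        (1, 0, 2.0), (1, 1, 3.0), (1, 3, 2.0),
        (2, 0, 1.0), (2, 2, 2.0), (2, 3, 1.0),
        (3, 1, 3.0), (3, 2, 4.0), (3, 3, 2.0),
    ]
    P, Q = sgd_matrix_completion(observed, n_rows=4, n_cols=4)
    print("training RMSE:", rmse(observed, P, Q))
```

In this sketch both the observed entries and the factors `P` and `Q` live entirely in memory, which is exactly the assumption that a disk-based method has to drop when either one exceeds the available memory.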
Pages: 1093-1102
Page count: 10