Disk-based Matrix Completion for Memory Limited Devices

Cited by: 4
Authors
Lee, Dongha [1]
Oh, Jinoh [2]
Faloutsos, Christos [3]
Kim, Byungju [1]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Matrix completion; Stochastic gradient descent; Data management
DOI
10.1145/3269206.3271685
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More and more data need to be processed or analyzed on mobile devices for efficiency or privacy reasons, but performing machine learning tasks on large data within these devices is challenging because of their limited memory. For this reason, disk-based machine learning methods, which use storage resources instead of holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data updates (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; existing methods do not support these functionalities. First, D-MC2 builds a two-layered index to efficiently support incremental data updates: there is a trade-off between model learning cost and data update cost, and the two-layered index optimizes both costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to efficiently support the dual spilling: a huge amount of disk I/O is incurred when the model is larger than memory, and the new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in a limited-memory environment. In terms of co-optimization, D-MC2 outperforms baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler speeds up training by 12.4x compared to a naive scheduler.
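For readers unfamiliar with the underlying learning task, the sketch below shows plain SGD-based matrix completion: each observed entry (u, i, r) is approximated by the dot product of a user factor row P[u] and an item factor row Q[i], and SGD nudges both rows after every entry. This is only an illustration of the generic technique, not D-MC2: it keeps all data and factors in memory and has no two-layered index, disk spilling, or window-based scheduler. The function names, toy data, and hyperparameter values (rank, learning rate, regularization, epochs) are hypothetical choices made for this example.

```python
import math
import random


def sgd_matrix_completion(ratings, num_users, num_items,
                          rank=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Learn factors P, Q so that r ~= dot(P[u], Q[i]) for observed (u, i, r).

    Plain in-memory SGD on squared error with L2 regularization
    (hypothetical illustration, not the D-MC2 algorithm).
    """
    rng = random.Random(seed)
    # Small random initial factors: P is num_users x rank, Q is num_items x rank.
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed entries in a new random order each epoch
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]  # use pre-update values for both gradients
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted value of entry (u, i)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root-mean-square error over the given (u, i, r) triples."""
    sq = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # Toy 4x4 matrix: users 0-1 like items 0-1, users 2-3 like items 2-3.
    # One entry per user is left unobserved and must be "completed".
    observed = [
        (0, 0, 5), (0, 2, 1), (0, 3, 1),
        (1, 0, 5), (1, 1, 5), (1, 2, 1),
        (2, 1, 1), (2, 2, 5), (2, 3, 5),
        (3, 0, 1), (3, 1, 1), (3, 3, 5),
    ]
    P, Q = sgd_matrix_completion(observed, num_users=4, num_items=4)
    print("training RMSE:", round(rmse(P, Q, observed), 3))
    print("completed (0, 1):", round(predict(P, Q, 0, 1), 2))
```

In a disk-based setting, any rows of P and Q (and any blocks of observed entries) that do not fit in memory would have to be read from and written back to storage as SGD touches them; this is the kind of disk I/O that the abstract says grows large when the model exceeds memory and that the window-based scheduler is designed to reduce.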
Pages: 1093-1102
Number of pages: 10