Disk-based Matrix Completion for Memory Limited Devices

Cited by: 4
Authors
Lee, Dongha [1]
Oh, Jinoh [2]
Faloutsos, Christos [3]
Kim, Byungju [1]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Matrix completion; Stochastic gradient descent; Data management
DOI
10.1145/3269206.3271685
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
More and more data need to be processed or analyzed on mobile devices for efficiency or privacy reasons, but performing machine learning tasks on large data within these devices is challenging because of their limited memory. For this reason, disk-based machine learning methods, which utilize storage resources without holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data updates (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; existing methods do not support these functionalities. First, D-MC2 builds a two-layered index to efficiently support incremental data updates; there is a trade-off between model learning and data update costs, and the two-layered index optimizes both costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to efficiently support the dual spilling; a huge amount of disk I/O is incurred when the model is larger than memory, and the new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in a limited-memory environment. In terms of co-optimization, D-MC2 outperforms the baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler makes training 12.4x faster than a naive scheduler.
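For readers unfamiliar with the underlying technique, the following is a minimal, in-memory sketch of SGD-based matrix completion (low-rank factorization R ≈ P Qᵀ fitted only on observed entries). It illustrates the generic SGD update that D-MC2 schedules; it does not implement D-MC2's two-layered index, disk spilling, or window-based scheduler, which the abstract does not specify. The function names (`sgd_matrix_completion`, `predict`, `rmse`) and hyperparameters (`rank`, `lr`, `reg`, `epochs`) are hypothetical choices for illustration.

```python
import random

def sgd_matrix_completion(ratings, n_rows, n_cols, rank=2, lr=0.05,
                          reg=0.01, epochs=200, seed=0):
    """Fit R ~ P @ Q^T to observed (i, j, r) entries with plain SGD.

    Hypothetical helper: everything is held in memory; a disk-based
    method would instead stream data and factor blocks from storage.
    """
    rng = random.Random(seed)
    # Small random initial factors for rows (P) and columns (Q).
    P = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed entries in random order
        for i, j, r in order:
            err = r - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted value of entry (i, j) as the dot product of its factors."""
    return sum(pi * qj for pi, qj in zip(P[i], Q[j]))

def rmse(P, Q, ratings):
    """Root-mean-square error over the given (i, j, r) entries."""
    se = sum((r - predict(P, Q, i, j)) ** 2 for i, j, r in ratings)
    return (se / len(ratings)) ** 0.5

# Usage: a 4x4 matrix built as the outer product u v^T, with some entries hidden.
u, v = [1, 2, 1, 2], [1, 2, 2, 1]
hidden = {(0, 3), (2, 1), (3, 2)}
observed = [(i, j, u[i] * v[j]) for i in range(4) for j in range(4)
            if (i, j) not in hidden]
P, Q = sgd_matrix_completion(observed, n_rows=4, n_cols=4)
print(f"training RMSE: {rmse(P, Q, observed):.3f}")
```

Each update touches only one row factor `P[i]` and one column factor `Q[j]`. This is why, when the factors do not fit in memory, the order in which observed entries are visited determines how often factor data must be read from and written back to disk, which is the cost the abstract's window-based scheduler targets.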
Pages: 1093-1102
Number of pages: 10