Disk-based Matrix Completion for Memory Limited Devices

Cited by: 4
Authors
Lee, Dongha [1]
Oh, Jinoh [2]
Faloutsos, Christos [3]
Kim, Byungju [1]
Yu, Hwanjo [1]
Affiliations
[1] Pohang Univ Sci & Technol, Pohang, South Korea
[2] Adobe Syst Inc, San Jose, CA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Matrix completion; Stochastic gradient descent; Data management
DOI
10.1145/3269206.3271685
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
More and more data must be processed or analyzed on mobile devices for efficiency or privacy reasons, but running machine learning tasks on large data within such devices is challenging because of their limited memory. For this reason, disk-based machine learning methods, which use storage resources without holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data updates (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; these functionalities are not supported by existing methods. First, D-MC2 builds a two-layered index to support incremental data updates efficiently; there is a trade-off between model learning cost and data update cost, and the two-layered index optimizes both costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to support the dual spilling efficiently; a huge amount of disk I/O is incurred when the model is larger than memory, and the new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in a memory-limited environment. In terms of the co-optimization, D-MC2 outperforms baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler speeds up training by 12.4x compared to a naive scheduler.
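As a rough illustration of the general idea behind disk-based matrix completion with SGD, the following minimal Python sketch keeps the observed entries in block files on disk and loads one block at a time during training. It is not the D-MC2 algorithm: the two-layered index, the window-based scheduler, and model spilling are not reproduced, and the factor matrices stay in memory. All function names, the JSON block format, and the hyperparameters are assumptions made for this example.

```python
import json
import os
import random
import tempfile


def write_blocks(ratings, block_size, directory):
    """Split (row, col, value) triples into fixed-size block files on disk."""
    paths = []
    for start in range(0, len(ratings), block_size):
        path = os.path.join(directory, f"block_{len(paths)}.json")
        with open(path, "w") as f:
            json.dump(ratings[start:start + block_size], f)
        paths.append(path)
    return paths


def rmse(paths, P, Q):
    """Root-mean-square error over all observed entries, read block by block."""
    total, count = 0.0, 0
    for path in paths:
        with open(path) as f:
            for i, j, v in json.load(f):
                pred = sum(p * q for p, q in zip(P[i], Q[j]))
                total += (v - pred) ** 2
                count += 1
    return (total / count) ** 0.5


def train(paths, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Plain SGD matrix factorization; only one data block is in memory at a time."""
    rng = random.Random(seed)
    # Hypothetical simplification: the model (P, Q) is held fully in memory.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        order = list(paths)
        rng.shuffle(order)              # visit blocks in random order
        for path in order:
            with open(path) as f:
                block = json.load(f)    # load a single block from disk
            rng.shuffle(block)
            for i, j, v in block:
                err = v - sum(p * q for p, q in zip(P[i], Q[j]))
                for k in range(rank):
                    p, q = P[i][k], Q[j][k]
                    P[i][k] += lr * (err * q - reg * p)
                    Q[j][k] += lr * (err * p - reg * q)
    return P, Q


if __name__ == "__main__":
    # Toy rank-1 matrix with a few entries held out as "missing".
    data = [(i, j, (i + 1) * (j + 1) / 4.0)
            for i in range(4) for j in range(4) if i + j != 4]
    with tempfile.TemporaryDirectory() as d:
        block_paths = write_blocks(data, 4, d)
        P, Q = train(block_paths, 4, 4)
        print("training RMSE:", rmse(block_paths, P, Q))
```

In this sketch the block size bounds how much data is resident at once, so disk reads grow with the number of blocks per epoch; the abstract's scheduler addresses the harder case where the model itself does not fit in memory.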
Pages: 1093-1102
Page count: 10