Memcached Optimization on High Performance I/O Technology

Cited by: 0

Authors:

An Z. [1]

Du H. [1,2]

Li Q. [1]

Huo Z. [1]

Ma J. [1]

Affiliations:

[1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing

[2] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing

Source:

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2018, Vol. 55, No. 4

Keywords:

Java virtual machine (JVM); Memcached; NVMe SSD; Remote direct memory access (RDMA); User-level I/O

DOI:

10.7544/issn1000-1239.2018.20160890

Abstract:

Existing in-memory object caching systems are bottlenecked by the latency overhead of traditional Ethernet and by the limited amount of DRAM in each server. Modern high-performance I/O technologies such as RDMA and NVMe offer a promising way to address both challenges. In this paper, we focus on the data plane efficiency of in-memory object caching systems and study the widely deployed Memcached, using high-performance I/O for fast message transfer and cost-effective storage extension. First, the communication protocol is redesigned on RDMA semantics, and different strategies are applied according to the Memcached operation type and message payload size to minimize overall latency. Second, Memcached is modified to incorporate NVMe SSDs and so expand storage capacity. A circular log structure manages the two-level hierarchy of DRAM and SSD, and the SSD is accessed directly from user space to reduce software overhead. Finally, a JVM-enabled caching system named U2cache is presented. U2cache significantly improves performance by bypassing both the OS kernel and the JVM runtime. Latency is further hidden by pipelining and overlapping memory copy, RDMA transfer, and SSD access. Benchmarking results indicate that U2cache achieves near-optimal performance on the underlying RDMA interconnect. Performance is further improved by 20% with careful optimization for transferring large messages. For data located on the SSD, access latency is reduced by up to 31% compared with kernel-based I/O. © 2018, Science Press. All rights reserved.
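The abstract states only that a circular log structure manages the DRAM and SSD tiers; it gives no layout details. The following is a minimal sketch of the oldest-first demotion that such a log implies, not the authors' implementation. The class name `TwoLevelLog`, the slot-count capacities, and the use of an in-memory `OrderedDict` in place of a real NVMe region are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoLevelLog:
    """Toy model: a bounded DRAM log that spills its oldest entries
    into a bounded SSD log (hypothetical; SSD is simulated in memory)."""

    def __init__(self, dram_slots, ssd_slots):
        self.dram_slots = dram_slots
        self.ssd_slots = ssd_slots
        self.dram = OrderedDict()  # oldest entry first, newest last
        self.ssd = OrderedDict()   # stand-in for the NVMe SSD region

    def set(self, key, value):
        # A rewrite appends a fresh copy at the log head; stale copies are dropped.
        self.dram.pop(key, None)
        self.ssd.pop(key, None)
        self.dram[key] = value
        # When DRAM overflows, demote the oldest entry to the SSD tier.
        while len(self.dram) > self.dram_slots:
            old_key, old_value = self.dram.popitem(last=False)
            self.ssd[old_key] = old_value
            # When the SSD tier overflows, its oldest entry is overwritten.
            while len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)

    def get(self, key):
        # Return the value together with the tier that served it.
        if key in self.dram:
            return self.dram[key], "dram"
        if key in self.ssd:
            return self.ssd[key], "ssd"
        return None, "miss"
```

In this model a key written early migrates from DRAM to the SSD tier as newer keys arrive and is eventually dropped once the SSD tier wraps around. A real design would track byte offsets in the log rather than slot counts.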

Pages: 864-874

Page count: 10
