Research on Small File Processing Technology Based on HDFS

Authors
Gu, Rui
Affiliation
Keywords
HDFS; cloud storage; small files; file merge; insert
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data is expanding sharply. Cloud computing offers a good solution to the problems of large-scale data computation and storage, and massive data storage and analysis has become a very popular research field. HDFS uses a single NameNode to manage the metadata of the entire system and keeps that metadata in memory to improve access efficiency. When the system stores a large number of small files, however, it generates a large amount of metadata, which occupies a large share of NameNode memory. In addition, accessing a large number of small files requires sending frequent requests to the NameNode, which overloads it. To address this problem, this paper analyzes previous research and improvement schemes and builds a corresponding improvement on them. An independent small file processing module is added to the original distributed file system. The module merges small files, creates an index for the files, and passes the file cache to HDFS for data processing.
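As a rough illustration of the merge-and-index idea described in the abstract, the following Python sketch packs several small files into one buffer and records each file's offset and length. It is not the paper's implementation: the function names `merge_small_files` and `read_small_file`, the in-memory buffer, and the dictionary index are all assumptions made for illustration, and no HDFS API is called.

```python
import io


def merge_small_files(files):
    """Pack (name, bytes) pairs into one blob and build an offset index.

    Hypothetical helper: the index maps each file name to (offset, length)
    inside the merged blob, so one large object replaces many small ones.
    """
    merged = io.BytesIO()
    index = {}
    for name, data in files:
        offset = merged.tell()          # where this file starts in the blob
        merged.write(data)
        index[name] = (offset, len(data))
    return merged.getvalue(), index


def read_small_file(merged, index, name):
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


# Example input: three small files, one of them empty.
files = [("a.txt", b"hello"), ("b.txt", b"small file"), ("c.txt", b"")]
blob, index = merge_small_files(files)
```

In a design like this, only the merged blob would be written to the file system as a single file, so the NameNode would track one entry rather than one per small file, while lookups go through the separate index. How the paper's module stores and caches its index is not stated in the abstract.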
Pages: 286-289
Page count: 4