XML Structural Similarity Search Using MapReduce

Authors
Yuan, Peisen [1, 2]
Sha, Chaofeng [1, 2]
Wang, Xiaoling [3]
Yang, Bin [1, 2]
Zhou, Aoying [2, 3]
Yang, Su [1, 2]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[3] East China Normal Univ, Shanghai Key Lab Trustworthy Comp, Software Engn Inst, Shanghai, Peoples R China
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
XML is a de facto standard for web data exchange and information representation. Managing large volumes of XML data efficiently poses challenges for conventional techniques. To cope with large-scale data, the MapReduce computing framework has recently attracted growing attention in the database community. This paper proposes an efficient and scalable framework for XML structural similarity search on a large cluster with MapReduce. First, sub-structures of the XML structures are extracted in parallel from a large XML corpus distributed across the cluster. Then Min-Hashing and locality-sensitive hashing techniques are developed on the distributed and parallel computing framework for efficient structural similarity search. An empirical study on the cluster with large real datasets demonstrates the effectiveness and efficiency of the approach.
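The pipeline outlined in the abstract (extract sub-structures, compute Min-Hash signatures, bucket them with locality-sensitive hashing) can be illustrated with a minimal single-process sketch. This is not the authors' implementation. The abstract does not say which sub-structures are extracted, so root-to-node tag paths are assumed here. The function names, the signature length `NUM_HASHES`, the band count `BANDS`, and the MD5-based hash family are likewise hypothetical choices, and an in-memory dictionary stands in for the MapReduce shuffle step.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 24  # signature length (assumed value)
BANDS = 12       # number of LSH bands (assumed value)


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths (assumed sub-structure choice)."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def _hash(seed, item):
    """One member of a seeded hash family, built from MD5 for determinism."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items):
    """Min-Hash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, it) for it in items) for seed in range(NUM_HASHES))


def map_phase(doc_id, xml_text):
    """Map: emit ((band index, band slice of the signature), doc_id)."""
    sig = minhash_signature(tag_paths(xml_text))
    rows = NUM_HASHES // BANDS
    for band in range(BANDS):
        yield (band, sig[band * rows:(band + 1) * rows]), doc_id


def reduce_phase(grouped):
    """Reduce: documents that share any band bucket become candidate pairs."""
    candidates = set()
    for doc_ids in grouped.values():
        for a, b in combinations(sorted(set(doc_ids)), 2):
            candidates.add((a, b))
    return candidates


def candidate_pairs(corpus):
    """Run map, an in-memory stand-in for the shuffle, then reduce."""
    grouped = defaultdict(list)
    for doc_id, xml_text in corpus.items():
        for key, value in map_phase(doc_id, xml_text):
            grouped[key].append(value)
    return reduce_phase(grouped)


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)
```

Calling `candidate_pairs` on a dictionary that maps document ids to XML strings returns the id pairs that collide in at least one band. Each candidate can then be checked with `jaccard` on the two `tag_paths` sets. Using more rows per band makes the filter stricter, while using more bands makes it more permissive.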
Pages: 169 / +
Page count: 3