Detection for Approximately Duplicate Records Based on Fuzzy Comprehensive Evaluation

Authors
Zhou, Lijuan [1]
Xiao, Zhe [1]
Affiliations
[1] Hunan Univ Technol, Coll Sci & Technol, Zhuzhou 412008, Hunan, Peoples R China
Keywords
Approximately Duplicate Records; Attribute Weight; Fuzzy Comprehensive Evaluation; Record Grouping; Similarity;
DOI
10.4028/www.scientific.net/AMM.397-400.2464
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
To address the problem of determining attribute weights when detecting approximately duplicate records, we propose a method based on fuzzy comprehensive evaluation for deriving the attribute weights of a data set. We first analyze the factors that make up each attribute and then evaluate their rank. Finally, we determine the attribute weights with the fuzzy comprehensive evaluation method and use them to detect approximately duplicate records. Theoretical analysis and experimental results show that the method determines the weights of all attributes objectively and detects approximately duplicate records effectively in massive data sets.
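The abstract does not spell out the computation, so the following is only a minimal sketch of how a weighted-average fuzzy comprehensive evaluation could yield attribute weights and feed a weighted record similarity. It is not the authors' exact procedure. The factor weights, membership matrices, grade values, attribute names, and the use of `difflib.SequenceMatcher` as the field similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(factor_weights, membership, grade_values):
    """Score one attribute with a weighted-average fuzzy evaluation.

    factor_weights: weight of each composition factor (assumed given).
    membership: one row per factor, one column per rank grade.
    grade_values: numeric value assigned to each rank grade.
    """
    # B = A . R: combine the factor weights with the membership matrix.
    b = [
        sum(a * row[j] for a, row in zip(factor_weights, membership))
        for j in range(len(grade_values))
    ]
    total = sum(b)
    b = [x / total for x in b]  # normalize the evaluation vector
    # Collapse the evaluation vector into a single score.
    return sum(x * g for x, g in zip(b, grade_values))


def attribute_weights(evaluations, factor_weights, grade_values):
    """Normalize per-attribute scores so that the weights sum to 1."""
    scores = {
        name: fuzzy_score(factor_weights, matrix, grade_values)
        for name, matrix in evaluations.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def record_similarity(rec_a, rec_b, weights):
    """Weighted sum of per-field string similarities (assumed measure)."""
    return sum(
        w * SequenceMatcher(None, rec_a[name], rec_b[name]).ratio()
        for name, w in weights.items()
    )


# Hypothetical inputs: three factors, three rank grades (high/medium/low).
factor_weights = [0.5, 0.3, 0.2]
grade_values = [1.0, 0.6, 0.2]
evaluations = {
    "name": [[0.8, 0.2, 0.0], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],
    "city": [[0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.4, 0.4, 0.2]],
}
weights = attribute_weights(evaluations, factor_weights, grade_values)

rec_1 = {"name": "Zhou Lijuan", "city": "Zhuzhou"}
rec_2 = {"name": "Zhou Li-juan", "city": "Zhuzhou"}
similarity = record_similarity(rec_1, rec_2, weights)
```

Two records would then be flagged as approximate duplicates when `similarity` exceeds a threshold chosen for the data set; the threshold itself is another assumption that the abstract does not specify.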
Pages: 2464 - 2468
Page count: 5
Related papers
50 records in total
  • [1] Approximately duplicate records detection based on complete sub-graph
    Software School, Xiamen University, Xiamen, Fujian, China
    Adv. Inf. Sci. Serv. Sci., 11 (352-361):
  • [2] Identification of approximately duplicate material records in ERP systems
    Zong, Wei
    Wu, Feng
    Chu, Lap-Keung
    Sculli, Domenic
    ENTERPRISE INFORMATION SYSTEMS, 2017, 11 (03) : 434 - 451
  • [3] Efficient approach for detecting approximately duplicate database records
    Qiu, Y.F.
    Tian, Z.P.
    Ji, W.Y.
    Zhou, A.Y.
    Jisuanji Xuebao/Chinese Journal of Computers, 2001, 24 (01) : 69 - 77
  • [5] Elimination method for approximately duplicate records in vehicle inspection based on Apriori algorithm
    An, Xiang-Bi
    Du, Ai-Yong
    Li, Shu-Min
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2010, 43 (07) : 606 - 610
  • [6] An n-gram-based approach for detecting approximately duplicate database records
    Tian Z.
    Lu H.
    Ji W.
    Zhou A.
    Tian Z.
    International Journal on Digital Libraries, 2002, 3 (4) : 325 - 331
  • [7] EVALUATION OF SEARCH KEYS FOR THE DETECTION OF DUPLICATE DONOR RECORDS
    FLOYD, GM
    GREGG, MH
    WOTEKI, TH
    TRANSFUSION, 1995, 35 (10) : A23 - A23
  • [8] The Application of MPN Algorithm in the Field of Detecting Approximately Duplicate Records
    Zhong, Qi
    Hu, Jian-feng
    INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION ENGINEERING (CSIE 2015), 2015 : 276 - 282
  • [9] Method for detecting approximately duplicate database records in data warehouse
    Li, Xing-Yi
    Bao, Cong-Jian
    Shi, Hua-Ji
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2007, 36 (06) : 1273 - 1277
  • [10] Approximately Duplicated Records Detection Based on Feature Selection
    Xi, Yu
    Zhang, Zhipeng
    You, Jinguo
    Jia, Lianyin
    Wang, Yang
    Li, Bo
    ASIA-PACIFIC MANAGEMENT AND ENGINEERING CONFERENCE (APME 2014), 2014 : 256 - 263