Multi-component Similarity Method for Web Product Duplicate Detection

Cited by: 12
Authors
van Bezu, Ronald [1]
Borst, Sjoerd [1]
Rijkse, Rick [1]
Verhagen, Jim [1]
Vandic, Damir [1]
Frasincar, Flavius [1]
Affiliation
[1] Erasmus Univ, POB 1738, NL-3000 DR Rotterdam, Netherlands
DOI
10.1145/2695664.2695818
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Due to the growing number of Web shops, aggregating product data from the Web is growing in importance. One of the problems encountered in product aggregation is duplicate detection. In this paper, we extend and significantly improve an existing state-of-the-art product duplicate detection method. Our approach employs a novel method for combining the titles' and the attributes' similarities into a final product similarity. We use q-grams to handle partial matching of words, such as abbreviations. Where existing methods cluster products of only two Web shops, we propose a hierarchical clustering method to handle multiple Web shops. Applying our new method to a dataset of TVs from four Web shops reveals that it significantly outperforms the Hybrid Similarity Method, the Title Model Words Method, and the well-known TF-IDF method, with an F1 score of 0.475 compared to 0.287, 0.298, and 0.335, respectively.
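The abstract mentions q-grams for partial matching of words. As a minimal sketch of that general idea, and not the authors' implementation, the Python below splits two strings into padded character q-grams and scores their overlap. The choice of q = 3, the `#` padding character, the Dice-style overlap formula, and the function names `qgrams` and `qgram_similarity` are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter


def qgrams(text: str, q: int = 3) -> list[str]:
    """Return the lowercased, padded character q-grams of `text`.

    Padding with '#' (an illustrative choice) lets the first and last
    characters contribute as many q-grams as interior characters.
    """
    pad = "#" * (q - 1)
    padded = pad + text.lower() + pad
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice-style overlap of the q-gram multisets of `a` and `b`, in [0, 1].

    This particular formula is an assumption; other q-gram measures exist.
    """
    grams_a, grams_b = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:  # only reachable for q = 1 with two empty strings
        return 1.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


if __name__ == "__main__":
    # A word and its longer form share many q-grams; unrelated words share none.
    print(qgram_similarity("inch", "inches"))
    print(qgram_similarity("inch", "hertz"))
```

Under these assumptions, a word and a variant of it (such as a plural or a truncated form) score well above zero, while exact token matching would treat them as entirely different.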
Pages: 761-768
Number of pages: 8