The WDC Gold Standards for Product Feature Extraction and Product Matching

Cited by: 9
Authors
Petrovski, Petar [1]
Primpeli, Anna [1]
Meusel, Robert [1]
Bizer, Christian [1]
Affiliations
[1] Univ Mannheim, Data & Web Sci Grp, Mannheim, Germany
Keywords
e-commerce; Product feature extraction; Product matching
DOI
10.1007/978-3-319-53676-7_6
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Finding out which e-shops offer a specific product is a central challenge for building integrated product catalogs and comparison shopping portals. Determining whether two offers refer to the same product involves extracting a set of features (product attributes) from the web pages containing the offers and comparing these features using a matching function. The existing gold standards for product matching have two shortcomings: (i) they contain offers from only a small number of e-shops and thus do not properly cover the heterogeneity that is found on the Web, and (ii) they provide only a small number of generic product attributes and therefore cannot be used to evaluate whether detailed product attributes have been correctly extracted from textual product descriptions. To overcome these shortcomings, we have created two public gold standards. The WDC Product Feature Extraction Gold Standard consists of over 500 product web pages originating from 32 different websites, on which we have annotated all product attributes (338 distinct attributes) that appear in product titles, product descriptions, tables, and lists. The WDC Product Matching Gold Standard consists of over 75,000 correspondences between 150 products (mobile phones, TVs, and headphones) in a central catalog and offers for these products on the 32 websites. To verify that the gold standards are challenging enough, we ran several baseline feature extraction and matching methods, resulting in F-score values in the range 0.39 to 0.67. In addition to the gold standards, we also provide a corpus consisting of 13 million product pages from the same websites, which might be useful as background knowledge for training feature extraction and matching methods.
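As an illustration of the pipeline the abstract describes (compare extracted attributes with a matching function, then score the predicted correspondences against a gold standard), here is a minimal Python sketch. It is not the authors' baseline: the token-level Jaccard similarity, the 0.5 threshold, and all function and attribute names are assumptions chosen for illustration only.

```python
# Minimal sketch, NOT the paper's method: Jaccard similarity, the threshold,
# and every name below are illustrative assumptions.

def tokenize(value: str) -> set:
    """Lowercase an attribute value and split it into a set of tokens."""
    return set(value.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def offer_similarity(offer: dict, product: dict) -> float:
    """Average Jaccard similarity over the attributes both records share."""
    shared = offer.keys() & product.keys()
    if not shared:
        return 0.0
    total = sum(jaccard(tokenize(offer[k]), tokenize(product[k])) for k in shared)
    return total / len(shared)


def is_match(offer: dict, product: dict, threshold: float = 0.5) -> bool:
    """Hypothetical matching function: match if similarity reaches the threshold."""
    return offer_similarity(offer, product) >= threshold


def f_score(predicted: set, gold: set) -> float:
    """F1 of predicted (offer, product) correspondences against the gold set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example records (not taken from the gold standards).
offer = {"brand": "Samsung", "model": "Galaxy S6"}
product = {"brand": "samsung", "model": "galaxy s6", "color": "black"}
matched = is_match(offer, product)

predicted_pairs = {("offer1", "product1"), ("offer2", "product1")}
gold_pairs = {("offer1", "product1")}
score = f_score(predicted_pairs, gold_pairs)
```

In this toy setup only attributes present in both records are compared, so missing attributes neither help nor hurt; a real system evaluated on the gold standards would need to decide how to handle them.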
Pages: 73-86
Page count: 14
Related papers
50 in total
  • [1] The WDC Training Dataset and Gold Standard for Large-Scale Product Matching
    Primpeli, Anna
    Peeters, Ralph
    Bizer, Christian
    COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019: 381 - 386
  • [2] Product Feature Extraction with a Combined Approach
    Li, Zhixing
    2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010), 2010: 686 - 690
  • [3] Propagation based product feature extraction
    Qiu G.
    Zheng M.
    Bu J.-J.
    Shi Y.
    Chen C.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2010, 44 (11): 2188 - 2193+2228
  • [4] Product style recognition based on feature matching
    Huang, Qi
    Sun, Shouqian
    Lu, Liang
    Pan, Yunhe
    Zhongguo Jixie Gongcheng/China Mechanical Engineering, 2003, 14 (21):
  • [5] Research on Product Feature Extraction for Chinese Reviews
    Wei, Guiying
    Cai, Wenming
    Qin, Lan
    2016 INTERNATIONAL CONFERENCE ON LOGISTICS, INFORMATICS AND SERVICE SCIENCES (LISS' 2016), 2016,
  • [6] Product feature extraction with co-training
    Wu, Xing
    He, Zhongshi
    Huang, Yongwen
    Journal of Information and Computational Science, 2009, 6 (03): 1515 - 1523
  • [7] Product feature extraction from Chinese online reviews: application to product improvement
    Shi, Lili
    Lin, Jun
    Liu, Guoquan
    RAIRO-OPERATIONS RESEARCH, 2023, 57 (03) : 1125 - 1147
  • [8] Unsupervised product feature extraction for feature-oriented opinion determination
    Quan, Changqin
    Ren, Fuji
    INFORMATION SCIENCES, 2014, 272 : 16 - 28
  • [9] Automated segmentation and feature extraction of product inspection items
    Talukder, A
    Casasent, D
    OPTICAL PATTERN RECOGNITION VIII, 1997, 3073 : 96 - 107
  • [10] Simultaneous Opinion Lexicon Expansion and Product Feature Extraction
    Wai, Myat Su
    Aung, Sint Sint
    2017 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2017), 2017: 107 - 112