Large-Scale Taxonomy Categorization for Noisy Product Listings

Times cited: 0
Authors
Das, Pradipto [1]
Xia, Yandi
Levine, Aaron
Di Fabbrizio, Giuseppe
Datta, Ankur
Affiliation
[1] Rakuten Inst Technol, Boston, MA 02110 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. We first investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Second, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction.
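The abstract reports that word unigrams from product names combined with navigational breadcrumbs work best as features, and that linear models are among the classifiers compared. The Python sketch below illustrates only that feature set feeding a simple linear baseline; it is not the authors' implementation. The tokenizer, the `n:`/`b:` field prefixes, the choice of a multiclass perceptron as the linear model, and the toy listings and labels are all hypothetical assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric runs (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def featurize(name, breadcrumb):
    """Word-unigram counts from the product name and the breadcrumb.

    The "n:" / "b:" field prefixes are a hypothetical design choice,
    not necessarily how the paper combines the two sources.
    """
    feats = defaultdict(float)
    for tok in tokenize(name):
        feats["n:" + tok] += 1.0
    for tok in tokenize(breadcrumb):
        feats["b:" + tok] += 1.0
    return dict(feats)


class MulticlassPerceptron:
    """Minimal linear classifier: one sparse weight vector per label."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = {y: defaultdict(float) for y in self.labels}

    def score(self, feats, label):
        # Dot product between the sparse features and the label's weights.
        weights = self.w[label]
        return sum(v * weights.get(f, 0.0) for f, v in feats.items())

    def predict(self, feats):
        # Ties go to the label listed first.
        return max(self.labels, key=lambda y: self.score(feats, y))

    def fit(self, data, epochs=5):
        # Perceptron update: reward the gold label, penalize the wrong prediction.
        for _ in range(epochs):
            for feats, gold in data:
                pred = self.predict(feats)
                if pred != gold:
                    for f, v in feats.items():
                        self.w[gold][f] += v
                        self.w[pred][f] -= v
        return self


# Hypothetical toy listings: (features, top-level category).
train = [
    (featurize("Stainless Steel Chef Knife 8 inch", "Home > Kitchen > Cutlery"), "Kitchen"),
    (featurize("Nonstick Frying Pan 12 inch", "Home > Kitchen > Cookware"), "Kitchen"),
    (featurize("Wireless Bluetooth Headphones", "Electronics > Audio"), "Electronics"),
    (featurize("USB-C Charging Cable 6 ft", "Electronics > Accessories"), "Electronics"),
]
model = MulticlassPerceptron(["Electronics", "Kitchen"]).fit(train)
print(model.predict(featurize("Ceramic Paring Knife", "Home > Kitchen")))  # Kitchen
```

Prefixing tokens by field lets the same word (for example "kitchen") carry different weights in the name and in the breadcrumb; a plain concatenation of both bags of words is an equally plausible reading of the abstract. The GBT and CNN models the paper evaluates, and its topic-model noise detection, are not sketched here.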
Pages: 3885-3894
Page count: 10
Related papers
50 in total
  • [21] Software Product Management in Large-Scale Agile
    Moe, Nils Brede
    Berntzen, Marthe
    Barbala, Astri
    Stray, Viktoria
    AGILE PROCESSES IN SOFTWARE ENGINEERING AND EXTREME PROGRAMMING, XP 2024, 2024, 512 : 53 - 69
  • [22] Integrative Cancer Pharmacogenomics to Infer Large-Scale Drug Taxonomy
    El-Hachem, Nehme
    Gendoo, Deena M. A.
    Ghoraie, Laleh Soltan
    Safikhani, Zhaleh
    Smirnov, Petr
    Chung, Christina
    Deng, Kenan
    Fang, Ailsa
    Birkwood, Erin
    Ho, Chantal
    Isserlin, Ruth
    Bader, Gary D.
    Goldenberg, Anna
    Haibe-Kains, Benjamin
    CANCER RESEARCH, 2017, 77 (11) : 3057 - 3069
  • [23] Large-scale Taxonomy Induction Using Entity and Word Embeddings
    Ristoski, Petar
    Faralli, Stefano
    Ponzetto, Simone Paolo
    Paulheim, Heiko
    2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), 2017, : 81 - 87
  • [24] The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
    Salatino, Angelo A.
    Thanapalasingam, Thiviyan
    Mannocci, Andrea
    Osborne, Francesco
    Motta, Enrico
    SEMANTIC WEB - ISWC 2018, PT II, 2018, 11137 : 187 - 205
  • [25] Biological Agency: Its Subjective Foundations and a Large-Scale Taxonomy
    Brizio, Adelina
    Tirassa, Maurizio
    FRONTIERS IN PSYCHOLOGY, 2016, 7
  • [26] Birdsnap: Large-scale Fine-grained Visual Categorization of Birds
    Berg, Thomas
    Liu, Jiongxin
    Lee, Seung Woo
    Alexander, Michelle L.
    Jacobs, David W.
    Belhumeur, Peter N.
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2019 - 2026
  • [27] Directing Requests in a Large-Scale Grid System based on Resource Categorization
    Karaoglanoglou, Konstantinos
    Karatza, Helen
    PROCEEDINGS OF THE 2011 INTERNATIONAL SYMPOSIUM ON PERFORMANCE EVALUATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS, 2011, : 9 - 15
  • [28] Large-Scale Linguistic Ontology as a Basis for Text Categorization of Legislative Documents
    Loukachevitch, Natalia
    Dobrov, Boris
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2005, 134 : 109 - 110
  • [29] A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
    Yang, Linjie
    Luo, Ping
    Loy, Chen Change
    Tang, Xiaoou
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 3973 - 3981
  • [30] Decline of Pearson’s r with categorization of variables: a large-scale simulation
    Onoshima, T.
    Shiina, K.
    Ueda, T.
    Kubo, S.
    Behaviormetrika, 2019, 46 (2) : 389 - 399