Large-Scale Taxonomy Categorization for Noisy Product Listings

Authors
Das, Pradipto [1]
Xia, Yandi
Levine, Aaron
Di Fabbrizio, Giuseppe
Datta, Ankur
Affiliations
[1] Rakuten Inst Technol, Boston, MA 02110 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. We first investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Secondly, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs) and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction.
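As a rough illustration of the feature combination the abstract describes (word unigrams from product names plus navigational breadcrumbs, fed to a linear model), the following is a minimal sketch and not the authors' implementation. The multiclass perceptron is a stand-in for the unspecified linear models, and the function names, the `name=`/`crumb=` feature prefixes, and the four toy listings are all assumptions made up for this example.

```python
from collections import defaultdict

def unigram_features(name, breadcrumb):
    """Count word unigrams, prefixing each with its source field (hypothetical scheme)."""
    feats = defaultdict(int)
    for tok in name.lower().split():
        feats["name=" + tok] += 1
    for tok in breadcrumb.lower().replace(">", " ").split():
        feats["crumb=" + tok] += 1
    return dict(feats)

def score(weights, label, feats):
    """Linear score: dot product of one label's weight vector with the features."""
    w = weights[label]
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def predict(weights, labels, feats):
    """Return the top-level category with the highest linear score."""
    return max(labels, key=lambda lab: score(weights, lab, feats))

def train_perceptron(examples, epochs=5):
    """Multiclass perceptron, used here only as a simple stand-in linear model."""
    labels = sorted({lab for _, lab in examples})
    weights = {lab: defaultdict(float) for lab in labels}
    for _ in range(epochs):
        for feats, gold in examples:
            guess = predict(weights, labels, feats)
            if guess != gold:
                # Move weights toward the gold label and away from the wrong guess.
                for f, v in feats.items():
                    weights[gold][f] += v
                    weights[guess][f] -= v
    return weights, labels

# Invented toy listings: (product name, breadcrumb, top-level category).
listings = [
    ("Acme cordless drill 18V", "Home > Tools > Power Tools", "Tools"),
    ("Blue cotton t-shirt mens", "Home > Clothing > Men > Shirts", "Apparel"),
    ("Stainless steel hammer", "Home > Tools > Hand Tools", "Tools"),
    ("Womens running shorts", "Home > Clothing > Women > Activewear", "Apparel"),
]
train = [(unigram_features(n, b), lab) for n, b, lab in listings]
weights, labels = train_perceptron(train)

query = unigram_features("Cordless hammer drill", "Home > Tools")
print(predict(weights, labels, query))  # prints "Tools"
```

Prefixing each token with its field keeps a word such as "tools" in a breadcrumb distinct from the same word in a product name, which is one plausible way to combine the two unigram sources; the abstract does not say how the combination was actually encoded.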
Pages: 3885-3894
Page count: 10