Large-Scale Taxonomy Categorization for Noisy Product Listings

Cited by: 0
Authors
Das, Pradipto [1]
Xia, Yandi
Levine, Aaron
Di Fabbrizio, Giuseppe
Datta, Ankur
Affiliations
[1] Rakuten Inst Technol, Boston, MA 02110 USA
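Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract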
E-commerce catalogs include a continuously growing number of products that are constantly updated. Each item in a catalog is characterized by several attributes and identified by a taxonomy label. Categorizing products with their taxonomy labels is fundamental to effectively searching and organizing listings in a catalog. However, manual and/or rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for product taxonomy categorization of top-level categories. We first investigate a number of feature sets and observe that a combination of word unigrams from product names and navigational breadcrumbs works best for categorization. Second, we apply correspondence topic models to detect noisy data and introduce a lightweight manual process to improve dataset quality. Finally, we evaluate linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs) with pre-trained word embeddings, demonstrating that, compared to other baselines, GBTs and CNNs yield the highest gains in error reduction.
Pages: 3885-3894
Page count: 10