Web-Scale Semantic Product Search with Large Language Models

Cited by: 3
Authors
Muhamed, Aashiq [1]
Srinivasan, Sriram [1]
Teo, Choon-Hui [1]
Cui, Qingjun [1]
Zeng, Belinda [2]
Chilimbi, Trishul [2]
Vishwanathan, S. V. N. [1]
Affiliations
[1] Amazon, Palo Alto, CA 94303 USA
[2] Amazon, Seattle, WA USA
Keywords
Matching; Retrieval; Search; Pretrained Language Models;
DOI
10.1007/978-3-031-33380-4_6
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dense embedding-based semantic matching is widely used in e-commerce product search to address the shortcomings of lexical matching, such as sensitivity to spelling variants. Recent advances in BERT-like language model encoders have, however, not found their way to real-time search because of the strict inference latency requirement imposed on e-commerce websites. While bi-encoder BERT architectures enable fast approximate nearest neighbor search, training them effectively on query-product data remains a challenge due to training instabilities and the persistent generalization gap with cross-encoders. In this work, we propose a four-stage training procedure to leverage large BERT-like models for product search while preserving low inference latency. We introduce query-product interaction pre-finetuning to effectively pretrain BERT bi-encoders for matching and improve generalization. Through offline experiments on an e-commerce product dataset, we show that a distilled small BERT-based model (75M params) trained using our approach improves the search relevance metric by up to 23% over a baseline DSSM-based model with similar inference latency. The small model suffers only a 3% drop in the relevance metric compared to the 20x larger teacher. We also show, using online A/B tests at scale, that our approach improves over the production model in exact and substitute products retrieved.
Pages: 73-85
Page count: 13
Related papers
50 in total
  • [31] Computing Web-scale Topic Models using an Asynchronous Parameter Server
    Jagerman, Rolf
    Eickhoff, Carsten
    de Rijke, Maarten
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 1337 - 1340
  • [32] Duplicate-Search-Based Image Annotation Using Web-Scale Data
    Wang, Xin-Jing
    Zhang, Lei
    Ma, Wei-Ying
    PROCEEDINGS OF THE IEEE, 2012, 100 (09) : 2705 - 2721
  • [33] Querying Web-Scale Knowledge Graphs Through Effective Pruning of Search Space
    Jin, Jiahui
    Luo, Junzhou
    Khemmarat, Samamon
    Gao, Lixin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (08) : 2342 - 2356
  • [34] Web-scale system for image similarity search: When the dreams are coming true
    Novak, David
    Batko, Michal
    Zezula, Pavel
    2008 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2008, : 430 - 437
  • [35] Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage
    Thapliyal, Ashish V.
    Soricut, Radu
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 160 - 170
  • [36] SWSNL: Semantic Web Search Using Natural Language
    Habernal, Ivan
    Konopik, Miloslav
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (09) : 3649 - 3664
  • [37] Web-Scale Media Recommendation Systems
    Dror, Gideon
    Koenigstein, Noam
    Koren, Yehuda
    PROCEEDINGS OF THE IEEE, 2012, 100 (09) : 2722 - 2736
  • [38] Web Service Search on Large Scale
    Steinmetz, Nathalie
    Lausen, Holger
    Brunner, Manuel
    SERVICE-ORIENTED COMPUTING - ICSOC 2009, PROCEEDINGS, 2009, 5900 : 437 - +
  • [39] Web-Scale Extraction of Structured Data
    Cafarella, Michael J.
    Madhavan, Jayant
    Halevy, Alon
    SIGMOD RECORD, 2008, 37 (04) : 55 - 61
  • [40] Web-scale image clustering revisited
    Avrithis, Yannis
    Kalantidis, Yannis
    Anagnostopoulos, Evangelos
    Emiris, Ioannis Z.
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1502 - 1510