Web-Scale Semantic Product Search with Large Language Models

Cited by: 3
Authors
Muhamed, Aashiq [1]
Srinivasan, Sriram [1]
Teo, Choon-Hui [1]
Cui, Qingjun [1]
Zeng, Belinda [2]
Chilimbi, Trishul [2]
Vishwanathan, S. V. N. [1]
Affiliations
[1] Amazon, Palo Alto, CA 94303 USA
[2] Amazon, Seattle, WA USA
Keywords
Matching; Retrieval; Search; Pretrained Language Models
DOI
10.1007/978-3-031-33380-4_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dense embedding-based semantic matching is widely used in e-commerce product search to address the shortcomings of lexical matching, such as sensitivity to spelling variants. Recent advances in BERT-like language model encoders have, however, not found their way into real-time search because of the strict inference latency requirements imposed on e-commerce websites. While bi-encoder BERT architectures enable fast approximate nearest neighbor search, training them effectively on query-product data remains a challenge due to training instabilities and the persistent generalization gap with cross-encoders. In this work, we propose a four-stage training procedure to leverage large BERT-like models for product search while preserving low inference latency. We introduce query-product interaction pre-finetuning to effectively pretrain BERT bi-encoders for matching and improve generalization. Through offline experiments on an e-commerce product dataset, we show that a distilled small BERT-based model (75M params) trained using our approach improves the search relevance metric by up to 23% over a baseline DSSM-based model with similar inference latency. The small model suffers only a 3% drop in the relevance metric compared to the 20x larger teacher. We also show, using online A/B tests at scale, that our approach improves over the production model in exact and substitute products retrieved.
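The bi-encoder retrieval pattern mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model or training procedure: `toy_encode` is a hypothetical stand-in that hashes character trigrams instead of running a BERT encoder, `DIM` and the product strings are made up, and the exhaustive dot-product scan stands in for the approximate nearest neighbor index a production system would use. The point it shows is that queries and products are embedded independently, so product vectors can be precomputed offline and only the query is encoded at request time.

```python
import zlib

import numpy as np

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_encode(text: str, dim: int = DIM) -> np.ndarray:
    """Toy stand-in for an encoder tower: hashed character trigrams, L2-normalized."""
    vec = np.zeros(dim)
    padded = f"#{text.lower()}#"
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def build_index(products: list[str]) -> np.ndarray:
    """Embed every product once, offline; rows are unit-length product vectors."""
    return np.stack([toy_encode(p) for p in products])


def search(query: str, index: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Encode only the query, then rank products by cosine similarity.

    This is an exact scan; it stands in for an approximate nearest neighbor lookup.
    """
    scores = index @ toy_encode(query)
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


# Hypothetical catalog, for illustration only
products = [
    "wireless bluetooth headphones",
    "stainless steel water bottle",
    "running shoes for men",
]
index = build_index(products)
results = search("bluetooth headphone", index, k=2)
for product_id, score in results:
    print(product_id, products[product_id], round(score, 3))
```

Because the two sides never interact before the final dot product, the query-time cost is one encoder pass plus a vector lookup. A cross-encoder, by contrast, must run the model on every query-product pair.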
Pages: 73-85
Page count: 13