InteraRec: Interactive Recommendations Using Multimodal Large Language Models

Cited by: 2

Authors:

Karra, Saketh Reddy [1]

Tulabandhula, Theja [1]

Affiliations:

[1] Univ Illinois, Chicago, IL 60607 USA

Source:

TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA | 2024, Vol. 14658

Keywords:

Large language models; Screenshots; User preferences; Recommendations;

DOI:

10.1007/978-981-97-2650-9_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Numerous recommendation algorithms leverage weblogs, employing strategies such as collaborative filtering, content-based filtering, and hybrid methods to provide personalized recommendations to users. Weblogs, composed of records detailing user activities on a website, offer valuable insights into user preferences, behavior, and interests. Despite the wealth of information weblogs provide, extracting relevant features requires extensive feature engineering. The intricate nature of the data also poses a challenge for interpretation, especially for non-experts. Additionally, weblogs often fall short of capturing visual details and contextual nuances that influence user choices. In the present study, we introduce a sophisticated and interactive recommendation framework denoted as InteraRec, which diverges from conventional approaches that exclusively depend on weblogs for recommendation generation. This framework provides recommendations by capturing high-frequency screenshots of web pages as users navigate through a website. Leveraging advanced multimodal large language models (MLLMs), we extract valuable insights into user preferences from these screenshots by generating a user profile summary. Subsequently, we employ the InteraRec framework to extract relevant information from the summary to generate optimal recommendations. Through extensive experiments, we demonstrate the remarkable effectiveness of our recommendation system in providing users with valuable and personalized offerings.
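
The three stages named in the abstract (screenshots, a user profile summary, recommendations) can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Item`, `summarize_screenshots`, `extract_keywords`, `recommend`, `fake_mllm`) is hypothetical, the multimodal model is replaced by a stand-in that simply decodes bytes to text, and keyword overlap is an assumed placeholder for whatever extraction and ranking the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Item:
    """Hypothetical catalog entry described by a set of keywords."""
    name: str
    keywords: FrozenSet[str]


def summarize_screenshots(screenshots: Sequence[bytes],
                          describe: Callable[[bytes], str]) -> str:
    """Stage 1: build a profile summary by describing each screenshot.

    `describe` stands in for a multimodal LLM call (an assumption).
    """
    return " ".join(describe(shot) for shot in screenshots)


def extract_keywords(summary: str, vocabulary: Set[str]) -> Set[str]:
    """Stage 2: keep the summary words that appear in a known vocabulary."""
    words = {word.strip(".,;:").lower() for word in summary.split()}
    return words & vocabulary


def recommend(preferences: Set[str], catalog: Sequence[Item],
              k: int = 2) -> List[str]:
    """Stage 3: rank items by keyword overlap and return the top k names."""
    scored = [(len(preferences & item.keywords), item.name) for item in catalog]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest overlap first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def fake_mllm(screenshot: bytes) -> str:
    """Stand-in for a real model: treats the 'screenshot' bytes as text."""
    return screenshot.decode("utf-8")


def demo() -> List[str]:
    catalog = [
        Item("trail shoes", frozenset({"running", "outdoor"})),
        Item("desk lamp", frozenset({"office"})),
        Item("rain jacket", frozenset({"outdoor"})),
    ]
    screenshots = [b"User views running gear.", b"User filters by outdoor use."]
    summary = summarize_screenshots(screenshots, fake_mllm)
    preferences = extract_keywords(summary, {"running", "outdoor", "office"})
    return recommend(preferences, catalog)
```

Calling `demo()` runs the three stages on two toy "screenshots". In a real system the `describe` callable would wrap an actual multimodal model, and the ranking step would likely be far richer than set overlap.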


Pages: 32-43

Page count: 12

