Presenting an Order-Aware Multimodal Fusion Framework for Financial Advisory Summarization With an Exclusive Video Dataset

Citations: 0

Authors:

Das, Sarmistha ^{[1]}

Ghosh, Samrat ^{[2]}

Tiwari, Abhisek ^{[1]}

Lynghoi, R. E. Zera Marveen ^{[1]}

Saha, Sriparna ^{[1]}

Murad, Zak ^{[3]}

Maurya, Alka ^{[3]}

Affiliations:

[1] IIT Patna, CSE Dept, Patna 801106, India

[2] Ramakrishna Mission Vivekananda Educ & Res Inst, Belur 711202, West Bengal, India

[3] CRISIL Ltd, Mumbai 400072, India

Source:

IEEE ACCESS | 2025 / Vol. 13

Keywords:

Finance; Social networking (online); Web sites; Digital audio broadcasting; Visualization; Investment; Blogs; Biological system modeling; Video on demand; Transformers; Financial dataset; financial advisory videos; multimodality; summary generation; order-aware fusion; LLMs in finance;

DOI:

10.1109/ACCESS.2025.3551124

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Amidst the current digital era, global entrepreneurship and financial awareness dissemination has surged via online podcasts and videos showcasing insightful expertise from diverse financial domain professionals. However, existing financial summarization techniques predominantly focus on textual and numerical data, neglecting the potential of time-saving multimodal data. Addressing this gap, we introduce FinMSG, a pioneering framework for generating concise and informative summaries from lengthy financial expert videos. Leveraging a multimodal transformer-based architecture and an order-aware fusion algorithm, FinMSG processes text, audio, and video features to distil key opinions and insights from diverse financial domains. Subsequently, we present FAV, a one-of-its-kind multimodal financial advice video corpus comprising 420 videos across diverse domains, with gold-standard summary annotations. Through extensive experimentation and human evaluation, we demonstrate the efficacy of FinMSG in producing high-quality financial summaries while also investigating the interplay between different modalities. In addition, we investigated the capabilities of widely recognized small language models, such as BART and T5, alongside advanced large language models, including LLaMA-2 and GPT-3.5, to evaluate their proficiency in handling financial tasks within a multimodal configuration. By offering a transparent and time-efficient means for laypeople to access and comprehend financial insights, our work represents a significant advancement in multimodal financial summarisation. Code and dataset are available at https://github.com/sarmistha-D/Fin-OAF.

Pages: 48367-48379

Number of pages: 13