Deep Learning;
Multimodal Models;
Large Language Models;
Machine Learning;
Natural Language Processing;
Vision;
Vision-Language Models;
DOI:
10.1145/3689091.3690086
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models. A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative. Each shot is analogous to a word in a sentence, but one in which multiple streams of information (such as visual and auditory data) must be processed simultaneously. Comprehending the entire video requires not only understanding the audio-visual information in each shot but also linking ideas across shots to form a larger, all-encompassing story. Despite significant progress in the field, current works often overlook the more granular shot-by-shot semantic information in videos. In this project, we propose Shotluck Holmes, a family of efficient large language vision models (LLVMs) for video summarization and captioning. By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from understanding a single image to understanding a sequence of frames. Specifically, we show that Shotluck Holmes outperforms state-of-the-art results on the Shot2Story video captioning and summarization tasks with significantly smaller and more computationally efficient models.
Li, Yinghong
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Yan, Yudong
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Tong, Zhuohao
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Wang, Yu
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Yang, Yinqi
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Bai, Mingze
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Pu, Dan
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Xie, Jiazheng
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Liu, Chuan
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Li, Bo
Affiliation:
Chongqing Normal Univ, Coll Life Sci, Chongqing 401331, Peoples R China; Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Liu, Mingwei
Affiliation:
Chongqing Med Univ, Coll Lab Med, Key Lab Clin Lab Diagnost, Chongqing 400016, Peoples R China; Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China
Shu, Kunxian
Affiliation:
Chongqing Univ Posts & Telecommun, Chongqing Key Lab Big Data Bio Intelligence, Chongqing 400065, Peoples R China