Beyond audio and video retrieval: topic-oriented multimedia summarization

Cited by: 13
Authors
Metze, Florian [1]
Ding, Duo [1]
Younessian, Ehsan [1]
Hauptmann, Alexander [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
Multimedia summarization; Event detection and recounting; Natural language generation;
DOI
10.1007/s13735-012-0028-y
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to examine and organize these large stores of information effectively, in ways that go beyond browsing or collaborative filtering. In this paper, we review previous work on audio and video processing, and define the task of topic-oriented multimedia summarization (TOMS) using natural language generation (NLG): given a set of features automatically extracted from a video, a TOMS system automatically generates a paragraph of natural language that summarizes the important information in a video belonging to a certain topic and, for example, explains why the video was matched and retrieved. Possible features include visual semantic concepts, objects, and actions; environmental sounds; and transcripts from automatic speech recognition (ASR). We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. We introduce our approach to solving the TOMS problem: we extract various visual concept features, environmental sounds, and ASR transcription features from a given video, and develop a template-based NLG system that produces a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.
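The template-based recounting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the `Detection` class, the confidence threshold, the sentence templates, and all function names below are hypothetical, chosen only to show how extracted features of the three kinds named in the abstract might be slotted into fixed sentence patterns.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One extracted feature with a detector confidence (hypothetical structure)."""
    label: str
    confidence: float


def join_labels(labels):
    """Join labels into an English list: 'a', 'a and b', 'a, b and c'."""
    if not labels:
        return ""
    if len(labels) == 1:
        return labels[0]
    return ", ".join(labels[:-1]) + " and " + labels[-1]


def top_labels(detections, threshold=0.5, limit=3):
    """Keep detections at or above the threshold, most confident first.

    The threshold and limit are illustrative values, not taken from the paper.
    """
    kept = [d for d in detections if d.confidence >= threshold]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return [d.label for d in kept[:limit]]


def recount(topic, visual, sounds, asr_keywords):
    """Fill fixed sentence templates with the selected features."""
    sentences = [f"This video was matched to the topic '{topic}'."]
    seen = top_labels(visual)
    if seen:
        sentences.append(f"We saw {join_labels(seen)}.")
    heard = top_labels(sounds)
    if heard:
        sentences.append(f"We heard {join_labels(heard)}.")
    if asr_keywords:
        sentences.append(f"The speech mentions {join_labels(asr_keywords)}.")
    return " ".join(sentences)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    visual = [Detection("a kitchen", 0.9), Detection("a knife", 0.7), Detection("a car", 0.2)]
    sounds = [Detection("sizzling", 0.8)]
    print(recount("making a sandwich", visual, sounds, ["sandwich", "bread"]))
```

In this sketch, low-confidence detections (such as the car above) are dropped before generation, so the recounting only mentions evidence that could plausibly explain why the video was retrieved for the topic.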
Pages: 131-144
Page count: 14