Beyond audio and video retrieval: topic-oriented multimedia summarization

Cited by: 13
Authors
Metze, Florian [1]
Ding, Duo [1]
Younessian, Ehsan [1]
Hauptmann, Alexander [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
Multimedia summarization; Event detection and recounting; Natural language generation;
DOI
10.1007/s13735-012-0028-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given the deluge of multimedia content that is becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this paper, we review previous work on audio and video processing, and define the task of topic-oriented multimedia summarization (TOMS) using natural language generation (NLG): given a set of automatically extracted features from a video, a TOMS system will automatically generate a paragraph of natural language that summarizes the important information in a video belonging to a certain topic and, for example, explains why a video was matched and retrieved. Possible features include visual semantic concepts, objects, and actions, environmental sounds, and transcripts from automatic speech recognition (ASR). We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. In this paper, we introduce our approach to solving the TOMS problem. We extract various visual concept features, environmental sounds, and ASR transcription features from a given video, and develop a template-based NLG system to produce a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.
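The template-based recounting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the `Detection` structure, the three templates, the confidence threshold, and the `recount` function are all hypothetical names and values chosen for illustration, assuming each extracted feature arrives as a labeled detection with a confidence score.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One automatically extracted feature (hypothetical structure)."""
    modality: str      # "visual", "audio", or "asr"
    label: str         # e.g. "a dog", "barking", or a transcript keyword
    confidence: float  # detector score, assumed to lie in [0, 1]


# Hypothetical sentence templates, one per feature modality.
TEMPLATES = {
    "visual": "The video shows {items}.",
    "audio": "The soundtrack contains {items}.",
    "asr": "The speakers mention {items}.",
}


def join_items(items):
    """Join labels into an English list: 'a', 'a and b', 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def recount(topic, detections, threshold=0.5, top_k=3):
    """Fill each template with the most confident detections of its modality."""
    sentences = [f"This video was matched to the topic '{topic}'."]
    for modality, template in TEMPLATES.items():
        # Keep detections of this modality above the (assumed) threshold,
        # strongest first, and limit the sentence to top_k items.
        kept = sorted(
            (d for d in detections
             if d.modality == modality and d.confidence >= threshold),
            key=lambda d: d.confidence,
            reverse=True,
        )[:top_k]
        if kept:
            labels = [d.label for d in kept]
            sentences.append(template.format(items=join_items(labels)))
    return " ".join(sentences)


example = [
    Detection("visual", "a dog", 0.9),
    Detection("visual", "a park", 0.7),
    Detection("audio", "barking", 0.8),
    Detection("asr", "fetch", 0.3),  # below threshold, so it is dropped
]
print(recount("grooming an animal", example))
```

In this sketch, low-confidence detections are simply omitted from the text, so the generated paragraph only cites evidence the detectors are reasonably sure about; a real system would likely need richer templates and a more careful selection strategy.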
Pages: 131-144
Page count: 14
Related papers
50 in total
  • [31] Multi-Document Biased Summarization Based on Topic-Oriented Characteristic Database of Term-Pair Co-Occurrence
    Liu, Nan
    He, Yanxiang
    Chen, Qiang
    Peng, Min
    Fang, Wenqi
    2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 832 - 837
  • [32] Keyword-guided Topic-oriented Conversational Recommender System
    Pan, Yiming
    Yin, Yunfei
    Huang, Faliang
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [33] Foxinfo1.0: A Chinese Topic-oriented Search Engine
    Sun, Ke
    Lin, Lei
    Liu, Bingquan
    Sun, Chengjie
    Wang, Xiaolong
    2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 91 - 96
  • [34] Topic-oriented search model based on multi-agent
    Shen Jie
    Sun Rong-Shuang
    Wei Liu-Hua
    Mang Hui
    Zhu Yan
    Chen Chen
    CIS: 2007 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PROCEEDINGS, 2007, : 276 - 280
  • [35] Time oriented video summarization
    Liu, CQ
    Xia, T
    Li, H
    IMAGE ANALYSIS AND RECOGNITION, 2005, 3656 : 99 - 106
  • [36] Latent topic model for audio retrieval
    Hu, Pengfei
    Liu, Wenju
    Jiang, Wei
    Yang, Zhanlei
    PATTERN RECOGNITION, 2014, 47 (03) : 1138 - 1143
  • [37] Detecting Topic-oriented Overlapping Community Using Hybrid a Hypergraph Model
    Shen, G. L.
    Yang, X. P.
    Sun, J.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2016, 11 (04) : 538 - 552
  • [38] Latent Topic Modeling for Audio Corpus Summarization
    Hazen, Timothy J.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 920 - 923
  • [39] Beyond audio and video: Multimedia networking support for distributed, immersive virtual environments
    Jeffay, K
    Hudson, T
    Parris, M
    PROCEEDINGS OF THE 27TH EUROMICRO CONFERENCE - 2001: A NET ODYSSEY, 2001, : 300 - 307
  • [40] Clustering and visualizing audiovisual dataset on mobile devices in a topic-oriented manner
    Wang, Lei
    Tjondronegoro, Dian
    Liu, Yuee
    ADVANCES IN VISUAL INFORMATION SYSTEMS, 2007, 4781 : 310 - 321