Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing

Cited by: 3

Authors:

Koorathota, Sharath [1,2]

Adelman, Patrick [2]

Cotton, Kelly [3]

Sajda, Paul [1]

Affiliations:

[1] Columbia Univ, Dept Biomed Engn, New York, NY 10027 USA

[2] Fovea Inc, New York, NY 10001 USA

[3] CUNY, Grad Ctr, Dept Psychol, New York, NY USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021

DOI:

10.1109/CVPRW53098.2021.00186

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses an editing style learned from a single example video to combine clips coherently. The editing model is useful for tasks such as generating news clip montages and highlight reels given a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE-edited and human-edited videos in how well they matched the text query or in the level of interest each generated, suggesting that CMVE can effectively integrate semantic information across visual and textual modalities and create perceptually coherent videos of the quality typical of human video editors. We publicly release an online demonstration of our method.

Pages: 1701-1709

Page count: 9
