Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing

Cited by: 3
Authors
Koorathota, Sharath [1, 2]
Adelman, Patrick [2]
Cotton, Kelly [3]
Sajda, Paul [1]
Affiliations
[1] Columbia Univ, Dept Biomed Engn, New York, NY 10027 USA
[2] Fovea Inc, New York, NY 10001 USA
[3] CUNY, Grad Ctr, Dept Psychol, New York, NY USA
DOI
10.1109/CVPRW53098.2021.00186
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrates essential information from both modalities, and uses an editing style learned from a single example video to combine clips coherently. The editing model is useful for tasks such as generating news clip montages and highlight reels from a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgments comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE and human-edited videos in how well they matched the text query or in the level of interest each generated, suggesting that CMVE can effectively integrate semantic information across visual and textual modalities and create perceptually coherent videos of the quality typical of human video editors. We publicly release an online demonstration of our method.
Pages: 1701-1709
Number of pages: 9