A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Cited by: 0
Authors
Hu, Panwen [1 ]
Xiao, Nan [1 ]
Li, Feifei [1 ]
Chen, Yongquan [2 ]
Huang, Rui [1 ]
Affiliations
[1] Chinese Univ Hong Kong, SSE, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, AIRS, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
video editing; video representation; reinforcement learning; broadcast; capture; film
DOI
10.1145/3581783.3611878
CLC (Chinese Library Classification) code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this era of videos, automatic video editing techniques are attracting increasing attention from industry and academia, since they can reduce workloads and lower the skill requirements for human editors. Existing automatic editing systems are mainly scene- or event-specific, e.g., soccer game broadcasting; automatic systems for general editing, e.g., movie or vlog editing that covers diverse scenes and events, have rarely been studied, and adapting event-driven editing methods to general scenes is nontrivial. In this paper, we propose a two-stage scheme for general editing. First, unlike previous works that extract scene-specific features, we leverage a pre-trained Vision-Language Model (VLM) to extract editing-relevant representations as the editing context. Second, to close the gap between professional-looking videos and automatic productions generated with simple guidelines, we propose a Reinforcement Learning (RL)-based editing framework that formulates the editing problem and trains a virtual editor to make better sequential editing decisions. Finally, we evaluate the proposed method on a more general editing task with a real movie dataset. Experimental results demonstrate the effectiveness of the proposed context representation and the learning ability of our RL-based editing framework.
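The abstract describes the two-stage pipeline only at a high level; the paper's concrete architecture, action space, and reward design are not given in this record. As a rough illustration of the idea, the following minimal PyTorch sketch pairs placeholder "VLM" shot features (standing in for, e.g., frozen CLIP embeddings) with a small policy network trained by REINFORCE to make sequential shot-selection decisions. All names, dimensions, and the reward here are illustrative assumptions, not the authors' design.

```python
# A minimal sketch (not the authors' code) of the two-stage idea:
# (1) editing context from a frozen pre-trained VLM, and (2) an RL
# "virtual editor" making sequential shot-selection decisions.
import torch
import torch.nn as nn

FEAT_DIM = 512      # e.g., CLIP ViT-B/32 embedding size (assumption)
N_CANDIDATES = 8    # candidate shots per editing step (assumption)
T_STEPS = 12        # length of the edited sequence (assumption)

class VirtualEditor(nn.Module):
    """Policy: given the shots selected so far and candidate-shot
    features, output a distribution over which shot to cut to next."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=256):
        super().__init__()
        self.context_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.scorer = nn.Bilinear(hidden, feat_dim, 1)

    def forward(self, history, candidates):
        # history:    (1, t, feat_dim)  shots already in the edit
        # candidates: (n, feat_dim)     VLM features of candidate shots
        _, h = self.context_rnn(history)             # (1, 1, hidden)
        h = h.squeeze(0).expand(candidates.size(0), -1)
        logits = self.scorer(h, candidates).squeeze(-1)
        return torch.distributions.Categorical(logits=logits)

# In practice these features would come from a frozen VLM encoder
# (e.g., CLIP); random tensors stand in to keep the sketch runnable.
def vlm_features(n):
    return torch.randn(n, FEAT_DIM)

editor = VirtualEditor()
opt = torch.optim.Adam(editor.parameters(), lr=1e-4)

# One REINFORCE update over an episode of sequential editing decisions.
history = vlm_features(1).unsqueeze(0)  # seed with an opening shot
log_probs = []
for _ in range(T_STEPS):
    cands = vlm_features(N_CANDIDATES)
    dist = editor(history, cands)
    a = dist.sample()
    log_probs.append(dist.log_prob(a))
    history = torch.cat([history, cands[a].view(1, 1, -1)], dim=1)

reward = torch.rand(())  # placeholder; the paper learns from real edits
loss = -reward * torch.stack(log_probs).sum()
opt.zero_grad(); loss.backward(); opt.step()
```

The recurrent context models the sequential nature of editing decisions emphasized in the abstract; a real reward would need to score how professional the resulting cut looks, which is the part the paper's RL framework is designed to learn.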
Pages: 6441-6450
Number of pages: 10
Related papers
50 records in total
  • [31] Enhancing Real-Time Semantic Segmentation with Textual Knowledge of Pre-Trained Vision-Language Model: A Lightweight Approach
    Lin, Chia-Yi
    Chen, Jun-Cheng
    Wu, Ja-Ling
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 551 - 558
  • [32] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
    Xu, Zipeng
    Lin, Tianwei
    Tang, Hao
    Li, Fu
    He, Dongliang
    Sebe, Nicu
    Timofte, Radu
    Van Gool, Luc
    Ding, Errui
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18208 - 18217
  • [33] Using Pre-trained Language Model to Enhance Active Learning for Sentence Matching
    Bai, Guirong
    He, Shizhu
    Liu, Kang
    Zhao, Jun
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (02)
  • [34] Grammatical Error Correction by Transferring Learning Based on Pre-Trained Language Model
    Han M.
    Wang Y.
Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2022, 56 (11): 1554 - 1560
  • [35] Monkeypox Virus Detection Using Pre-trained Deep Learning-based Approaches
    Sitaula, Chiranjibi
    Shahi, Tej Bahadur
    JOURNAL OF MEDICAL SYSTEMS, 2022, 46 (11)
  • [37] Learning and Evaluating a Differentially Private Pre-trained Language Model
    Hoory, Shlomo
    Feder, Amir
    Tendler, Avichai
    Cohen, Alon
    Erell, Sofia
    Laish, Itay
    Nakhost, Hootan
    Stemmer, Uri
    Benjamini, Ayelet
    Hassidim, Avinatan
    Matias, Yossi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1178 - 1189
  • [38] Automatic Title Generation for Text with Pre-trained Transformer Language Model
    Mishra, Prakhar
    Diwan, Chaitali
    Srinivasa, Srinath
    Srinivasaraghavan, G.
    2021 IEEE 15TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2021), 2021, : 17 - 24
  • [39] Fusing Pre-trained Language Models with Multimodal Prompts through Reinforcement Learning
    Yu, Youngjae
    Chung, Jiwan
    Yun, Heeseung
    Hessel, Jack
    Park, Jae Sung
    Lu, Ximing
    Zellers, Rowan
    Ammanabrolu, Prithviraj
    Le Bras, Ronan
    Kim, Gunhee
    Choi, Yejin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10845 - 10856
  • [40] Pre-trained Bert for Natural Language Guided Reinforcement Learning in Atari Game
    Li, Xin
    Zhang, Yu
    Luo, Junren
    Liu, Yifeng
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2022, : 5119 - 5124