Joint Attention for Automated Video Editing

Cited by: 2
Authors
Wu, Hui-Yin [1 ]
Santarra, Trevor [2 ]
Leece, Michael [3 ]
Vargas, Rolando [3 ]
Jhala, Arnav [4 ]
Affiliations
[1] Univ Cote d'Azur, INRIA, Sophia Antipolis, France
[2] Unity Technol, San Francisco, CA USA
[3] Univ Calif Santa Cruz, Santa Cruz, CA USA
[4] North Carolina State Univ, Raleigh, NC USA
Keywords
smart conferencing; automated video editing; joint attention; LSTM;
DOI
10.1145/3391614.3393656
CLC number (Chinese Library Classification)
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Joint attention refers to the shared focal points of attention for occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings recorded in multi-camera environments from the AMI corpus. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based edit that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm that uses LSTM (long short-term memory) networks to learn joint attention from both pose and audio data, trained on expert edits. The methods are evaluated qualitatively against the human edit, and quantitatively in a user study with 22 participants. Results indicate that LSTM-trained joint attention produces edits comparable to the expert edit, offering a wider range of camera views than the audio-based method, while generalizing better than the rule-based method.
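As a rough illustration of the first (naive audio-based) method described in the abstract, the sketch below selects, for each frame, the camera of the loudest headset channel. This is only an assumption-laden reconstruction, not the authors' implementation: the per-frame amplitude array and the `min_shot_len` hold parameter (to avoid rapid cutting) are hypothetical.

```python
import numpy as np

def naive_audio_edit(amplitudes, min_shot_len=25):
    """Pick, per frame, the camera of the loudest headset channel.

    amplitudes: (n_frames, n_speakers) array of per-headset amplitude.
    min_shot_len: minimum number of frames to hold a shot before cutting.
    Returns a list of camera indices, one per frame.
    """
    cuts = []
    current = int(np.argmax(amplitudes[0]))  # start on the loudest speaker
    hold = 0
    for frame in amplitudes:
        loudest = int(np.argmax(frame))
        # Only cut to a new camera once the current shot is long enough.
        if loudest != current and hold >= min_shot_len:
            current = loudest
            hold = 0
        cuts.append(current)
        hold += 1
    return cuts
```

The hold threshold is one simple way to enforce pacing; the rule-based and LSTM methods in the paper select cameras from pose (and pose plus audio) features instead.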
Pages: 55-64
Page count: 10