Joint Attention for Automated Video Editing

Cited by: 2
Authors
Wu, Hui-Yin [1 ]
Santarra, Trevor [2 ]
Leece, Michael [3 ]
Vargas, Rolando [3 ]
Jhala, Arnav [4 ]
Affiliations
[1] Univ Cote d'Azur, INRIA, Sophia Antipolis, France
[2] Unity Technol, San Francisco, CA USA
[3] Univ Calif Santa Cruz, Santa Cruz, CA USA
[4] North Carolina State Univ, Raleigh, NC USA
Keywords
smart conferencing; automated video editing; joint attention; LSTM;
DOI
10.1145/3391614.3393656
CLC number (Chinese Library Classification)
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Joint attention refers to the shared focal points of attention for occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings recorded in multi-camera environments from the AMI corpus. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based edit that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm that uses LSTM (long short-term memory) networks to learn joint attention from both pose and audio data, trained on expert edits. The methods are evaluated qualitatively against the human edit, and quantitatively in a user study with 22 participants. Results indicate that LSTM-trained joint attention produces edits comparable to the expert edit, offering a wider range of camera views than the audio-based method, while generalizing better than the rule-based method.
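As a rough illustration of the first (naive audio-based) method described in the abstract, the sketch below selects, for each frame, the camera of the loudest headset channel. This is only an assumption-laden reconstruction, not the authors' implementation: the per-frame amplitude array and the `min_shot_len` hold parameter (to avoid rapid cutting) are hypothetical.

```python
import numpy as np

def naive_audio_edit(amplitudes, min_shot_len=25):
    """Pick, per frame, the camera of the loudest headset channel.

    amplitudes: (n_frames, n_speakers) array of per-headset amplitude.
    min_shot_len: minimum number of frames to hold a shot before cutting.
    Returns a list of camera indices, one per frame.
    """
    cuts = []
    current = int(np.argmax(amplitudes[0]))  # start on the loudest speaker
    hold = 0
    for frame in amplitudes:
        loudest = int(np.argmax(frame))
        # Only cut to a new camera once the current shot is long enough.
        if loudest != current and hold >= min_shot_len:
            current = loudest
            hold = 0
        cuts.append(current)
        hold += 1
    return cuts
```

The hold threshold is one simple way to enforce pacing; the rule-based and LSTM methods in the paper select cameras from pose (and pose plus audio) features instead.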
Pages: 55-64
Page count: 10