GLM-Net: Global and Local Motion Estimation via Task-Oriented Encoder-Decoder Structure

Cited by: 2
Authors
Yang, Yuchen [1]
Xiang, Ye [1]
Liu, Shuaicheng [2]
Wu, Lifang [1]
Zhao, Boxuan [1]
Zeng, Bing [2]
Affiliations
[1] Beijing Univ Technol, Beijing, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video understanding; motion pattern; optical flow; motion estimation
DOI
10.1145/3474085.3475556
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we study the problem of separating the global camera motion and the local dynamic motion in an optical flow field. Previous methods either estimate the global motion with a parametric model, such as a homography, or estimate both motions with an optical flow field; none of them estimates global and local motions directly in an end-to-end manner. In addition, separating the two motions accurately from a hybrid flow field is challenging, because one motion can easily confound the estimate of the other when the two are compounded together. To this end, we propose GLM-Net, an end-to-end global and local motion estimation network. We design two task-oriented encoder-decoder structures to separate the motions in the optical flow: one adopts a mask autoencoder to extract the global motion, while the other uses an attention U-Net for local motion refinement. We further design two effective training methods to overcome the lack of supervision. We apply our method to the action recognition datasets NCAA and UCF-101 to verify the accuracy of the local motion, and to the homography estimation dataset DHE to verify the accuracy of the global motion. Experimental results show that our method achieves competitive performance on both tasks at the same time, validating the effectiveness of the motion separation.
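To make the dual-branch separation concrete, below is a minimal PyTorch sketch of the idea the abstract describes: one branch regresses a parametric global (camera) motion from the input flow, and a second branch refines the residual local (dynamic) motion. It is an illustrative assumption, not the authors' GLM-Net implementation: it substitutes a simple affine parameterization and plain convolution blocks for the paper's homography, mask autoencoder, and attention U-Net, and all module names and layer sizes are invented for the example.

```python
# Hypothetical sketch of global/local motion separation from an optical flow.
# Not the GLM-Net architecture: affine global motion and plain conv blocks are
# stand-ins for the paper's homography, mask autoencoder, and attention U-Net.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalMotionBranch(nn.Module):
    """Encoder that regresses a parametric global motion (here: a 2x3 affine)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 6)  # offsets from the identity transform

    def forward(self, flow):
        theta = self.head(self.encoder(flow)).view(-1, 2, 3)
        identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=flow.device)
        return identity + theta  # (B, 2, 3) affine matrices


class LocalMotionBranch(nn.Module):
    """Small encoder-decoder that refines the residual (local) flow."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, residual):
        return self.net(residual)


def affine_to_flow(theta, size):
    """Convert (B, 2, 3) affine matrices into a dense flow field in pixels."""
    b, _, h, w = size
    base = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=theta.device)
    grid = F.affine_grid(theta, size, align_corners=False)            # (B,H,W,2), in [-1,1]
    identity = F.affine_grid(base.repeat(b, 1, 1), size, align_corners=False)
    flow = (grid - identity).permute(0, 3, 1, 2)                      # (B,2,H,W), normalized
    scale = torch.tensor([(w - 1) / 2.0, (h - 1) / 2.0], device=theta.device).view(1, 2, 1, 1)
    return flow * scale                                               # back to pixel units


# Usage: split a hybrid flow into a global component and a local residual.
flow = torch.randn(1, 2, 64, 64)                 # dummy optical flow (u, v)
g_branch, l_branch = GlobalMotionBranch(), LocalMotionBranch()
global_flow = affine_to_flow(g_branch(flow), flow.shape)
local_flow = l_branch(flow - global_flow)        # refine what the global model cannot explain
```

In this sketch, the local branch only ever sees the residual after the parametric global motion has been subtracted, which mirrors the abstract's point that compounded motions confound each other when estimated jointly.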
Pages: 4211-4219
Number of pages: 9