GLM-Net: Global and Local Motion Estimation via Task-Oriented Encoder-Decoder Structure

Cited by: 2
Authors
Yang, Yuchen [1]
Xiang, Ye [1]
Liu, Shuaicheng [2]
Wu, Lifang [1]
Zhao, Boxuan [1]
Zeng, Bing [2]
Affiliations
[1] Beijing Univ Technol, Beijing, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video understanding; motion pattern; optical flow; motion estimation
DOI
10.1145/3474085.3475556
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this work, we study the problem of separating the global camera motion and the local dynamic motion in an optical flow field. Previous methods either estimate the global motion with a parametric model, such as a homography, or estimate both motions with an optical flow field; none of them estimates global and local motions directly in an end-to-end manner. In addition, separating the two motions accurately from a hybrid flow field is challenging, because one motion can easily confound the estimate of the other when the two are compounded together. To this end, we propose GLM-Net, an end-to-end global and local motion estimation network. We design two task-oriented encoder-decoder structures to separate the motions in the optical flow: one adopts a mask autoencoder to extract the global motion, while the other uses an attention U-Net for local motion refinement. We further design two effective training methods to overcome the lack of supervision. We apply our method to the action recognition datasets NCAA and UCF-101 to verify the accuracy of the local motion, and to the homography estimation dataset DHE to verify the accuracy of the global motion. Experimental results show that our method achieves competitive performance on both tasks at the same time, validating the effectiveness of the motion separation.
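To make the dual-branch separation concrete, below is a minimal PyTorch sketch of the idea the abstract describes: one branch regresses a parametric global (camera) motion from the input flow, and a second branch refines the residual local (dynamic) motion. It is an illustrative assumption, not the authors' GLM-Net implementation: it substitutes a simple affine parameterization and plain convolution blocks for the paper's homography, mask autoencoder, and attention U-Net, and all module names and layer sizes are invented for the example.

```python
# Hypothetical sketch of global/local motion separation from an optical flow.
# Not the GLM-Net architecture: affine global motion and plain conv blocks are
# stand-ins for the paper's homography, mask autoencoder, and attention U-Net.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalMotionBranch(nn.Module):
    """Encoder that regresses a parametric global motion (here: a 2x3 affine)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 6)  # offsets from the identity transform

    def forward(self, flow):
        theta = self.head(self.encoder(flow)).view(-1, 2, 3)
        identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=flow.device)
        return identity + theta  # (B, 2, 3) affine matrices


class LocalMotionBranch(nn.Module):
    """Small encoder-decoder that refines the residual (local) flow."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, residual):
        return self.net(residual)


def affine_to_flow(theta, size):
    """Convert (B, 2, 3) affine matrices into a dense flow field in pixels."""
    b, _, h, w = size
    base = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=theta.device)
    grid = F.affine_grid(theta, size, align_corners=False)            # (B,H,W,2), in [-1,1]
    identity = F.affine_grid(base.repeat(b, 1, 1), size, align_corners=False)
    flow = (grid - identity).permute(0, 3, 1, 2)                      # (B,2,H,W), normalized
    scale = torch.tensor([(w - 1) / 2.0, (h - 1) / 2.0], device=theta.device).view(1, 2, 1, 1)
    return flow * scale                                               # back to pixel units


# Usage: split a hybrid flow into a global component and a local residual.
flow = torch.randn(1, 2, 64, 64)                 # dummy optical flow (u, v)
g_branch, l_branch = GlobalMotionBranch(), LocalMotionBranch()
global_flow = affine_to_flow(g_branch(flow), flow.shape)
local_flow = l_branch(flow - global_flow)        # refine what the global model cannot explain
```

In this sketch, the local branch only ever sees the residual after the parametric global motion has been subtracted, which mirrors the abstract's point that compounded motions confound each other when estimated jointly.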
Pages: 4211-4219
Number of pages: 9