FastClip: An Efficient Video Understanding System with Heterogeneous Computing and Coarse-to-fine Processing

Times cited: 0

Authors:

Zhao, Liming [1]

Sun, Siyang [1]

Zhang, Yanhao [1]

Zheng, Yun [1]

Pan, Pan [1]

Affiliation:

[1] Alibaba Group, Hangzhou, People's Republic of China

Source:

COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION | 2022

Keywords:

video understanding; heterogeneous computing; system speedup

DOI:

10.1145/3487553.3524209

CLC classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, video media have been growing exponentially in many areas, such as e-commerce shopping and gaming, and understanding video content is critical for real-world applications. However, processing long videos is usually time-consuming and expensive. In this paper, we present an efficient video understanding system that speeds up video processing with a coarse-to-fine two-stage pipeline and a heterogeneous computing framework. First, a coarse but fast multi-modal filtering module recognizes and removes useless segments from a long video; this module can be deployed on an edge device and reduces the computation left for the next stage. Second, several semantic models finely parse the remaining sequences. To accelerate model inference, we propose a novel heterogeneous computing framework that trains a model with both lightweight and heavyweight backbones, so that it can be deployed in a distributed fashion across a powerful device (e.g., cloud or GPU) and a second, different device (e.g., edge or CPU). In this way, the model can be both efficient and effective. The proposed system has been widely used at Alibaba, including in "Taobao Live Analysis" and "Commodity Short-Video Generation", and can achieve a 10x system speedup.
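The coarse-to-fine idea in the abstract can be illustrated as a filter-then-parse loop: a cheap first stage scores each segment and discards the useless ones, and only the survivors reach the expensive semantic models. This is a minimal sketch under stated assumptions, not the authors' implementation: `Segment`, `coarse_score`, `coarse_filter`, `fine_parse`, `relative_cost`, the threshold, and all numbers below are hypothetical, and the placeholder "model" merely reports segment duration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Segment:
    """A clip of a long video; `coarse_score` is a hypothetical usefulness score."""
    start_s: float
    end_s: float
    coarse_score: float


def coarse_filter(segments: List[Segment], threshold: float) -> List[Segment]:
    """Stage 1 (cheap, could run on an edge device): drop segments scored as useless."""
    return [seg for seg in segments if seg.coarse_score >= threshold]


def fine_parse(
    segments: List[Segment],
    models: Dict[str, Callable[[Segment], Any]],
) -> List[Dict[str, Any]]:
    """Stage 2 (expensive): apply every semantic model to each remaining segment."""
    return [{name: model(seg) for name, model in models.items()} for seg in segments]


def relative_cost(keep_fraction: float, coarse_cost: float, fine_cost: float) -> float:
    """Per-segment cost of the two-stage pipeline divided by running stage 2 on everything."""
    return (coarse_cost + keep_fraction * fine_cost) / fine_cost


if __name__ == "__main__":
    # Hypothetical scores; a real filter would derive them from multi-modal features.
    video = [
        Segment(0.0, 10.0, 0.9),
        Segment(10.0, 20.0, 0.1),
        Segment(20.0, 30.0, 0.7),
    ]
    kept = coarse_filter(video, threshold=0.5)
    # Placeholder "semantic model" that just reports segment duration.
    results = fine_parse(kept, {"duration": lambda s: s.end_s - s.start_s})
    print(len(kept), results)
    # Speedup is the reciprocal of the relative cost.
    print(1 / relative_cost(keep_fraction=0.05, coarse_cost=0.05, fine_cost=1.0))
```

The cost helper makes the trade-off explicit: with per-segment costs $c$ (coarse) and $f$ (fine) and a kept fraction $p$, the pipeline costs $(c + p f)/f$ of the fine-only baseline. For instance, a hypothetical filter costing 5% of the fine stage and keeping 5% of segments yields 0.1, i.e. a 10x speedup; this shows only the arithmetic and is not a claim about how the reported speedup was measured.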

Pages: 67-71

Number of pages: 5
