Joint Coding of Local and Global Deep Features in Videos for Visual Search

Cited by: 15
Authors
Ding, Lin [1]
Tian, Yonghong [1, 2]
Fan, Hongfei [3]
Chen, Changhuai [4]
Huang, Tiejun [1]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Kingsoft Cloud Co, Beijing 100085, Peoples R China
[4] Hikvision Co, Hangzhou 310012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Local deep feature; joint coding; visual search; inter-feature correlation; EFFICIENT APPROACH; QUANTIZATION; DESCRIPTORS; RETRIEVAL;
DOI
10.1109/TIP.2020.2965306
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Practically, it is more feasible to collect compact visual features rather than the video streams from hundreds of thousands of cameras into the cloud for big data analysis and retrieval. Then the problem becomes which kinds of features should be extracted, compressed and transmitted so as to meet the requirements of various visual tasks. Recently, many studies have indicated that the activations from the convolutional layers in convolutional neural networks (CNNs) can be treated as local deep features describing particular details inside an image region, which are then aggregated (e.g., using Fisher Vectors) as a powerful global descriptor. Combination of local and global features can satisfy those various needs effectively. It has also been validated that, if only local deep features are coded and transmitted to the cloud while the global features are recovered using the decoded local features, the aggregated global features should be lossy and consequently would degrade the overall performance. Therefore, this paper proposes a joint coding framework for local and global deep features (DFJC) extracted from videos. In this framework, we introduce a coding scheme for real-valued local and global deep features with intra-frame lossy coding and inter-frame reference coding. The theoretical analysis is performed to understand how the number of inliers varies with the number of local features. Moreover, the inter-feature correlations are exploited in our framework. That is, local feature coding can be accelerated by making use of the frame types determined with global features, while the lossy global features aggregated with the decoded local features can be used as a reference for global feature coding. Extensive experimental results under three metrics show that our DFJC framework can significantly reduce the bitrate of local and global deep features from videos while maintaining the retrieval performance.
Pages: 3734 - 3749
Page count: 16