Embedding-based pair generation for contrastive representation learning in audio-visual surveillance data

Cited by: 0
Authors
Wang, Wei-Cheng [1]
De Coninck, Sander [1]
Leroux, Sam [1]
Simoens, Pieter [1]
Affiliations
[1] Univ Ghent, IDLab, imec, Ghent, Belgium
Source
Keywords
self-supervised learning; surveillance; audio-visual representation learning; contrastive learning; audio-visual event localization; anomaly detection; event search
DOI
10.3389/frobt.2024.1490718
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Smart cities deploy various sensors, such as microphones and RGB cameras, to collect data that can improve the safety and comfort of citizens. As data annotation is expensive, self-supervised methods such as contrastive learning are used to learn audio-visual representations for downstream tasks. Focusing on surveillance data, we investigate two common limitations of audio-visual contrastive learning: false negatives and the minimal sufficient information bottleneck. Irregular, yet frequently recurring events can lead to a considerable number of false-negative pairs and disrupt the model's training. To tackle this challenge, we propose a novel method for generating contrastive pairs based on the distance between embeddings of different modalities, rather than relying solely on temporal cues. The semantically synchronized pairs can then be used, together with a new loss function for multiple positives, to ease the minimal sufficient information bottleneck. We experimentally validate our approach on real-world data and show how the learnt representations can be used for different downstream tasks, including audio-visual event localization, anomaly detection, and event search. Our approach achieves performance similar to that of state-of-the-art modality- and task-specific approaches.
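The pairing idea in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption rather than the authors' method: the function names `positive_mask` and `multi_positive_loss`, the cosine-similarity threshold of 0.8, the temperature of 0.1, and the use of a SupCon-style averaged cross-entropy as a stand-in for the paper's multi-positive loss are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def positive_mask(audio_emb, video_emb, threshold=0.8):
    """Boolean (N, N) mask: entry (i, j) marks audio clip i / video clip j as positive.

    Assumption: a pair is positive when its cross-modal cosine similarity
    reaches `threshold`; the temporally aligned pair (i, i) is always kept.
    """
    sim = l2_normalize(audio_emb) @ l2_normalize(video_emb).T
    mask = sim >= threshold
    np.fill_diagonal(mask, True)
    return mask

def multi_positive_loss(audio_emb, video_emb, mask, temperature=0.1):
    """SupCon-style loss (a stand-in): average log-softmax over all positives per anchor."""
    logits = l2_normalize(audio_emb) @ l2_normalize(video_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_anchor = -(log_prob * mask).sum(axis=1) / mask.sum(axis=1)
    return float(per_anchor.mean())

# Toy batch: clips 1 and 5 contain the same recurring event at different times.
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
audio[5] = audio[1]
video = audio + 0.05 * rng.normal(size=(8, 16))
video[5] = video[1]

mask = positive_mask(audio, video)
loss = multi_positive_loss(audio, video, mask)
```

In this toy batch, a purely temporal pairing would treat clips 1 and 5 as negatives of each other even though they show the same event; the similarity-based mask marks them as positives instead, which is the false-negative problem the abstract describes.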
Pages: 14