Relation-Aware Distribution Representation Network for Person Clustering With Multiple Modalities

Cited by: 0

Authors:

Liu, Kaijian [1]

Tang, Shixiang [2]

Li, Ziyue [3,4]

Li, Zhishuai [1]

Bai, Lei [5]

Zhu, Feng [1]

Zhao, Rui [1,6]

Affiliations:

[1] SenseTime Res, Shanghai 200030, Peoples R China

[2] Univ Sydney, Sydney, NSW 2050, Australia

[3] Univ Cologne, D-50923 Cologne, Germany

[4] EWI gGmbH, D-50827 Cologne, Germany

[5] Shanghai AI Lab, Shanghai 200030, Peoples R China

[6] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai 200040, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Faces; Feature extraction; Streaming media; Task analysis; Measurement; Semantics; Motion pictures; Person clustering; Multi-modality clues; Distribution learning; Multi-modal representations; AFFINITY GRAPH; MULTIVIEW; FACES; RANK;

DOI:

10.1109/TMM.2023.3304454

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap from the modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this article, we propose a Relation-Aware Distribution representation Network (RAD-Net) to generate a distribution representation for multi-modal clues. The distribution representation of a clue is a vector consisting of the relation between this clue and all other clues from all modalities, thus being modality agnostic and good for person clustering. Accordingly, we introduce a graph-based method to construct distribution representation and employ a cyclic update policy to refine distribution representation progressively. Our method achieves substantial improvements of +6% and +8.2% in F-score on the Video Person-Clustering Dataset (VPCD) and VoxCeleb2 multi-view clustering dataset, respectively.
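The idea of a relation-based, modality-agnostic representation can be illustrated with a minimal sketch. This is not the RAD-Net implementation: the function names, the use of cosine similarity within a modality, the hypothetical `links` set of known cross-modal co-occurrences (for example, a face and a body taken from the same track), and the softmax temperature are all assumptions made for illustration. The graph-based construction and the cyclic update policy described in the abstract are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(row):
    """Numerically stable softmax, so each representation sums to 1."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_representations(clues, links, temperature=0.1):
    """Represent each clue by its relations to all clues (illustrative only).

    clues: list of (modality_name, feature_vector); feature sizes may differ
           across modalities.
    links: set of frozenset({i, j}) index pairs assumed to co-occur across
           modalities (a hypothetical input, not taken from the paper).
    """
    n = len(clues)
    relation = [[0.0] * n for _ in range(n)]
    for i, (mod_i, feat_i) in enumerate(clues):
        for j, (mod_j, feat_j) in enumerate(clues):
            if i == j:
                relation[i][j] = 1.0
            elif mod_i == mod_j:
                # Same modality: features are directly comparable.
                relation[i][j] = cosine(feat_i, feat_j)
            elif frozenset((i, j)) in links:
                # Different modality: rely on the assumed co-occurrence link.
                relation[i][j] = 1.0
    # Every clue gets a length-n vector, whatever its original feature size.
    return [softmax([r / temperature for r in row]) for row in relation]


clues = [
    ("face", [1.0, 0.0]),
    ("face", [0.9, 0.1]),
    ("body", [0.0, 1.0, 0.0]),
    ("voice", [1.0, 1.0, 1.0, 1.0]),
]
links = {frozenset((0, 2))}
reps = distribution_representations(clues, links)
```

In this sketch, clues whose raw features have 2, 3, and 4 dimensions all end up as vectors of length 4 (one entry per clue), which is what makes them comparable across modalities for a downstream clustering step.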


Pages: 8371-8382

Page count: 12
