Learning Semantic Alignment Using Global Features and Multi-Scale Confidence

Cited by: 0

Authors:

Xu, Huaiyuan [1]

Liao, Jing [2]

Liu, Huaping [3]

Sun, Yuxiang [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Mech Engn, Kowloon, Hong Kong, Peoples R China

[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[3] Tsinghua Univ, Inst Artificial Intelligence, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2024 / Vol. 34 / No. 2

Funding:

National Natural Science Foundation of China

Keywords:

Semantics; Correlation; Feature extraction; Transformers; Training; Task analysis; Probabilistic logic; Semantic alignment; enhancement transformer; probabilistic correlation computation; cross-domain alignment

DOI:

10.1109/TCSVT.2023.3288370

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification code:

0808; 0809

Abstract:

Semantic alignment aims to establish pixel correspondences between images based on semantic consistency. It can serve as a fundamental component for various downstream computer vision tasks, such as style transfer and exemplar-based colorization. Many existing methods use local features and their cosine similarities to infer semantic alignment. However, they struggle with significant intra-class variation of objects, for example in appearance and size. In other words, contents with the same semantics tend to look significantly different. To address this issue, we propose a novel deep neural network whose core lies in global feature enhancement and adaptive multi-scale inference. Specifically, two modules are proposed: an enhancement transformer for enhancing semantic features with global awareness, and a probabilistic correlation module for adaptively fusing multi-scale information based on learned confidence scores. We use a unified network architecture to achieve two types of semantic alignment, namely cross-object semantic alignment and cross-domain semantic alignment. Experimental results demonstrate that our method achieves competitive performance on five standard cross-object semantic alignment benchmarks and outperforms the state of the art in cross-domain semantic alignment.
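
Illustrative sketch (not from the paper): the abstract mentions two ideas that a small example may make concrete, namely scoring correspondences by cosine similarity between local features, and fusing correlations from several scales according to confidence scores. The pure-Python code below is a toy version under stated assumptions: the function names are hypothetical, the features are hand-written vectors, every scale is assumed to yield a correlation matrix of the same shape, and the confidences are one fixed scalar per scale rather than values learned by a network. It is not the authors' enhancement transformer or probabilistic correlation module.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def correlation_matrix(source_feats, target_feats):
    """Entry [i][j] is the cosine similarity of source feature i and target feature j."""
    return [[cosine_similarity(s, t) for t in target_feats] for s in source_feats]


def softmax(scores):
    """Turn raw confidence scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(correlations, confidences):
    """Confidence-weighted sum of same-shaped correlation matrices (assumption: one scalar per scale)."""
    weights = softmax(confidences)
    rows, cols = len(correlations[0]), len(correlations[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, correlations)) for j in range(cols)]
        for i in range(rows)
    ]


def best_matches(correlation):
    """For each source position, the index of the highest-scoring target position."""
    return [max(range(len(row)), key=row.__getitem__) for row in correlation]


# Toy data: the first scale is discriminative, the second is ambiguous.
corr_a = correlation_matrix([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]])
corr_b = correlation_matrix([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])

# Hand-picked confidences that favor the discriminative scale.
fused = fuse_scales([corr_a, corr_b], confidences=[2.0, 0.0])
matches = best_matches(fused)
print(matches)
```

With these toy inputs the ambiguous scale alone cannot separate the two target positions, while the fused correlation still can, because the higher-confidence scale dominates the weighted sum. In a real system the scales would generally differ in resolution and need resampling before fusion, which this sketch leaves out.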


Pages: 897-910

Page count: 14
