Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation

Authors
Lei, Sen [1]
Xiao, Xinyu [2]
Zhang, Tianlin [3]
Li, Heng-Chao [1]
Shi, Zhenwei [4]
Zhu, Qing [5]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 611756, Peoples R China
[2] Co Ant Grp, Hangzhou 688688, Peoples R China
[3] AVIC, Luoyang Inst Electroopt Equipment, Luoyang 471000, Peoples R China
[4] Beihang Univ, Image Proc Ctr, Sch Astronaut, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[5] Southwest Jiaotong Univ, Fac Geosci & Engn, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Remote sensing; Image segmentation; Visualization; Feature extraction; Linguistics; Transformers; Electronic mail; Adaptation models; Object recognition; Grounding; Fine-grained image-text alignment; referring image segmentation; remote sensing images; CLASSIFICATION; NETWORK;
DOI
10.1109/TGRS.2024.3522293
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Subject classification code
0708; 070902;
Abstract
Given a language expression, referring remote sensing image segmentation (RRSIS) aims to identify ground objects and assign pixelwise labels within the imagery. One of the key challenges for this task is to capture discriminative multimodal features via image-text alignment. However, the existing RRSIS methods use a single vanilla, coarse alignment, in which the language expression is encoded as a whole and fused directly with the visual features. In this article, we argue that a "fine-grained image-text alignment" can improve the extraction of multimodal information. To this end, we propose a new RRSIS method to fully exploit the visual and linguistic representations. Specifically, the original referring expression is regarded as context text, which is further decoupled into the ground object and spatial position texts. The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and the corresponding texts, yielding a more discriminative multimodal representation. Meanwhile, to handle the various scales of ground objects in remote sensing, we introduce a text-aware multiscale enhancement module (TMEM) to adaptively perform cross-scale fusion and intersections. We evaluate the effectiveness of the proposed method on two public referring remote sensing datasets, RefSegRS and RRSIS-D, and our method obtains superior performance over several state-of-the-art methods. The code will be publicly available at https://github.com/Shaosifan/FIANet.
Pages: 11
Related papers
50 in total
  • [1] Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval
    Zhou, Zihui
    Feng, Yong
    Qiu, Agen
    Duan, Guofan
    Zhou, Mingliang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 19194 - 19210
  • [2] Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting
    Chen, Wenting
    Wang, Pengyu
    Ren, Hui
    Sun, Lichao
    Li, Quanzheng
    Yuan, Yixuan
    Li, Xiang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012 : 240 - 250
  • [3] ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
    Messina, Nicola
    Stefanini, Matteo
    Cornia, Marcella
    Baraldi, Lorenzo
    Falchi, Fabrizio
    Amato, Giuseppe
    Cucchiara, Rita
    19TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2022, 2022, : 64 - 70
  • [4] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [5] Referring Image Segmentation With Fine-Grained Semantic Funneling Infusion
    Yang, Jiaxing
    Zhang, Lihe
    Lu, Huchuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14727 - 14738
  • [6] Towards Fast and Accurate Image-Text Retrieval With Self-Supervised Fine-Grained Alignment
    Zhuang, Jiamin
    Yu, Jing
    Ding, Yang
    Qu, Xiangyan
    Hu, Yue
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1361 - 1372
  • [7] Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
    Zhuang, Jiamin
    Yu, Jing
    Ding, Yang
    Qu, Xiangyan
    Hu, Yue
    arXiv, 2023,
  • [8] Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval
    Li, Jiangtong
    Liu, Liu
    Niu, Li
    Zhang, Liqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 (30) : 9193 - 9207
  • [9] Collaborative fine-grained interaction learning for image-text sentiment analysis
    Xiao, Xingwang
    Pu, Yuanyuan
    Zhou, Dongming
    Cao, Jinde
    Gu, Jinjing
    Zhao, Zhengpeng
    Xu, Dan
    KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [10] RSITR-FFT: Efficient Fine-Grained Fine-Tuning Framework With Consistency Regularization for Remote Sensing Image-Text Retrieval
    Xiu, Di
    Ji, Luyan
    Geng, Xiurui
    Wu, Yirong
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21