Appearance Label Balanced Triplet Loss for Multi-modal Aerial View Object Classification

Cited by: 2
Authors
Puttagunta, Raghunath Sai [1]
Li, Zhu [1]
Bhattacharyya, Shuvra [2]
York, George [3]
Affiliations
[1] Univ Missouri, Kansas City, MO 64110 USA
[2] Univ Maryland, College Pk, MD USA
[3] US Air Force Acad, Colorado Springs, CO USA
Keywords
LONG-TAILED RECOGNITION; NETWORK;
DOI
10.1109/CVPRW59228.2023.00060
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic target recognition (ATR) using image data is an important computer vision task with widespread applications in remote sensing, including surveillance, object tracking, urban planning, and agriculture. Despite continuous progress on this task, significant room for improvement remains, particularly for aerial images. This work extracts rich information from multimodal synthetic aperture radar (SAR) and electro-optical (EO) aerial images to perform object classification. Compared to EO images, SAR images have the advantage that they can be captured at night and in any weather condition, and the disadvantage that they are noisy. Overcoming the noise inherent to SAR images is a challenging but worthwhile task because of the additional information SAR images provide to the model. This work proposes a training strategy that creates appearance labels and uses them to generate triplets for training the network with both triplet loss and cross-entropy loss. During the development phase of the 2023 Perception Beyond Visual Spectrum (PBVS) Multi-modal Aerial Image Object Classification (MAVOC) challenge, our ResNet-34 model achieved a top-1 accuracy of 64.29% for Track 1 and our ensemble learning model achieved a top-1 accuracy of 75.84% for Track 2. These values are 542% and 247% higher than the respective baseline values. Overall, this work ranked 3rd in both Track 1 and Track 2.
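The abstract states that the network is trained with both triplet loss and cross-entropy loss. The record does not say how the two terms are combined, so the following is only a minimal sketch of one common arrangement, a weighted sum, written in plain Python for a single sample. The function names, the Euclidean distance, the `margin` value, and the `weight` parameter are all assumptions made for illustration and are not taken from the paper. The appearance-label step that selects the triplets is not shown.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def combined_loss(logits, target, anchor, positive, negative,
                  margin=1.0, weight=1.0):
    """Assumed combination: cross-entropy plus a weighted triplet term."""
    return cross_entropy(logits, target) + weight * triplet_loss(
        anchor, positive, negative, margin
    )


if __name__ == "__main__":
    # Toy values: 3-class logits and 2-D embeddings (hypothetical data).
    loss = combined_loss(
        logits=[2.0, 0.5, -1.0], target=0,
        anchor=[0.0, 0.0], positive=[0.1, 0.0], negative=[0.5, 0.5],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the triplet term is zero once the negative is farther from the anchor than the positive by at least the margin, so only the cross-entropy term remains for well-separated triplets.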
Pages: 534-542
Number of pages: 9
Related papers
50 in total
  • [1] NTIRE 2021 Multi-modal Aerial View Object Classification Challenge
    Liu, Jerrick
    Inkawhich, Nathan
    Nina, Oliver
    Timofte, Radu
    Duan, Yuru
    Li, Gongzhe
    Geng, Xueli
    Cai, Huanqia
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 588 - 595
  • [2] Efficient CNN Architecture for Multi-modal Aerial View Object Classification
    Miron, Casian
    Pasarica, Alexandru
    Timofte, Radu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 560 - 565
  • [3] Multi-modal Aerial View Object Classification Challenge Results - PBVS 2023
    Low, Spencer
    Nina, Oliver
    Sappa, Angel D.
    Blasch, Erik
    Inkawhich, Nathan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023: 412 - 421
  • [4] Cross Modality Knowledge Distillation for Multi-modal Aerial View Object Classification
    Yang, Lehan
    Xu, Kele
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 382 - 387
  • [5] Multi-modal Aerial View Object Classification Challenge Results - PBVS 2022
    Low, Spencer
    Nina, Oliver
    Sappa, Angel D.
    Blasch, Erik
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 349 - 357
  • [6] Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport
    Yang, Yang
    Wu, Yi-Feng
    Zhan, De-Chuan
    Liu, Zhi-Bin
    Jiang, Yuan
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018: 2594 - 2603
  • [7] Aerial Image Classification with Label Splitting and Optimized Triplet Loss Learning
    Liao, Rijun
    Li, Zhu
    Bhattacharyya, Shuvra S.
    York, George
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021
  • [8] Single-Label Multi-modal Field of Research Classification
    Ruosch, Florian
    Vasu, Rosni
    Wang, Ruijie
    Rossetto, Luca
    Bernstein, Abraham
    NATURAL SCIENTIFIC LANGUAGE PROCESSING AND RESEARCH KNOWLEDGE GRAPHS, NSLP 2024, 2024, 14770 : 224 - 233
  • [9] Multi-modal of object trajectories
    Partsinevelos, P.
    JOURNAL OF SPATIAL SCIENCE, 2008, 53 (01) : 17 - 30
  • [10] A MULTI-MODAL VIEW OF MEMORY
    HERRMANN, DJ
    SEARLEMAN, A
    BULLETIN OF THE PSYCHONOMIC SOCIETY, 1988, 26 (06) : 503 - 503