Appearance Label Balanced Triplet Loss for Multi-modal Aerial View Object Classification

Cited by: 2
Authors
Puttagunta, Raghunath Sai [1]
Li, Zhu [1]
Bhattacharyya, Shuvra [2]
York, George [3]
Affiliations
[1] Univ Missouri, Kansas City, MO 64110 USA
[2] Univ Maryland, College Pk, MD USA
[3] US Air Force Acad, Colorado Springs, CO USA
Keywords
LONG-TAILED RECOGNITION; NETWORK
DOI
10.1109/CVPRW59228.2023.00060
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic target recognition (ATR) using image data is an important computer vision task with widespread applications in remote sensing, including surveillance, object tracking, urban planning, and agriculture. Although the task has seen continuous progress, there is still significant room for improvement, particularly with aerial images. This work extracts rich information from multimodal synthetic aperture radar (SAR) and electro-optical (EO) aerial images to perform object classification. Compared to EO images, SAR images have the advantage that they can be captured at night and in any weather condition, and the disadvantage that they are noisy. Overcoming the noise inherent to SAR images is a challenging but worthwhile task because of the additional information SAR images provide to the model. This work proposes a training strategy that involves the creation of appearance labels to generate triplet pairs for training the network with both triplet loss and cross-entropy loss. During the development phase of the 2023 Perception Beyond Visual Spectrum (PBVS) Multi-modal Aerial View Object Classification (MAVOC) challenge, our ResNet-34 model achieved a top-1 accuracy of 64.29% for Track 1, and our ensemble learning model achieved a top-1 accuracy of 75.84% for Track 2. These values are 542% and 247% higher than the baseline values, respectively. Overall, this work ranked 3rd in both Track 1 and Track 2.
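The abstract describes training with both triplet loss and cross-entropy loss on triplets built from appearance labels, but gives no formulas. The following is a minimal sketch of how such a combined objective is commonly written, not the authors' implementation. The squared Euclidean distance, the `margin` and `weight` parameters, the plain sum used to combine the two terms, and all function names are assumptions made for illustration. It also assumes the positive sample shares the anchor's appearance label and the negative sample does not; the abstract does not say how appearance labels are defined or used to pick triplets.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative.

    Assumption: `positive` shares the anchor's appearance label and
    `negative` does not. `margin` is a hypothetical hyperparameter.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` for the class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_loss(anchor, positive, negative, logits, target,
                  margin=1.0, weight=1.0):
    """Classification loss plus a weighted triplet term.

    Assumption: the two losses are simply summed; `weight` is hypothetical.
    """
    return (cross_entropy(logits, target)
            + weight * triplet_loss(anchor, positive, negative, margin))
```

In a real training loop the embeddings and logits would come from the network (for example, a ResNet-34 backbone as mentioned in the abstract), and a deep-learning framework's built-in loss functions would normally replace these hand-written versions.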
Pages: 534-542
Page count: 9