Shuffled Grouping Cross-Channel Attention-based Bilateral-Filter-Interpolation Deformable ConvNet with Applications to Benthonic Organism Detection

Cited by: 1
Authors
Chen, T. [1]
Wang, N. [2]
Affiliations
[1] School of Marine Electrical Engineering, Dalian Maritime University, Dalian
[2] School of Marine Engineering and the Dalian Key Laboratory of Green Power Control and Test for Intelligent Ships, Dalian Maritime University, Dalian
Source
IEEE Transactions on Artificial Intelligence
Funding
National Natural Science Foundation of China
Keywords
Artificial intelligence; benthonic organism detection; bilateral-filter-interpolation deformable ConvNet; color; convolution; deep learning; feature extraction; interpolation; kernel; organisms; shuffled grouping cross-channel attention
DOI
10.1109/TAI.2024.3385387
Abstract
In this paper, to holistically tackle underwater detection degradation caused by unknown geometric variations arising from scale, pose, viewpoint, and occlusion under low-contrast and color-distortion conditions, a shuffled grouping cross-channel attention-based bilateral-filter-interpolation deformable ConvNet (SGCA-BDC) framework is established for benthonic organism detection. The main contributions are as follows: 1) By comprehensively considering spatial and feature similarities between offset and integer coordinate positions, a bilateral-filter-interpolation deformable ConvNet (BDC) with a modulation-weight mechanism is created, so that the sampling ability of the convolutional kernel for benthonic organisms with unknown geometric variations is adaptively augmented from the spatial perspective. 2) By utilizing 1-D convolution to recalibrate channel weights for grouped subfeatures via an information-entropy statistic, a shuffled grouping cross-channel attention (SGCA) module is devised, so that seabed background noise is suppressed from the channel aspect. 3) The proposed SGCA-BDC scheme is eventually built in an organic manner by incorporating the BDC and SGCA modules. Comprehensive experiments and comparisons demonstrate that the SGCA-BDC scheme remarkably outperforms typical detection approaches, including Faster R-CNN, SSD, YOLOv6, YOLOv7, YOLOv8, RetinaNet, and CenterNet, in terms of mean average precision by 8.54%, 4.4%, 5.18%, 3.1%, 3.01%, 12.53%, and 7.09%, respectively.
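As a rough illustration of the bilateral-filter-interpolation idea in contribution 1), the sketch below samples a feature map at a fractional (offset) position by weighting its four integer-grid neighbours with both a spatial term and a feature-similarity term, in place of the plain bilinear interpolation used in standard deformable convolution. The function name, the Gaussian weighting form, and the parameters sigma_s and sigma_r are illustrative assumptions, not the paper's exact formulation.

    import torch

    def bilateral_interp(feat, px, py, sigma_s=1.0, sigma_r=1.0):
        # Sample feat (C, H, W) at the fractional position (px, py).
        # Each of the four integer-grid neighbours is weighted by a spatial
        # Gaussian (distance to the sampling point) and a range Gaussian
        # (feature similarity), in the spirit of a bilateral filter.
        H, W = feat.shape[-2:]
        x0, y0 = torch.floor(px), torch.floor(py)
        neighbours = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
        # crude feature estimate at the fractional point: nearest neighbour
        xc = px.round().clamp(0, W - 1).long()
        yc = py.round().clamp(0, H - 1).long()
        f_c = feat[:, yc, xc]                                   # (C,)
        weights, values = [], []
        for xn, yn in neighbours:
            xi, yi = xn.clamp(0, W - 1).long(), yn.clamp(0, H - 1).long()
            f_n = feat[:, yi, xi]                               # (C,)
            d_s = (px - xn) ** 2 + (py - yn) ** 2               # spatial distance
            d_r = ((f_c - f_n) ** 2).mean()                     # feature distance
            w = torch.exp(-d_s / (2 * sigma_s ** 2)) * torch.exp(-d_r / (2 * sigma_r ** 2))
            weights.append(w)
            values.append(f_n)
        w = torch.stack(weights)                                # (4,)
        v = torch.stack(values)                                 # (4, C)
        return (w[:, None] * v).sum(0) / (w.sum() + 1e-8)

    # example: bilateral_interp(torch.randn(16, 32, 32), torch.tensor(5.3), torch.tensor(7.8))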
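Likewise, a minimal sketch of the shuffled grouping cross-channel attention pattern in contribution 2), assuming a typical layout: channels are split into groups, each sub-feature is summarised by a per-channel spatial-entropy statistic, a lightweight 1-D convolution produces the channel weights, and a channel shuffle mixes information across groups. The class name, group count, entropy definition, and layer sizes are assumptions for illustration; the paper's exact statistic and structure may differ.

    import torch
    import torch.nn as nn

    class SGCA(nn.Module):
        # Sketch of a shuffled grouping cross-channel attention block:
        # group the channels, compute a per-channel entropy statistic for
        # each sub-feature, recalibrate the channels with a 1-D convolution,
        # then channel-shuffle so information crosses the groups.
        def __init__(self, channels, groups=4, k=3):
            super().__init__()
            assert channels % groups == 0
            self.groups = groups
            self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

        def _entropy(self, x):
            # x: (N, C, H, W) -> entropy of the spatial softmax per channel, (N, C, 1)
            n, c, _, _ = x.shape
            p = torch.softmax(x.reshape(n, c, -1), dim=-1)
            return -(p * torch.log(p + 1e-8)).sum(-1, keepdim=True)

        def forward(self, x):
            b, c, h, w = x.shape
            g = self.groups
            sub = x.reshape(b * g, c // g, h, w)            # grouped sub-features
            stat = self._entropy(sub)                       # (b*g, c//g, 1)
            attn = torch.sigmoid(
                self.conv1d(stat.transpose(1, 2)).transpose(1, 2))
            sub = sub * attn.unsqueeze(-1)                  # channel recalibration
            out = sub.reshape(b, g, c // g, h, w)
            return out.transpose(1, 2).reshape(b, c, h, w)  # channel shuffle

    # example: SGCA(64)(torch.randn(2, 64, 32, 32)).shape -> torch.Size([2, 64, 32, 32])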
Pages: 1-13
Number of pages: 12