Multi-stream CNN based Video Semantic Segmentation for Automated Driving

Cited by: 3
Authors
Sistu, Ganesh [1]
Chennupati, Sumanth [2]
Yogamani, Senthil [1]
Affiliations
[1] Valeo Vis Syst, Dublin, Ireland
[2] Valeo Troy, Troy, NY USA
Keywords
Semantic Segmentation; Visual Perception; Automated Driving;
DOI
10.5220/0007248401730180
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
The majority of semantic segmentation algorithms operate on a single frame, even when the input is video. In this work, the goal is to exploit temporal information within the model in order to leverage motion cues and temporal consistency. We propose two simple high-level architectures based on Recurrent FCN (RFCN) and Multi-Stream FCN (MSFCN) networks. In RFCN, a recurrent network, namely an LSTM, is inserted between the encoder and the decoder. MSFCN combines the encoders of different frames into a fused encoder via 1x1 channel-wise convolution. We use a ResNet50 network as the baseline encoder and construct three networks: MSFCN of order 2 and 3, and RFCN of order 2. MSFCN-3 produces the best results, with accuracy improvements of 9% and 15% for the Highway and New York-like city scenarios, respectively, in the SYNTHIA-CVPR'16 dataset using the mean IoU metric. MSFCN-3 also improves on the baseline FCN network by 11% and 6% on the SegTrack V2 and DAVIS datasets, respectively. We also designed efficient versions of MSFCN-2 and RFCN-2 that share weights between the two encoders. The efficient MSFCN-2 provides improvements of 11% and 5% on KITTI and SYNTHIA, respectively, with a negligible increase in computational complexity compared to the baseline version.
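The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the per-frame encoder outputs are concatenated along the channel axis before the 1x1 convolution; the abstract does not state this detail, and the function name `fuse_encoders`, the tensor shapes, and the random weights are hypothetical, not taken from the paper. The ResNet50 encoder and the decoder are not modelled.

```python
import numpy as np

def fuse_encoders(features, weight, bias=None):
    """Fuse per-frame encoder feature maps with a 1x1 convolution.

    features: list of arrays, each of shape (C, H, W), one per frame.
    weight:   array of shape (C_out, C * len(features)).
    bias:     optional array of shape (C_out,).
    Assumption: frames are concatenated along the channel axis first.
    """
    stacked = np.concatenate(features, axis=0)  # (C * n_frames, H, W)
    # A 1x1 convolution is the same linear map over channels at every pixel.
    fused = np.einsum("oc,chw->ohw", weight, stacked)
    if bias is not None:
        fused = fused + bias[:, None, None]
    return fused

# Hypothetical sizes: two frames (an order-2 network), 4 channels, 3x5 maps.
rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
frame_prev = rng.standard_normal((C, H, W))
frame_curr = rng.standard_normal((C, H, W))
weight = rng.standard_normal((C, 2 * C))

fused = fuse_encoders([frame_prev, frame_curr], weight)
print(fused.shape)
```

Because the kernel is 1x1, the fused map keeps the spatial size of the inputs and only mixes channels, so under this assumption the added cost is one small matrix product per pixel.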
Pages: 173-180
Page count: 8