Multi-stream CNN based Video Semantic Segmentation for Automated Driving

Cited by: 3

Authors:

Sistu, Ganesh [1]

Chennupati, Sumanth [2]

Yogamani, Senthil [1]

Affiliations:

[1] Valeo Vis Syst, Dublin, Ireland

[2] Valeo Troy, Troy, NY USA

Source:

PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5 | 2019

Keywords:

Semantic Segmentation; Visual Perception; Automated Driving

DOI:

10.5220/0007248401730180

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

The majority of semantic segmentation algorithms operate on a single frame, even in the case of videos. In this work, the goal is to exploit temporal information within the model in order to leverage motion cues and temporal consistency. We propose two simple high-level architectures based on Recurrent FCN (RFCN) and Multi-Stream FCN (MSFCN) networks. In the case of RFCN, a recurrent network, namely an LSTM, is inserted between the encoder and the decoder. MSFCN combines the encoders of different frames into a fused encoder via 1x1 channel-wise convolution. We use a ResNet50 network as the baseline encoder and construct three networks, namely MSFCN of order 2 and 3 and RFCN of order 2. MSFCN-3 produces the best results, with accuracy improvements of 9% and 15% for the Highway and New York-like city scenarios in the SYNTHIA-CVPR'16 dataset, measured by mean IoU. MSFCN-3 also produced improvements of 11% and 6% on the SegTrack V2 and DAVIS datasets over the baseline FCN network. We also designed efficient versions of MSFCN-2 and RFCN-2 using weight sharing between the two encoders. The efficient MSFCN-2 provided improvements of 11% and 5% for KITTI and SYNTHIA with a negligible increase in computational complexity compared to the baseline version.
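The fusion step described in the abstract (encoder outputs of several frames combined by a 1x1 convolution) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the tiny feature maps, and the hand-picked averaging weights are all assumptions made for illustration, and the ResNet50 encoders and the decoder are omitted. In a trained network the fusion weights would be learned.

```python
from typing import List

# A feature map indexed as [channel][row][col]. Hypothetical type for this sketch.
FeatureMap = List[List[List[float]]]


def concat_channels(streams: List[FeatureMap]) -> FeatureMap:
    """Stack the per-frame encoder outputs along the channel axis."""
    stacked: FeatureMap = []
    for fmap in streams:
        stacked.extend(fmap)
    return stacked


def conv1x1(fmap: FeatureMap, weights: List[List[float]],
            bias: List[float]) -> FeatureMap:
    """Apply a 1x1 convolution: a per-pixel linear mix of the input channels.

    weights[o][i] is the contribution of input channel i to output channel o.
    """
    in_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    out: FeatureMap = []
    for o, row_w in enumerate(weights):
        assert len(row_w) == in_ch, "weight row must match input channels"
        plane = [
            [bias[o] + sum(row_w[i] * fmap[i][y][x] for i in range(in_ch))
             for x in range(width)]
            for y in range(height)
        ]
        out.append(plane)
    return out


def fuse_streams(streams: List[FeatureMap], weights: List[List[float]],
                 bias: List[float]) -> FeatureMap:
    """Concatenate the streams, then reduce the channels with a 1x1 convolution."""
    return conv1x1(concat_channels(streams), weights, bias)


# Two frames (an order-2 setup), each with 2 channels on a 1x2 spatial grid.
prev_frame = [[[1.0, 2.0]], [[3.0, 4.0]]]
curr_frame = [[[3.0, 4.0]], [[5.0, 6.0]]]

# Assumed weights: each output channel averages the matching channel of both frames.
weights = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
]
fused = fuse_streams([prev_frame, curr_frame], weights, [0.0, 0.0])
print(fused)
```

The fused map has the same channel count as a single encoder output, so a decoder built for the single-frame baseline could consume it unchanged in this sketch.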

Pages: 173-180

Page count: 8
