Controllable Image Caption Generation Based on Encoder-decoder for Power Construction Scene

Cited by: 0

Authors:

Yang R. [1]

Shao J. [1]

Luo Y. [1]

Bai W. [2]

Affiliations:

[1] College of Electronics and Information Engineering, Shanghai University of Electric Power, Pudong New District, Shanghai

[2] Gansu Electric Power Research Institute, Gansu Province, Lanzhou

Source:

Dianwang Jishu/Power System Technology | 2022 / Vol. 46 / No. 07

Funding:

National Natural Science Foundation of China

Keywords:

activation function; controllable image caption; FVC R-CNN model; MT-LSTM neural network; multi-branch decision strategy; power construction scene;

DOI:

10.13335/j.1000-3673.pst.2021.2400

Abstract:

Image caption generation for electric power construction scenes uses encoder-decoder deep learning to interpret image content and convert it into a textual description. This makes it possible to warn of potential safety hazards and extends the output forms of traditional image analysis techniques. Traditional image caption generation methods lack controllability and describe details insufficiently, and little research has addressed image description of power construction scenes. Therefore, an optimized method for controllable image caption generation based on the encoder-decoder framework is proposed. A new feature extraction model, FVC R-CNN, is introduced as the encoder to extract both the salient features and the common visual features of images. An improved MT-LSTM network, obtained by modifying the activation function, performs feature decoding. Finally, the output is optimized with a multi-branch decision strategy. The model is trained and tested on a power scene description dataset using the PyTorch deep learning framework on Ubuntu 16.04. Experimental results show that the accuracy of image caption generation is significantly improved and the controllability of scene description is enhanced, which effectively raises the level of intelligent safety management in power construction scenes. © 2022 Power System Technology Press. All rights reserved.
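
The abstract says only that MT-LSTM is obtained by improving the activation function; it does not give the modified function. As a rough illustration of where such a change sits in a decoder, the following is a minimal pure-Python sketch of one step of a standard LSTM cell in which the tanh applied to the candidate state and to the output is a pluggable parameter. The names `lstm_step`, `zero_params` and `softsign` are hypothetical, and softsign is only a placeholder swap, not the activation used by the authors.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Per-gate parameters: (W: hidden x input, U: hidden x hidden, b: hidden)
GateParams = Tuple[Matrix, Matrix, Vector]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softsign(z: float) -> float:
    """Hypothetical alternative to tanh, used here only as a placeholder."""
    return z / (1.0 + abs(z))


def lstm_step(
    x: Sequence[float],
    h: Sequence[float],
    c: Sequence[float],
    params: Dict[str, GateParams],
    act: Callable[[float], float] = math.tanh,
) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; `act` replaces tanh for g_t and h_t."""

    def pre(gate: str) -> Vector:
        # Pre-activation W x_t + U h_{t-1} + b for one gate.
        w, u, b = params[gate]
        return [p + q + r for p, q, r in zip(matvec(w, x), matvec(u, h), b)]

    i = [sigmoid(z) for z in pre("i")]  # input gate
    f = [sigmoid(z) for z in pre("f")]  # forget gate
    o = [sigmoid(z) for z in pre("o")]  # output gate
    g = [act(z) for z in pre("g")]      # candidate cell state (pluggable activation)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * act(ck) for ok, ck in zip(o, c_new)]  # output squashing
    return h_new, c_new


def zero_params(hidden: int, inputs: int) -> Dict[str, GateParams]:
    """All-zero weights and biases for every gate (useful for a sanity check)."""
    return {
        gate: (
            [[0.0] * inputs for _ in range(hidden)],
            [[0.0] * hidden for _ in range(hidden)],
            [0.0] * hidden,
        )
        for gate in ("i", "f", "o", "g")
    }


if __name__ == "__main__":
    p = zero_params(hidden=2, inputs=3)
    for name, fn in (("tanh", math.tanh), ("softsign", softsign)):
        h, c = lstm_step([0.1, 0.2, 0.3], [0.0, 0.0], [1.0, 1.0], p, act=fn)
        print(name, h, c)
```

With all-zero parameters every gate outputs 0.5 and the candidate is act(0) = 0, so the cell state halves from 1.0 to 0.5 and the hidden state is 0.5 * act(0.5): about 0.231 for tanh and about 0.167 for softsign. In a real captioning decoder, x_t would carry the word embedding and the encoder's image features, and the weights would be learned.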

Pages: 2572-2580

Number of pages: 8

Number of references: 36