Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings

Cited by: 0

Authors:

Nitta, Tomoya [1,2]

Fukuzawa, Takumi [2]

Tamaki, Toru [2]

Affiliations:

[1] Toshiba, Kawasaki 2128582, Japan

[2] Nagoya Inst Technol, Nagoya 4668555, Japan

Source:

IEEE ACCESS | 2024 / Vol. 12

Funding:

Japan Society for the Promotion of Science

Keywords:

Decoding; Vectors; Earth Observing System; Training; Long short term memory; Data models; Web sites; Video on demand; Reviews; Reliability; Video captioning; length controllable generation; ordinal embedding;

DOI:

10.1109/ACCESS.2024.3506751

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control often had few levels for expressing length. In this study, we propose two methods of length embedding for fine-grained length control. A traditional embedding method is linear, using a one-hot vector and an embedding matrix. In this study, we propose methods that represent length in multi-hot vectors. One is bit embedding that expresses length in bit representation, and the other is ordinal embedding that uses the binary representation often used in ordinal regression. These length representations of multi-hot vectors are converted into length embedding by a nonlinear MLP. This method allows for not only the length control of caption sentences but also the control of the time when reading the caption. Experiments using ActivityNet Captions and Spoken Moments in Time show that the proposed method effectively controls the length of the generated captions. Analysis of the embedding vectors with Independent Component Analysis (ICA) shows that length and semantics were learned separately, demonstrating the effectiveness of the proposed embedding methods. Our code and online demo are available at https://huggingface.co/spaces/fztkm/length_controllable_video_captioning.
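
The three length representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the most-significant-bit-first ordering, the "first `length` entries set to 1" ordinal convention, and the tiny randomly initialized MLP (`TinyMLP`, untrained) are all assumptions made for illustration. The paper and its linked demo define the actual choices.

```python
import random


def one_hot(length, max_len):
    """Baseline: a single 1 at index length-1 (lengths run from 1 to max_len)."""
    if not 1 <= length <= max_len:
        raise ValueError("length must be in [1, max_len]")
    return [1.0 if i == length - 1 else 0.0 for i in range(max_len)]


def bit_vector(length, n_bits):
    """Multi-hot bit representation of length (assumed MSB first)."""
    if not 0 <= length < 2 ** n_bits:
        raise ValueError("length does not fit in n_bits")
    return [float((length >> shift) & 1) for shift in range(n_bits - 1, -1, -1)]


def ordinal_vector(length, max_len):
    """Multi-hot ordinal representation: the first `length` entries are 1 (assumed)."""
    if not 0 <= length <= max_len:
        raise ValueError("length must be in [0, max_len]")
    return [1.0 if i < length else 0.0 for i in range(max_len)]


class TinyMLP:
    """Hypothetical two-layer MLP with ReLU; weights are random, not trained."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def __call__(self, x):
        # Hidden layer: affine map followed by ReLU (the nonlinearity).
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        # Output layer: affine map producing the length embedding.
        return [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.w2, self.b2)
        ]


if __name__ == "__main__":
    max_len = 8
    mlp = TinyMLP(in_dim=max_len, hidden_dim=16, out_dim=4)
    print(one_hot(3, max_len))
    print(bit_vector(5, 4))
    print(ordinal_vector(3, max_len))
    print(mlp(ordinal_vector(3, max_len)))
```

Under these assumptions, neighboring lengths share most of their active entries in the ordinal vector, whereas one-hot vectors for different lengths share none. That is one plausible reading of why a multi-hot input helps when there are many length levels.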


Pages: 189667 - 189688

Page count: 22
