Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings

Times cited: 0
Authors
Nitta, Tomoya [1, 2]
Fukuzawa, Takumi [2 ]
Tamaki, Toru [2 ]
Affiliations
[1] Toshiba, Kawasaki 2128582, Japan
[2] Nagoya Inst Technol, Nagoya 4668555, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Funding
Japan Society for the Promotion of Science
Keywords
Decoding; Vectors; Earth Observing System; Training; Long short term memory; Data models; Web sites; Video on demand; Reviews; Reliability; Video captioning; length controllable generation; ordinal embedding;
DOI
10.1109/ACCESS.2024.3506751
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control often had few levels for expressing length. In this study, we propose two methods of length embedding for fine-grained length control. A traditional embedding method is linear, using a one-hot vector and an embedding matrix. In this study, we propose methods that represent length in multi-hot vectors. One is bit embedding that expresses length in bit representation, and the other is ordinal embedding that uses the binary representation often used in ordinal regression. These length representations of multi-hot vectors are converted into length embedding by a nonlinear MLP. This method allows for not only the length control of caption sentences but also the control of the time when reading the caption. Experiments using ActivityNet Captions and Spoken Moments in Time show that the proposed method effectively controls the length of the generated captions. Analysis of the embedding vectors with Independent Component Analysis (ICA) shows that length and semantics were learned separately, demonstrating the effectiveness of the proposed embedding methods. Our code and online demo are available at https://huggingface.co/spaces/fztkm/length_controllable_video_captioning.
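The three length representations named in the abstract (one-hot, bit, and ordinal) can be illustrated with a minimal sketch. This is not the authors' implementation: the maximum length, bit width, layer sizes, random weights, bit order, and the convention that a length k sets the first k ordinal entries to 1 are all assumptions made for illustration, and every function name here is hypothetical.

```python
import numpy as np

MAX_LEN = 32    # assumed maximum caption length (hypothetical)
N_BITS = 6      # assumed bit width; 6 bits cover lengths up to 63
HIDDEN_DIM = 16 # assumed MLP hidden size (hypothetical)
EMBED_DIM = 8   # assumed length-embedding size (hypothetical)


def one_hot(length, max_len=MAX_LEN):
    """Traditional representation: a single 1 at position length - 1."""
    v = np.zeros(max_len, dtype=np.float32)
    v[length - 1] = 1.0
    return v


def bit_vector(length, n_bits=N_BITS):
    """Multi-hot bit representation, least significant bit first (assumed order)."""
    return np.array([(length >> i) & 1 for i in range(n_bits)], dtype=np.float32)


def ordinal_vector(length, max_len=MAX_LEN):
    """Multi-hot ordinal representation: the first `length` entries are 1 (assumed convention)."""
    return (np.arange(max_len) < length).astype(np.float32)


def mlp_embed(x, w1, b1, w2, b2):
    """Nonlinear map from a multi-hot vector to a length embedding (one ReLU hidden layer)."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2


# Random weights stand in for parameters that would be learned during training.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(MAX_LEN, HIDDEN_DIM)).astype(np.float32)
b1 = np.zeros(HIDDEN_DIM, dtype=np.float32)
w2 = rng.normal(size=(HIDDEN_DIM, EMBED_DIM)).astype(np.float32)
b2 = np.zeros(EMBED_DIM, dtype=np.float32)

length_embedding = mlp_embed(ordinal_vector(5), w1, b1, w2, b2)
```

In the ordinal vector, neighbouring lengths differ in exactly one entry, whereas two one-hot vectors for different lengths never share an active entry; this is one way to read the abstract's motivation for multi-hot representations.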
Pages: 189667 - 189688
Page count: 22