Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings

Cited by: 0
Authors
Nitta, Tomoya [1 ,2 ]
Fukuzawa, Takumi [2 ]
Tamaki, Toru [2 ]
Affiliations
[1] Toshiba, Kawasaki 2128582, Japan
[2] Nagoya Inst Technol, Nagoya 4668555, Japan
Source
IEEE ACCESS, 2024, Vol. 12
Funding
Japan Society for the Promotion of Science;
Keywords
Decoding; Vectors; Earth Observing System; Training; Long short term memory; Data models; Web sites; Video on demand; Reviews; Reliability; Video captioning; length controllable generation; ordinal embedding;
DOI
10.1109/ACCESS.2024.3506751
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control has typically offered only a few coarse levels for expressing length. In this study, we propose two length embedding methods for fine-grained length control. The traditional embedding method is linear, using a one-hot vector and an embedding matrix. We instead propose methods that represent length as multi-hot vectors: bit embedding, which expresses length in a binary bit representation, and ordinal embedding, which uses the binary representation often used in ordinal regression. These multi-hot length representations are converted into length embeddings by a nonlinear MLP. This approach allows control not only of the length of caption sentences but also of the time required to read the caption. Experiments on ActivityNet Captions and Spoken Moments in Time show that the proposed method effectively controls the length of the generated captions. Analysis of the embedding vectors with Independent Component Analysis (ICA) shows that length and semantics are learned separately, demonstrating the effectiveness of the proposed embedding methods. Our code and online demo are available at https://huggingface.co/spaces/fztkm/length_controllable_video_captioning.
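As a concrete illustration of the two multi-hot length representations described above, the sketch below encodes a target caption length either as a bit vector or as the cumulative (ordinal-regression-style) vector, and maps it to a length embedding with a nonlinear MLP. This is a minimal PyTorch sketch under assumed settings, not the authors' implementation; the module name LengthEmbedding and the dimensions (num_bits, max_length, hidden_dim, embed_dim) are hypothetical choices made for the example.

import torch
import torch.nn as nn

def bit_encoding(length: int, num_bits: int = 8) -> torch.Tensor:
    # Multi-hot vector holding the binary (bit) representation of `length`.
    bits = [(length >> i) & 1 for i in range(num_bits)]
    return torch.tensor(bits, dtype=torch.float32)

def ordinal_encoding(length: int, max_length: int = 128) -> torch.Tensor:
    # Multi-hot vector in the cumulative style used in ordinal regression:
    # the first `length` entries are 1, the remaining entries are 0.
    vec = torch.zeros(max_length, dtype=torch.float32)
    vec[:length] = 1.0
    return vec

class LengthEmbedding(nn.Module):
    # Nonlinear MLP that maps a multi-hot length vector to a length embedding.
    def __init__(self, in_dim: int, embed_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, length_vec: torch.Tensor) -> torch.Tensor:
        return self.mlp(length_vec)

# Example: embed a target caption length of 17 tokens with the ordinal scheme.
encoder = LengthEmbedding(in_dim=128, embed_dim=768)
length_vec = ordinal_encoding(17, max_length=128)
length_emb = encoder(length_vec.unsqueeze(0))  # shape: (1, 768)

For the ICA analysis mentioned in the abstract, one plausible way to carry out a similar inspection would be to collect the learned embedding vectors, apply a decomposition such as sklearn.decomposition.FastICA, and examine which independent components vary with the target length; the paper's exact analysis setup is not reproduced here.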
Pages: 189667-189688
Number of pages: 22
Related Papers
50 records in total
  • [21] Zhang, Zongjian; Wu, Qiang; Wang, Yang; Chen, Fang. Fine-grained and Semantic-guided Visual Attention for Image Captioning. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018: 1709-1717
  • [22] Wu, Jie; Chen, Tianshui; Wu, Hefeng; Yang, Zhi; Luo, Guangchun; Lin, Liang. Fine-Grained Image Captioning With Global-Local Discriminative Objective. IEEE Transactions on Multimedia, 2021, 23: 2413-2427
  • [23] Kordopatis-Zilos, Giorgos; Papadopoulos, Symeon; Patras, Ioannis; Kompatsiaris, Ioannis. FIVR: Fine-Grained Incident Video Retrieval. IEEE Transactions on Multimedia, 2019, 21(10): 2638-2652
  • [24] Yang, Chunmiao; Wang, Yang; Han, Liying; Jia, Xiran; Sun, Hebin. Fine-grained image emotion captioning based on Generative Adversarial Networks. Multimedia Tools and Applications, 2024, 83(34): 81857-81875
  • [25] Verma, Deepali; Haldar, Arya; Dutta, Tanima. Leveraging Weighted Fine-Grained Cross-Graph Attention for Visual and Semantic Enhanced Video Captioning Network. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 2, 2023: 2465-2473
  • [26] Deng, Jia; Krause, Jonathan; Fei-Fei, Li. Fine-Grained Crowdsourcing for Fine-Grained Recognition. 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013: 580-587
  • [27] Lu, Feng; Wang, Zirui; Liao, Xiaofei; Jin, Hai. Online video advertising based on fine-grained video tags. Jisuanji Yanjiu yu Fazhan / Computer Research and Development, 2014, 51(12): 2733-2745
  • [28] Hu, Mingqi; Zhou, Deyu; He, Yulan. Variational Conditional GAN for Fine-grained Controllable Image Generation. Asian Conference on Machine Learning, Vol. 101, 2019: 109-124
  • [29] Yuan, Li; Wang, Jin; Yu, Liang-Chih; Zhang, Xuejie. Hierarchical template transformer for fine-grained sentiment controllable generation. Information Processing & Management, 2022, 59(5)
  • [30] Hossen, Md. Bipul; Ye, Zhongfu; Abdussalam, Amr; Hossain, Mohammad Alamgir. ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor. Displays, 2024, 84