Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings

Authors
Nitta, Tomoya [1, 2]
Fukuzawa, Takumi [2]
Tamaki, Toru [2]
Affiliations
[1] Toshiba, Kawasaki 2128582, Japan
[2] Nagoya Inst Technol, Nagoya 4668555, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
Decoding; Vectors; Earth Observing System; Training; Long short term memory; Data models; Web sites; Video on demand; Reviews; Reliability; Video captioning; length controllable generation; ordinal embedding;
DOI
10.1109/ACCESS.2024.3506751
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control often provided only a few levels for expressing length. In this study, we propose two length embedding methods for fine-grained length control. The traditional embedding method is linear, using a one-hot vector and an embedding matrix. In contrast, the proposed methods represent length as multi-hot vectors. One is bit embedding, which expresses length in its bit representation; the other is ordinal embedding, which uses the binary representation often used in ordinal regression. These multi-hot length representations are converted into length embeddings by a nonlinear MLP. This method allows not only control of the length of caption sentences but also control of the time required to read the caption. Experiments using ActivityNet Captions and Spoken Moments in Time show that the proposed method effectively controls the length of the generated captions. Analysis of the embedding vectors with Independent Component Analysis (ICA) shows that length and semantics were learned separately, demonstrating the effectiveness of the proposed embedding methods. Our code and online demo are available at https://huggingface.co/spaces/fztkm/length_controllable_video_captioning.
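The three length representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the constants `MAX_LEN` and `NUM_BITS`, the most-significant-bit-first order, the ordinal convention (first `length` entries set to 1), and the two-layer `tanh` MLP with random weights are all assumptions chosen for illustration.

```python
import math
import random

MAX_LEN = 8   # hypothetical maximum caption length (assumption)
NUM_BITS = 4  # enough bits to write MAX_LEN in binary (assumption)


def one_hot(length: int, max_len: int = MAX_LEN) -> list[int]:
    """Conventional encoding: a single 1 at index length - 1."""
    return [1 if i == length - 1 else 0 for i in range(max_len)]


def bit_vector(length: int, num_bits: int = NUM_BITS) -> list[int]:
    """Multi-hot encoding from the binary digits of length (MSB first, assumed)."""
    return [(length >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def ordinal_vector(length: int, max_len: int = MAX_LEN) -> list[int]:
    """Multi-hot encoding with the first `length` entries set to 1 (assumed convention)."""
    return [1 if i < length else 0 for i in range(max_len)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    """Placeholder weights; a trained model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mlp_embed(vec, w1, w2):
    """Two-layer MLP with a tanh nonlinearity mapping a multi-hot vector to an embedding."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


rng = random.Random(0)
w1 = random_matrix(16, MAX_LEN, rng)  # hidden size 16 is arbitrary
w2 = random_matrix(6, 16, rng)        # embedding size 6 is arbitrary
embedding = mlp_embed(ordinal_vector(5), w1, w2)

print(one_hot(5))         # [0, 0, 0, 0, 1, 0, 0, 0]
print(bit_vector(5))      # [0, 1, 0, 1]
print(ordinal_vector(5))  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Under these assumptions, neighbouring lengths share most of their active entries in the ordinal vector, whereas one-hot vectors for different lengths share none; this is one plausible reading of why multi-hot inputs could help fine-grained control.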
Pages: 189667 - 189688
Page count: 22