Fine-Grained Length Controllable Video Captioning With Ordinal Embeddings

Cited by: 0
Authors
Nitta, Tomoya [1, 2]
Fukuzawa, Takumi [2 ]
Tamaki, Toru [2 ]
Affiliations
[1] Toshiba, Kawasaki 2128582, Japan
[2] Nagoya Inst Technol, Nagoya 4668555, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
Decoding; Vectors; Earth Observing System; Training; Long short term memory; Data models; Web sites; Video on demand; Reviews; Reliability; Video captioning; length controllable generation; ordinal embedding;
DOI
10.1109/ACCESS.2024.3506751
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control often had few levels for expressing length. In this study, we propose two methods of length embedding for fine-grained length control. A traditional embedding method is linear, using a one-hot vector and an embedding matrix. In this study, we propose methods that represent length in multi-hot vectors. One is bit embedding that expresses length in bit representation, and the other is ordinal embedding that uses the binary representation often used in ordinal regression. These length representations of multi-hot vectors are converted into length embedding by a nonlinear MLP. This method allows for not only the length control of caption sentences but also the control of the time when reading the caption. Experiments using ActivityNet Captions and Spoken Moments in Time show that the proposed method effectively controls the length of the generated captions. Analysis of the embedding vectors with Independent Component Analysis (ICA) shows that length and semantics were learned separately, demonstrating the effectiveness of the proposed embedding methods. Our code and online demo are available at https://huggingface.co/spaces/fztkm/length_controllable_video_captioning.
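The two multi-hot length representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the most-significant-bit-first ordering, and the "first k entries set to 1" ordinal convention are assumptions chosen for illustration, and the nonlinear MLP that maps these vectors to length embeddings is not shown.

```python
from typing import List


def bit_vector(length: int, num_bits: int) -> List[int]:
    """Binary (bit) representation of `length`, most significant bit first.

    Hypothetical helper: the bit ordering is an assumption.
    """
    if not 0 <= length < 2 ** num_bits:
        raise ValueError("length does not fit in num_bits bits")
    return [(length >> i) & 1 for i in reversed(range(num_bits))]


def ordinal_vector(length: int, max_length: int) -> List[int]:
    """Ordinal-regression style code: the first `length` entries are 1.

    Hypothetical helper: the exact convention used in the paper may differ.
    """
    if not 0 <= length <= max_length:
        raise ValueError("length must lie in [0, max_length]")
    return [1 if i < length else 0 for i in range(max_length)]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(bit_vector(5, 4))       # [0, 1, 0, 1]
    print(ordinal_vector(5, 8))   # [1, 1, 1, 1, 1, 0, 0, 0]
    # Neighbouring lengths 7 and 8 differ in every bit of the bit code,
    # but in exactly one position of the ordinal code.
    print(hamming(bit_vector(7, 4), bit_vector(8, 4)))          # 4
    print(hamming(ordinal_vector(7, 8), ordinal_vector(8, 8)))  # 1
```

Under these assumptions the bit code is compact (it grows logarithmically with the maximum length), while in the ordinal code the Hamming distance between two lengths equals their absolute difference.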
Pages: 189667 - 189688
Page count: 22
Related papers
  • [1] Fine-grained Video Captioning for Sports Narrative
    Yu, Huanyu
    Cheng, Shuo
    Ni, Bingbing
    Wang, Minsi
    Zhang, Jian
    Yang, Xiaokang
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6006 - 6015
  • [2] iMakeup: Makeup Instructional Video Dataset for Fine-Grained Dense Video Captioning
    Lin X.
    Jin Q.
    Chen S.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (08): : 1350 - 1357
  • [3] iMakeup: Makeup Instructional Video Dataset for Fine-Grained Dense Video Captioning
    Lin, Xiaozhu
    Jin, Qin
    Chen, Shizhe
    Song, Yuqing
    Zhao, Yida
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 78 - 88
  • [4] A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
    Liu, An-An
    Qiu, Yurui
    Wong, Yongkang
    Su, Yu-Ting
    Kankanhalli, Mohan
    IEEE ACCESS, 2018, 6 : 68463 - 68471
  • [5] Fine-Grained Features for Image Captioning
    Shao, Mengyue
    Feng, Jie
    Wu, Jie
    Zhang, Haixiang
    Zheng, Yayu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 4697 - 4712
  • [6] EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
    Shi, Yaya
    Yang, Xu
    Xu, Haiyang
    Yuan, Chunfeng
    Li, Bing
    Hu, Weiming
    Zha, Zheng-Jun
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17908 - 17917
  • [7] Accountable and Fine-Grained Controllable Rewriting in Blockchains
    Xu, Shengmin
    Huang, Xinyi
    Yuan, Jiaming
    Li, Yingjiu
    Deng, Robert H.
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 101 - 116
  • [8] Fine-grained Audible Video Description
    Shen, Xuyang
    Li, Dong
    Zhou, Jinxing
    Qin, Zhen
    He, Bowen
    Han, Xiaodong
    Li, Aixuan
    Dai, Yuchao
    Kong, Lingpeng
    Wang, Meng
    Qiao, Yu
    Zhong, Yiran
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10585 - 10596
  • [9] Fine-Grained Scalable Video Caching
    Gong, Qiushi
    Woods, John W.
    Kar, Koushik
    Chakareski, Jacob
    2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2015, : 101 - 106
  • [10] Evaluation of Output Embeddings for Fine-Grained Image Classification
    Akata, Zeynep
    Reed, Scott
    Walter, Daniel
    Lee, Honglak
    Schiele, Bernt
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 2927 - 2936