Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Cited by: 42
Authors
Chen, Long [2, 3]
Jiang, Zhihong [1]
Xiao, Jun [1]
Liu, Wei [4]
Affiliations
[1] Zhejiang University, Hangzhou, People's Republic of China
[2] Tencent AI Lab, Bellevue, WA, USA
[3] Columbia University, New York, NY 10027, USA
[4] Tencent Data Platform, New York, NY USA
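Funding
Zhejiang Provincial Natural Science Foundation; National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.01657
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract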
Controllable Image Captioning (CIC), which generates image descriptions following designated control signals, has received unprecedented attention over the last few years. To emulate the human ability to control caption generation, current CIC studies focus exclusively on control signals concerning objective properties, such as contents of interest or descriptive patterns. However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity. 2) Sample-suitable: the control signals should be suitable for a specific image sample. To this end, we propose a new control signal for CIC: Verb-specific Semantic Roles (VSR). A VSR consists of a verb and some semantic roles, which represent a targeted activity and the roles of the entities involved in this activity. Given a designated VSR, we first train a grounded semantic role labeling (GSRL) model to identify and ground all entities for each role. Then, we propose a semantic structure planner (SSP) to learn human-like descriptive semantic structures. Lastly, we use a role-shift captioning model to generate the captions. Extensive experiments and ablations demonstrate that our framework can achieve better controllability than several strong baselines on two challenging CIC benchmarks. In addition, it can easily generate multi-level diverse captions.
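The data flow described in the abstract (a VSR control signal, then grounding, then structure planning, then caption generation) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class and function names, the PropBank-style role labels ("Arg0", "Arg1", "LOC"), and the dictionary of candidate entities are all hypothetical, and each function is a trivial stand-in for a learned model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VSR:
    """Control signal: one verb plus the semantic roles to mention (names assumed)."""
    verb: str
    roles: Tuple[str, ...]


def ground_roles(vsr: VSR, candidates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Stand-in for the GSRL step: look up candidate entities for each requested role."""
    return {role: list(candidates.get(role, [])) for role in vsr.roles}


def plan_structure(vsr: VSR, grounded: Dict[str, List[str]]) -> List[str]:
    """Stand-in for the SSP step: keep grounded roles in the order given.

    The real planner is learned; this placeholder only drops empty roles.
    """
    return [role for role in vsr.roles if grounded[role]]


def describe(vsr: VSR, plan: List[str],
             grounded: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Stand-in for the captioner: emit (role, entity) slots instead of text."""
    slots = [("V", vsr.verb)]
    for role in plan:
        slots.extend((role, entity) for entity in grounded[role])
    return slots


# Hypothetical example: no entity is available for the LOC role.
vsr = VSR(verb="ride", roles=("Arg0", "Arg1", "LOC"))
candidates = {"Arg0": ["man"], "Arg1": ["horse"]}
grounded = ground_roles(vsr, candidates)
plan = plan_structure(vsr, grounded)
slots = describe(vsr, plan, grounded)
print(slots)
```

Keeping the control signal as an immutable value separate from the three stages mirrors the abstract's ordering, so each placeholder could in principle be swapped for a trained model without changing the others.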
Pages: 16841-16851
Page count: 11