Multimodal Logical Inference System for Visual-Textual Entailment

Cited by: 0
Authors
Suzuki, Riko [1]
Yanaka, Hitomi [1, 2]
Yoshikawa, Masashi [3]
Mineshima, Koji [1]
Bekki, Daisuke [1]
Affiliations
[1] Ochanomizu Univ, Tokyo, Japan
[2] RIKEN Ctr Adv Intelligence Project, Tokyo, Japan
[3] Nara Inst Sci & Technol, Nara, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A large amount of research about multimodal inference across text and vision has been recently developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.
Pages: 386 - 392
Page count: 7
Related papers
50 in total
  • [31] Visual-Textual Encounters with a German Grandfather: The Work of Angela Findlay
    Pettitt, Joanne
    JEWISH FILM & NEW MEDIA-AN INTERNATIONAL JOURNAL, 2023, 11 (01)
  • [32] Hybrid Representation and Decision Fusion towards Visual-textual Sentiment
    Yin, Chunyong
    Zhang, Sun
    Zeng, Qingkui
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (03)
  • [33] Visual-Textual Alignment for Generalizable Person Reidentification in Internet of Things
    Liu, Xiaosheng
    Zhou, Zhiheng
    Niu, Chang
    Wu, Qingru
    IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (15) : 13865 - 13875
  • [34] Affective Color Theme Generation System for Visual-textual Design: A Case Study of Banner Design
    Qiu, Qianru
    Luo, Xuan
    Watanabe, Shu
    Omura, Kengo
    INTERNATIONAL JOURNAL OF AFFECTIVE ENGINEERING, 2019, 18 (03) : 137 - 144
  • [35] Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues
    Dong, Qingxiu
    Qin, Ziwei
    Xia, Heming
    Feng, Tian
    Tong, Shoujie
    Meng, Haoran
    Xu, Lin
    Wei, Zhongyu
    Zhan, Weidong
    Chang, Baobao
    Li, Sujian
    Liu, Tianyu
    Sui, Zuifang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022 : 932 - 946
  • [36] HTCN: Harmonious Text Colorization Network for Visual-Textual Presentation Design
    Yang, Xuyong
    Xu, Xiaobin
    Huang, Yaohong
    Yu, Nenghai
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 560 - 571
  • [37] A Novel Visual-Textual Sentiment Analysis Framework for Social Media Data
    Jindal, Kanika
    Aron, Rajni
    COGNITIVE COMPUTATION, 2021, 13 (06) : 1433 - 1450
  • [38] THE BASIC SYSTEM OF LOGICAL ENTAILMENT IS A TARSKIAN DEDUCTIVE SYSTEM
    MUSKARDIN, V
    JOURNAL OF SYMBOLIC LOGIC, 1987, 52 (01) : 333 - 333
  • [39] Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data
    Ma, Dekui
    Liang, Jian
    He, Ran
    Kong, Xiangwei
    IEEE MULTIMEDIA, 2017, 24 (02) : 56 - 65
  • [40] Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents
    Yauney, Gregory
    Hessel, Jack
    Mimno, David
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020 : 2039 - 2045