Multimodal Logical Inference System for Visual-Textual Entailment

Cited by: 0
Authors
Suzuki, Riko [1]
Yanaka, Hitomi [1, 2]
Yoshikawa, Masashi [3]
Mineshima, Koji [1]
Bekki, Daisuke [1]
Affiliations
[1] Ochanomizu Univ, Tokyo, Japan
[2] RIKEN Ctr Adv Intelligence Project, Tokyo, Japan
[3] Nara Inst Sci & Technol, Nara, Japan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large amount of research about multimodal inference across text and vision has been recently developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.
Pages: 386-392
Page count: 7
Related papers
50 in total
  • [41] A Distributed Architecture System for Recognizing Textual Entailment
    Iftene, Adrian
    Balahur-Dobrescu, Alexandra
    Matei, Daniel
    NINTH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, PROCEEDINGS, 2007, : 219 - 226
  • [42] Visual-textual Capsule Routing for Text-based Video Segmentation
    McIntosh, Bruce
    Duarte, Kevin
    Rawat, Yogesh S.
    Shah, Mubarak
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 9939 - 9948
  • [43] Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
    Ren, Xingyu
    Deng, Jiankang
    Ma, Chao
    Yan, Yichao
    Yang, Xiaokang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 4511 - 4520
  • [44] Visual-textual integration in LLMs for medical diagnosis: A preliminary quantitative analysis
    Agbareia, Reem
    Omar, Mahmud
    Soffer, Shelly
    Glicksberg, Benjamin S.
    Nadkarni, Girish N.
    Klang, Eyal
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2025, 27 : 184 - 189
  • [45] Social Media Popularity Prediction Based on Visual-Textual Features with XGBoost
    Chen, Junhong
    Liang, Dayong
    Zhu, Zhanmo
    Zhou, Xiaojing
    Ye, Zihan
    Mo, Xiuyun
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2692 - 2696
  • [46] Cross Modal Person Re-identification with Visual-Textual Queries
    Farooq, Ammarah
    Awais, Muhammad
    Kittler, Josef
    Akbari, Ali
    Khalid, Syed Safwan
    IEEE/IAPR INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB 2020), 2020,
  • [47] A Novel Visual-Textual Sentiment Analysis Framework for Social Media Data
    Jindal, Kanika
    Aron, Rajni
    COGNITIVE COMPUTATION, 2021, 13 : 1433 - 1450
  • [48] Sentiment Recognition for Short Annotated GIFs Using Visual-Textual Fusion
    Liu, Tianliang
    Wan, Junwei
    Dai, Xiubin
    Liu, Feng
    You, Quanzeng
    Luo, Jiebo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (04) : 1098 - 1110
  • [49] Visual-Textual Cross-Modal Interaction Network for Radiology Report Generation
    Zhang, Wenfeng
    Cai, Baoning
    Hu, Jianming
    Qin, Qibing
    Xie, Kezhen
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 984 - 988
  • [50] Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube
    Hessel, Jack
    Zhu, Zhenhai
    Pang, Bo
    Soricut, Radu
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 8812 - 8822