Visual-textual sentiment classification with bi-directional multi-level attention networks

Cited by: 49
Authors
Xu, Jie [1 ]
Huang, Feiran [2 ,3 ,4 ]
Zhang, Xiaoming [5 ]
Wang, Senzhang [6 ]
Li, Chaozhuo [1 ]
Li, Zhoujun [1 ]
He, Yueying [7 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Jinan Univ, Coll Cyber Secur, Guangzhou 510632, Guangdong, Peoples R China
[3] Jinan Univ, Coll Informat Sci & Technol, Guangzhou 510632, Guangdong, Peoples R China
[4] Guangdong Key Lab Data Secur & Privacy Preserving, Guangzhou 510632, Guangdong, Peoples R China
[5] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[6] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[7] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing 100029, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Multi-modal; Social image; Attention model; Sentiment analysis;
DOI
10.1016/j.knosys.2019.04.018
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Social networks have become an inseparable part of our daily lives, so automatic sentiment analysis of social media content is of great significance for identifying people's viewpoints, attitudes, and emotions on social websites. Most existing work concentrates on sentiment analysis of a single modality, such as image or text, and cannot handle social media content that combines multiple modalities, including both image and text. Although some works have attempted multimodal sentiment analysis, the complicated correlations between the two modalities have not been fully explored. In this paper, we propose a novel Bi-Directional Multi-Level Attention (BDMLA) model that exploits the complementary and comprehensive information between the image modality and the text modality for joint visual-textual sentiment classification. Specifically, to highlight the emotional regions and words in an image-text pair, we propose a visual attention network and a semantic attention network. The visual attention network makes region features of the image interact with multiple semantic levels of the text (word, phrase, and sentence) to obtain attended visual features. The semantic attention network makes semantic features of the text interact with multiple visual levels of the image (global and local) to obtain attended semantic features. The attended visual and semantic features from the two attention networks are then unified into a holistic framework to conduct visual-textual sentiment classification. Proof-of-concept experiments on three real-world datasets verify the effectiveness of our model. (C) 2019 Elsevier B.V. All rights reserved.
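The bi-directional, multi-level attention described above can be made concrete with a short sketch. The PyTorch-style code below attends image region features under word-, phrase-, and sentence-level text contexts, attends word features under local and global image contexts, and concatenates the attended vectors for classification, mirroring the structure the abstract describes. The module names, feature dimensions, additive-attention form, and mean-pooled context vectors are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionPool(nn.Module):
    """Additive (Bahdanau-style) attention: scores each query element
    against a single context vector and returns the weighted sum."""

    def __init__(self, query_dim, ctx_dim, hidden_dim=256):
        super().__init__()
        self.proj_q = nn.Linear(query_dim, hidden_dim)
        self.proj_c = nn.Linear(ctx_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, queries, ctx):
        # queries: (B, N, query_dim); ctx: (B, ctx_dim)
        h = torch.tanh(self.proj_q(queries) + self.proj_c(ctx).unsqueeze(1))
        alpha = F.softmax(self.score(h).squeeze(-1), dim=1)       # (B, N)
        return torch.bmm(alpha.unsqueeze(1), queries).squeeze(1)  # (B, query_dim)


class BDMLASketch(nn.Module):
    """Illustrative bi-directional multi-level attention classifier
    (a sketch of the idea, not the published BDMLA implementation)."""

    def __init__(self, region_dim=2048, text_dim=512, num_classes=2):
        super().__init__()
        # Visual attention: image regions attended under three text
        # levels (word, phrase, sentence).
        self.vis_att = nn.ModuleList(
            AttentionPool(region_dim, text_dim) for _ in range(3))
        # Semantic attention: words attended under two image levels
        # (local, global).
        self.sem_att = nn.ModuleList(
            AttentionPool(text_dim, region_dim) for _ in range(2))
        self.classifier = nn.Linear(3 * region_dim + 2 * text_dim, num_classes)

    def forward(self, regions, words, phrase_vec, sent_vec, global_img):
        # regions: (B, R, region_dim) local image region features
        # words:   (B, T, text_dim)   word-level text features
        # phrase_vec, sent_vec: (B, text_dim); global_img: (B, region_dim)
        text_levels = [words.mean(dim=1), phrase_vec, sent_vec]
        attended_vis = [att(regions, t)
                        for att, t in zip(self.vis_att, text_levels)]
        img_levels = [regions.mean(dim=1), global_img]
        attended_sem = [att(words, v)
                        for att, v in zip(self.sem_att, img_levels)]
        return self.classifier(torch.cat(attended_vis + attended_sem, dim=-1))
```

With batch size 4, 36 regions, and 20 words, `BDMLASketch()(torch.randn(4, 36, 2048), torch.randn(4, 20, 512), torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 2048))` returns a (4, 2) logit tensor; in the full model the region, word, phrase, and sentence features would come from trained image and text encoders rather than random tensors.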
Pages: 61-73
Page count: 13
Related Papers
50 records in total
  • [1] MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism
    Peng, Yuxin
    Qi, Jinwei
    Zhuo, Yunkan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29: 2728-2741
  • [2] Joint Visual-Textual Sentiment Analysis with Deep Neural Networks
    You, Quanzeng
    Luo, Jiebo
    Jin, Hailin
    Yang, Jianchao
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015: 1071-1074
  • [3] A multimodal fusion network with attention mechanisms for visual-textual sentiment analysis
    Gan, Chenquan
    Fu, Xiang
    Feng, Qingdong
    Zhu, Qingyi
    Cao, Yang
    Zhu, Ye
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242
  • [4] Multi-Model Fusion Framework Using Deep Learning for Visual-Textual Sentiment Classification
    Al-Tameemi, Israa K. Salman
    Feizi-Derakhshi, Mohammad-Reza
    Pashazadeh, Saeed
    Asadpour, Mohammad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76(02): 2145-2177
  • [5] Bi-Directional Multi-Level Converter for an Energy Storage System
    Han, Sang-Hyup
    Kim, Heung-Geun
    Cha, Honnyong
    Chun, Tae-Won
    Nho, Eui-Cheol
    JOURNAL OF POWER ELECTRONICS, 2014, 14(03): 499-506
  • [6] Robust Visual-Textual Sentiment Analysis: When Attention meets Tree-structured Recursive Neural Networks
    You, Quanzeng
    Cao, Liangliang
    Jin, Hailin
    Luo, Jiebo
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016: 1008-1017
  • [7] Multi-granularity visual-textual jointly modeling for aspect-level multimodal sentiment analysis
    Chen, Yuzhong
    Shi, Liyuan
    Lin, Jiali
    Chen, Jingtian
    Zhong, Jiayuan
    Dong, Chen
    JOURNAL OF SUPERCOMPUTING, 2025, 81(01)
  • [8] A comprehensive review of visual-textual sentiment analysis from social media networks
    Al-Tameemi, Israa Khalaf Salman
    Feizi-Derakhshi, Mohammad-Reza
    Pashazadeh, Saeed
    Asadpour, Mohammad
    JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE, 2024, 7(03): 2767-2838
  • [9] Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism
    Zhu, Xuelin
    Cao, Biwei
    Xu, Shuai
    Liu, Bo
    Cao, Jiuxin
    MULTIMEDIA MODELING (MMM 2019), PT I, 2019, 11295: 264-276