Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction

Cited by: 0
Authors
Li, Dongfang [1]
Hu, Baotian [1]
Chen, Qingcai [1, 2]
Xu, Tujie [1]
Tao, Jingcong [1]
Zhang, Yunan [1]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, Shenzhen, People's Republic of China
[2] Peng Cheng Laboratory, Shenzhen, People's Republic of China
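Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract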
Recent works have shown that explainability and robustness are two crucial ingredients of trustworthy and reliable text classification. However, previous works usually address only one of the two aspects: i) how to extract accurate rationales for explainability while being beneficial to prediction; ii) how to make the predictive model robust to different types of adversarial attacks. Intuitively, a model that produces helpful explanations should be more robust against adversarial attacks, because we cannot trust a model that outputs explanations but changes its prediction under small perturbations. To this end, we propose a joint classification and rationale extraction model named AT-BMC. It includes two key mechanisms: mixed Adversarial Training (AT) is designed to use various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC) helps to locate rationales more precisely with the guidance of boundary information. Results on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin. Robustness analysis shows that the proposed AT-BMC decreases the attack success rate effectively by up to 69%. The empirical results indicate that there are connections between robust models and better explanations.
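The robustness figure in the abstract is stated in terms of attack success rate. As a rough illustration only, the sketch below computes that metric under one common definition: the fraction of examples the model classifies correctly on clean input whose prediction changes after the attack. The function name `attack_success_rate` and the toy label lists are hypothetical and are not taken from the paper, which may define the metric differently.

```python
def attack_success_rate(labels, clean_preds, adv_preds):
    """Share of correctly classified examples whose prediction the attack flips.

    Assumed definition for illustration; not confirmed by the source record.
    """
    # Only examples the model gets right on clean input count as attack targets.
    attacked = [(c, a) for y, c, a in zip(labels, clean_preds, adv_preds) if c == y]
    if not attacked:
        return 0.0
    # An attack succeeds when the adversarial prediction differs from the clean one.
    flipped = sum(1 for c, a in attacked if a != c)
    return flipped / len(attacked)


# Toy data (hypothetical): gold labels, predictions on clean and on attacked text.
labels = [1, 0, 1, 1, 0]
clean_preds = [1, 0, 1, 0, 0]
adv_preds = [0, 0, 1, 1, 1]
rate = attack_success_rate(labels, clean_preds, adv_preds)
```

Under this definition a lower rate indicates a more robust classifier, so comparing the rate of a baseline and an adversarially trained model on the same attack gives the kind of reduction the abstract reports.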
Pages: 10947-10955
Number of pages: 9