An ensemble model for classifying idioms and literal texts using BERT and RoBERTa

Cited by: 68

Authors:

Briskilal, J. [1]

Subalalitha, C. N. [1]

Affiliations:

[1] SRM Inst Sci & Technol, Chengalpattu, Tamil Nadu, India

Source:

INFORMATION PROCESSING & MANAGEMENT | 2022 / Vol. 59 / Issue 01

Keywords:

BERT; RoBERTa; Ensemble model; Idiom; Literal classification;

DOI:

10.1016/j.ipm.2021.102756

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

An idiom is a common phrase that means something other than its literal meaning. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots. Automatic detection of idioms plays an important role in all these applications. A fundamental NLP task is text classification, also known as text labeling or categorization, which assigns text to structured categories. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks; however, models like BERT and RoBERTa have not been exclusively used for idiom and literal classification. We propose a predictive ensemble model to classify idioms and literals using BERT and RoBERTa, fine-tuned with the TroFi dataset. The model is tested with a newly created in-house dataset of idioms and literal expressions, numbering 1470 in all, and annotated by domain experts. Our model outperforms the baseline models in terms of the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.


Pages: 9

