An ensemble model for classifying idioms and literal texts using BERT and RoBERTa

Cited by: 68
Authors
Briskilal, J. [1 ]
Subalalitha, C. N. [1 ]
Institutions
[1] SRM Inst Sci & Technol, Chengalpattu, Tamil Nadu, India
Keywords
BERT; RoBERTa; Ensemble model; Idiom; Literal classification;
DOI
10.1016/j.ipm.2021.102756
CLC classification number
TP [Automation technology, computer technology];
Subject classification number
0812;
Abstract
An idiom is a common phrase that means something other than its literal meaning. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots, so automatic idiom detection plays an important role in all of these applications. A fundamental NLP task is text classification, which categorizes text into structured categories, also known as text labeling or categorization. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks, though models like BERT and RoBERTa have not been used exclusively for idiom and literal classification. We propose a predictive ensemble model that classifies idioms and literals using BERT and RoBERTa fine-tuned on the TroFi dataset. The model is tested on a newly created in-house dataset of 1470 idiomatic and literal expressions annotated by domain experts. Our model outperforms the baseline models on the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.
Pages: 9
Related papers
50 records in total
  • [21] Classifying text streams by keywords using classifier ensemble
    Yang, Baoguo
    Zhang, Yang
    Li, Xue
    DATA & KNOWLEDGE ENGINEERING, 2011, 70 (09) : 775 - 793
  • [22] An approach for classifying large dataset using ensemble classifiers
    Abad, Sajad Khodarahmi Jahan
    Zare-Mirakabad, Mohammad-Reza
    Rezaeian, Mehdi
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 785 - 789
  • [23] NCUEE at MEDIQA 2019: Medical Text Inference Using Ensemble BERT-BiLSTM-Attention Model
    Lee, Lung-Hao
    Lu, Yi
    Chen, Po-Han
    Lee, Po-Lei
    Shyu, Kuo-Kai
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 528 - 532
  • [24] Semantic Positioning Model Incorporating BERT/RoBERTa and Fuzzy Theory Achieves More Nuanced Japanese Adverb Clustering
    Odle, Eric
    Hsueh, Yun-Ju
    Lin, Pei-Chun
    ELECTRONICS, 2023, 12 (19)
  • [25] RECOGNIZING EMOTIONS FROM TEXTS USING A BERT-BASED APPROACH
    Adoma, Acheampong Francisca
    Henry, Nunoo-Mensah
    Chen, Wenyu
    Andre, Niyongabo Rubungo
    2020 17TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2020, : 62 - 66
  • [26] Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT
    Mutinda, Faith Wavinya
    Yada, Shuntaro
    Wakamiya, Shoko
    Aramaki, Eiji
    METHODS OF INFORMATION IN MEDICINE, 2021, 60 : E56 - E64
  • [27] A deep ensemble network model for classifying and predicting breast cancer
    Subramanian, Arul Antran Vijay
    Venugopal, Jothi Prakash
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (02) : 258 - 282
  • [28] An Ensemble Model for Stance Detection in Social Media Texts
    Sherif, Sara S.
    Shawky, Doaa M.
    Fayed, Hatem A.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2023, 22 (02) : 737 - 775
  • [29] Classifying unlabeled short texts using a fuzzy declarative approach
    Francisco P. Romero
    Pascual Julián-Iranzo
    Andrés Soto
    Mateus Ferreira-Satler
    Juan Gallardo-Casero
    Language Resources and Evaluation, 2013, 47 : 151 - 178
  • [30] Using Language Models for Classifying the Party Affiliation of Political Texts
    Tu My Doan
    Kille, Benjamin
    Gulla, Jon Atle
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 382 - 393