Empirical Text Analysis for Identifying the Genres of Bengali Literary Work

Cited by: 1

Authors:

Afroze, Ayesha [1]

Dutta, Kishowloy [1]

Sadik, Sadman [1]

Khanam, Sadia [1]

Rab, Raqeebir [1]

Rahim, Mohammad Asifur [1]

Affiliation:

[1] Ahsanullah University of Science and Technology (AUST), Department of Computer Science and Engineering, Dhaka, Bangladesh

Source:

JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY | 2024 / Vol. 15 / No. 05

Keywords:

genre; Long Short-Term Memory (LSTM); Convolutional Neural Networks (CNN); Bidirectional Encoder Representations from Transformers (BERT); Support Vector Machines (SVM); Natural Language Processing; Book Snippets; Recurrent Neural Networks (RNN)

DOI:

10.12720/jait.15.5.602-613

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Digital books and online retailers grow more popular every day, and readers differ in the genres of literature they prefer. Categorizing books by genre helps readers discover books that match their tastes. Here, assortment means assigning a book to a genre. In this paper, we classify books by genre from their titles and snippets, using a variety of traditional machine learning and deep learning models. Such work exists for books in other languages but has not yet been done for Bengali novels. We collected data and built two datasets for this research: one contains the titles of Bengali novels across nine genres, and the other contains book snippets from three genres. For classification, we employed logistic regression, Support Vector Machines (SVM), random forest classifiers, decision trees, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Among all the models, BERT performed best on both datasets, with 90% accuracy on the Snippets dataset and 77% accuracy on the Titles dataset. Excluding BERT, traditional machine learning models performed better on the Snippets dataset, whereas deep learning models performed better on the Titles dataset. Performance varied with the quantity of data and the number of words in each dataset.
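To make the title-classification setup concrete, the following is a minimal sketch of one of the traditional models named in the abstract, logistic regression, applied to bag-of-words features. It is not the authors' pipeline: the toy English titles, the two genre labels, the whitespace tokenizer, the hyperparameters, and every function name are assumptions made for illustration, and the model is a plain-Python softmax (multinomial logistic) regression trained by stochastic gradient descent rather than any library implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: placeholder English titles and labels,
# NOT taken from the paper's Bengali datasets.
TRAIN = [
    ("the haunted house at midnight", "horror"),
    ("ghost of the old haunted well", "horror"),
    ("midnight ghost stories", "horror"),
    ("love letters in spring", "romance"),
    ("a spring of love and longing", "romance"),
    ("letters to my first love", "romance"),
]

def tokenize(text):
    # Assumed whitespace tokenizer; real Bengali text would need more care.
    return text.lower().split()

def build_vocab(samples):
    vocab = sorted({tok for text, _ in samples for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}

def featurize(text, vocab):
    # Sparse bag-of-words counts; out-of-vocabulary tokens are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, W, b):
    return [b[k] + sum(W[k][i] * v for i, v in x.items()) for k in range(len(b))]

def train(samples, epochs=200, lr=0.5):
    vocab = build_vocab(samples)
    labels = sorted({label for _, label in samples})
    W = [[0.0] * len(vocab) for _ in labels]
    b = [0.0] * len(labels)
    data = [(featurize(t, vocab), labels.index(l)) for t, l in samples]
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(class_scores(x, W, b))
            for k in range(len(labels)):
                # Gradient of the cross-entropy loss w.r.t. the class-k score.
                grad = probs[k] - (1.0 if k == y else 0.0)
                b[k] -= lr * grad
                for i, v in x.items():
                    W[k][i] -= lr * grad * v
    return vocab, labels, W, b

def predict(text, model):
    vocab, labels, W, b = model
    scores = class_scores(featurize(text, vocab), W, b)
    return labels[scores.index(max(scores))]

model = train(TRAIN)
print(predict("haunted ghost", model))
print(predict("love in spring", model))
```

Titles are short, so each example carries only a handful of word counts. That is consistent with the abstract's remark that performance depends on the number of words available, although this sketch does not attempt to reproduce any reported result.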

Pages: 602-613

Page count: 12
