Empirical Text Analysis for Identifying the Genres of Bengali Literary Work

Cited by: 1
Authors
Afroze, Ayesha [1]
Dutta, Kishowloy [1]
Sadik, Sadman [1]
Khanam, Sadia [1]
Rab, Raqeebir [1]
Rahim, Mohammad Asifur [1]
Affiliation
[1] Ahsanullah University of Science and Technology (AUST), Department of Computer Science and Engineering, Dhaka, Bangladesh
Keywords
genre; Long Short-Term Memory (LSTM); Convolutional Neural Networks (CNN); Bidirectional Encoder Representations from Transformers (BERT); Support Vector Machines (SVM); Natural Language Processing; Book Snippets; Recurrent Neural Networks (RNN)
DOI
10.12720/jait.15.5.602-613
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Digital books and online retailers are growing in popularity. Readers prefer different genres of literature, and categorizing books by genre helps them discover titles that match their tastes. Assortment, in this context, is the process of classifying a book by genre. In this paper, we classify books by genre from their titles and snippets, using a variety of traditional machine learning and deep learning models. Such work exists for books in other languages but has not yet been done for Bengali novels. We collected data for this research and built two datasets: one contains the titles of Bengali novels across nine genres, and the other contains book snippets from three genres. For classification, we employed logistic regression, Support Vector Machines (SVM), random forest classifiers, decision trees, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). BERT performed best on both datasets, with 90% accuracy on the book snippet dataset and 77% accuracy on the book title dataset. Apart from BERT, traditional machine learning models performed better on the snippets dataset, whereas deep learning models performed better on the titles dataset. Performance varied with the quantity of data and the number of words present in each dataset.
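To make the title-classification setup in the abstract concrete, here is a minimal sketch of a logistic-regression style baseline: a pure-Python softmax (multinomial logistic) regression over bag-of-words counts. It is not the authors' implementation. The toy English titles, the three genre labels, the whitespace tokenizer, and the hyperparameters (`epochs`, `lr`) are all assumptions made for illustration; the paper's datasets contain Bengali titles and snippets and are not reproduced here.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace tokenization; real Bengali text may need more care.
    return text.lower().split()

def vectorize(text, vocab):
    # Sparse bag-of-words counts: {feature index: count}; unknown tokens are ignored.
    counts = Counter(tokenize(text))
    return {vocab[tok]: float(c) for tok, c in counts.items() if tok in vocab}

def class_scores(x, weights, bias):
    # Linear score for each class given a sparse feature vector.
    return [b + sum(w[j] * v for j, v in x.items()) for w, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(texts, labels, epochs=300, lr=0.1):
    # Stochastic gradient descent on the cross-entropy loss.
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    vocab = {tok: i for i, tok in enumerate(tokens)}
    classes = sorted(set(labels))
    weights = [[0.0] * len(vocab) for _ in classes]
    bias = [0.0] * len(classes)
    data = [(vectorize(t, vocab), classes.index(y)) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(class_scores(x, weights, bias))
            for k in range(len(classes)):
                grad = probs[k] - (1.0 if k == y else 0.0)
                bias[k] -= lr * grad
                for j, v in x.items():
                    weights[k][j] -= lr * grad * v
    return vocab, classes, weights, bias

def predict(text, model):
    vocab, classes, weights, bias = model
    scores = class_scores(vectorize(text, vocab), weights, bias)
    return classes[scores.index(max(scores))]

# Hypothetical toy data, invented for this sketch (not from the paper's datasets).
titles = [
    "the haunted house at midnight",
    "ghost of the old river",
    "love in the monsoon rain",
    "a letter to my beloved",
    "murder on the night train",
    "the detective and the stolen jewel",
]
genres = ["horror", "horror", "romance", "romance", "mystery", "mystery"]

model = train(titles, genres)
print(predict("the ghost of the haunted house", model))
```

With only a handful of words per title, most of the signal comes from a few genre-specific tokens, which is one plausible reading of why the abstract reports that performance depended on the number of words available in each dataset.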
Pages: 602-613
Page count: 12