Empirical Text Analysis for Identifying the Genres of Bengali Literary Work

Cited by: 1
Authors
Afroze, Ayesha [1]
Dutta, Kishowloy [1]
Sadik, Sadman [1]
Khanam, Sadia [1]
Rab, Raqeebir [1]
Rahim, Mohammad Asifur [1]
Affiliations
[1] Ahsanullah Univ Sci & Technol AUST, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Genre; Long Short-Term Memory (LSTM); Convolutional Neural Networks (CNN); Bidirectional Encoder Representations from Transformers (BERT); Support Vector Machines (SVM); Natural Language Processing; Book Snippets; Recurrent Neural Networks (RNN)
DOI
10.12720/jait.15.5.602-613
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Digital books and online retailers are growing in popularity, and different readers prefer different genres of literature. Categorizing books by genre makes it easier for readers to find books that match their tastes. Genre classification is the process of assigning a book to a genre. In this paper, we classify books by genre from their titles and snippets, using a variety of traditional machine learning and deep learning models. Such work exists for books in other languages but has not yet been done for Bengali novels. We collected data for this research and built two datasets: one contains the titles of Bengali novels across nine genres, and the other contains book snippets from three genres. For classification, we employed logistic regression, Support Vector Machines (SVM), random forest classifiers, decision trees, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Bidirectional Encoder Representations from Transformers (BERT). Among all the models, BERT performed best on both datasets, with 90% accuracy on the book snippet dataset and 77% accuracy on the book title dataset. With the exception of BERT, traditional machine learning models performed better on the snippet dataset, whereas deep learning models performed better on the title dataset. Performance varied with the quantity of data and the number of words present in each dataset.
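The abstract names logistic regression among the traditional models applied to book titles but does not describe the implementation. As a rough illustration only, the following is a minimal sketch of multinomial logistic regression over bag-of-words counts, written in plain Python. Everything in it is an assumption made for illustration: the function names (`tokenize`, `train_softmax`, `predict_proba`, `predict`), the whitespace tokenizer, the bag-of-words features, the learning rate and epoch count, and the toy English titles and genre labels, none of which come from the paper or its datasets.

```python
import math
from collections import Counter

# Toy placeholder data (hypothetical, NOT from the paper's Bengali datasets).
TOY_TITLES = [
    "ghost in the old house",
    "the haunted night",
    "love under the moon",
    "a letter of love",
    "murder on the river",
    "the detective and the clue",
]
TOY_GENRES = ["horror", "horror", "romance", "romance", "mystery", "mystery"]


def tokenize(text):
    # Whitespace tokenization is an assumption; real Bengali text
    # would likely need a language-aware tokenizer.
    return text.lower().split()


def predict_proba(text, weights, bias):
    # Softmax over per-genre linear scores of the token counts.
    counts = Counter(tokenize(text))
    scores = {
        genre: bias[genre] + sum(w.get(tok, 0.0) * n for tok, n in counts.items())
        for genre, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


def train_softmax(titles, genres, epochs=300, lr=0.1):
    # Plain stochastic gradient descent on the cross-entropy loss.
    classes = sorted(set(genres))
    vocab = {tok for t in titles for tok in tokenize(t)}
    weights = {g: {tok: 0.0 for tok in vocab} for g in classes}
    bias = {g: 0.0 for g in classes}
    for _ in range(epochs):
        for title, gold in zip(titles, genres):
            counts = Counter(tokenize(title))
            probs = predict_proba(title, weights, bias)
            for g in classes:
                # Gradient of cross-entropy w.r.t. the score of genre g.
                grad = probs[g] - (1.0 if g == gold else 0.0)
                bias[g] -= lr * grad
                for tok, n in counts.items():
                    weights[g][tok] -= lr * grad * n
    return weights, bias


def predict(text, weights, bias):
    probs = predict_proba(text, weights, bias)
    return max(probs, key=probs.get)


weights, bias = train_softmax(TOY_TITLES, TOY_GENRES)
for title in TOY_TITLES:
    print(title, "->", predict(title, weights, bias))
```

On a real dataset one would hold out a test split and report accuracy on it, as the abstract does; the toy data here is far too small for that, so the sketch only shows the mechanics of fitting and predicting.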
Pages: 602-613
Page count: 12