Transformer-based Pouranic topic classification in Indian mythology

Cited by: 0
Authors
Paul, Apurba [1, 3]
Seal, Srijan [2]
Das, Dipankar [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[2] JIS Coll Engn, Dept Comp Sci & Engn, Kalyani, India
[3] Univ Engn & Management, Inst Engn & Management, Dept Comp Sci & Engn, Kolkata, India
Keywords
Topic classification; Indian mythology; transformer models; semantic similarity; log-likelihood; Pouranic;
DOI
10.1007/s12046-024-02598-6
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Topic classification, which identifies the subject matter or theme of a text, is a challenging task for Indian mythology. It can enhance the performance of NLP-based systems, such as recommendation and semantic search engines, when they deal with mythological texts. This research focuses on developing transformer-based models for automated topic classification of Indian mythological documents, addressing the challenges of organizing and analyzing this rich and diverse corpus. We introduce PouranicTopic, a new annotated dataset containing over 200k verses from 7 major Hindu texts with canto, topic, and sentence labels. Two additional datasets, Similarity-based and Log-likelihood-based, are created using sentence clustering techniques. The BERT, RoBERTa, and DistilBERT models are evaluated for canto and topic classification on these datasets. Clustering greatly improves the results on the Similarity-based dataset, but the Log-likelihood-based dataset remains challenging.
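The abstract does not say how the similarity-based sentence clustering is performed. As a rough illustration of the general idea only, the sketch below groups sentences greedily by bag-of-words cosine similarity. The function names, the 0.5 threshold, and the sample sentences are all assumptions made for this example and are not taken from the paper, which may well use embedding-based similarity instead.

```python
import math
from collections import Counter


def bow(sentence):
    """Return a lowercased bag-of-words vector as a Counter."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_by_similarity(sentences, threshold=0.5):
    """Greedy clustering (hypothetical helper): each sentence joins the
    first cluster whose seed sentence is similar enough, otherwise it
    starts a new cluster."""
    clusters = []  # list of (seed_vector, member_sentences)
    for s in sentences:
        vec = bow(s)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((vec, [s]))
    return [members for _, members in clusters]


# Made-up sentences for illustration; not drawn from PouranicTopic.
verses = [
    "the king rules the kingdom",
    "the king rules the forest",
    "sages meditate by the river",
]
groups = cluster_by_similarity(verses)
```

With these inputs the first two sentences share most of their words and fall into one group, while the third starts its own. Replacing `bow` with a sentence-embedding function would keep the same clustering loop.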
Pages: 16