Empirical evaluations using character and word N-grams on authorship attribution for Telugu text

Authors
Nagaprasad, S. [1]
Raghunadha Reddy, T. [2]
Vijayapal Reddy, P. [3]
Vinaya Babu, A. [4]
VishnuVardhan, B. [5]
Affiliations
[1] Department of CSE, Aacharya Nagarjuna University, Guntur, India
[2] Department of CSE, Swarnandhra Institute of Engineering and Technology, Narsapur, India
[3] Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India
[4] Department of CSE, J.N.T.U. College of Engineering, Hyderabad, India
[5] Department of IT, J.N.T.U. College of Engineering, Nachupally, Karimnagar, India
DOI
10.1007/978-81-322-2268-2_62
Abstract
Authorship attribution (AA) is the task of identifying the authors of anonymous texts. It is framed as a multi-class text classification task and is concerned with writing style rather than topic. The scalability issue in traditional AA studies concerns the effect of data size, that is, the amount of data per candidate author. Most stylometry research focuses on long texts per author; short texts have not been probed in much depth. This paper investigates AA on Telugu texts written by 12 different authors. Several experiments were conducted on these texts by extracting lexical and character features of each author's writing style, using word n-grams and character n-grams as the text representation. A support vector machine (SVM) classifier is employed to assign the texts to their authors. AA performance, in terms of F1 measure and accuracy, deteriorates as the number of candidate authors increases and the size of the training data decreases. © 2015, Springer India.
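The text representation and the two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, character n-grams are taken over Unicode code points (the abstract does not say whether code points or Telugu syllable units were used), words are split on whitespace only, and F1 is macro-averaged, which is an assumption because the abstract does not name the averaging scheme. The SVM training step is left out; the resulting counts would still need to be turned into feature vectors for a classifier.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams.

    Assumption: n-grams are taken over Unicode code points, so a Telugu
    consonant and its vowel sign count as two separate characters.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, splitting on whitespace only."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def accuracy(y_true, y_pred):
    """Fraction of texts assigned to the correct author."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-author F1 scores.

    Assumption: macro averaging; the abstract only says "F1 measure".
    """
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, word_ngrams("తెలుగు భాష తెలుగు భాష", 2) counts the bigram ("తెలుగు", "భాష") twice and ("భాష", "తెలుగు") once. Each author's texts would yield one such count profile per n-gram order, which is the kind of input a classifier like the SVM mentioned in the abstract would consume after vectorization.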
Pages: 613-623