Author Identification Using Latent Dirichlet Allocation

Cited by: 0
Authors
Calvo, Hiram [1, 2]
Hernandez-Castaneda, Angel [1]
Garcia-Flores, Jorge [2]
Affiliations
[1] IPN, Ctr Comp Res CIC, Ave JD Batiz E MO Mendizabal, Mexico City 07738, DF, Mexico
[2] Univ Paris 13, Lab Informat Paris Nord, CNRS, UMR 7030, Sorbonne Paris Cite, F-93430 Villetaneuse, France
DOI
10.1007/978-3-319-77116-8_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the task of author identification at PAN 2015 using a Latent Dirichlet Allocation (LDA) model. This method takes into account both the vocabulary and the context of words, and through a statistical process estimates to what extent the relations between words hold in each document. Processing a set of documents with LDA returns a set of topic distributions; each distribution can be seen as a feature vector and a fingerprint of its document within the collection. We then applied a Naive Bayes classifier to the obtained patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported in PAN 2015, while obtaining mixed results for other languages.
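The pipeline the abstract describes (documents to LDA topic distributions, then a Naive Bayes classifier over those vectors) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the collapsed Gibbs sampler, the Gaussian variant of Naive Bayes, the hyperparameters (`n_topics`, `alpha`, `beta`, `var_floor`), the function names, and the toy documents and author labels are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def lda_topic_vectors(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # tokens of doc d assigned to topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions: the document "fingerprint".
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Per-class log prior, feature means and (floored) feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(cols, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy corpus: two documents per (made-up) author label.
docs = [
    "the ship sailed the stormy sea at dawn".split(),
    "waves and wind drove the ship across the sea".split(),
    "the court heard the lawyer argue the case".split(),
    "the judge and the lawyer left the court".split(),
]
authors = ["A", "A", "B", "B"]

vectors = lda_topic_vectors(docs, n_topics=2)
model = fit_gaussian_nb(vectors, authors)
predicted = predict_gaussian_nb(model, vectors[0])
```

Here each entry of `vectors` plays the role of the per-document topic distribution, and `predicted` is the label the classifier assigns to the first document's vector. In this sketch the topic model is fit on all documents jointly, so a genuinely unseen document would need to be included in (or folded into) the LDA run before classification.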
Pages: 303-312
Number of pages: 10