Author Identification Using Latent Dirichlet Allocation

Authors
Calvo, Hiram [1, 2]
Hernandez-Castaneda, Angel [1]
Garcia-Flores, Jorge [2]
Affiliations
[1] IPN, Ctr Comp Res CIC, Ave JD Batiz E MO Mendizabal, Mexico City 07738, DF, Mexico
[2] Univ Paris 13, Lab Informat Paris Nord, CNRS, UMR 7030, Sorbonne Paris Cite, F-93430 Villetaneuse, France
DOI
10.1007/978-3-319-77116-8_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, estimates to what extent the relations between words hold in each document. Processing a set of documents with LDA returns a set of topic distributions; each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naive Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported at PAN 2015, while results for the other languages were mixed.
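The pipeline described in the abstract (documents to LDA topic distributions, then a Naive Bayes classifier over those vectors) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the collapsed Gibbs sampler, the Gaussian variant of Naive Bayes, the hyperparameters (`n_topics`, `alpha`, `beta`), the toy documents, and the author labels "A" and "B" are all assumptions, since the abstract does not specify any of them. For simplicity the unlabeled document is passed through LDA together with the labeled ones.

```python
import math
import random
from collections import defaultdict


def lda_topic_vectors(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_dk = [[0] * n_topics for _ in docs]                # document-topic counts
    n_kw = [[0] * len(vocab) for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                                 # tokens assigned to each topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            k = rng.randrange(n_topics)
            topics.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assign.append(topics)
    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + beta * len(vocab))
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed, normalized document-topic counts: the "fingerprint" of each document.
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + alpha * n_topics) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Small constant added to the variance to avoid division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-3
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian Naive Bayes log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy corpus: four labeled documents and one of unknown authorship.
docs = [
    "the sea wind carried the ship over the sea".split(),
    "a ship sailed the cold sea under the wind".split(),
    "the bank raised the interest rate on the loan".split(),
    "a loan from the bank carries an interest rate".split(),
    "the wind pushed the ship across the sea".split(),
]
labels = ["A", "A", "B", "B"]

vectors = lda_topic_vectors(docs)
model = fit_gaussian_nb(vectors[:4], labels)
prediction = predict(model, vectors[4])
print(vectors[4], prediction)
```

The Gaussian variant is used here only because topic proportions are continuous values; a real experiment would also need far more text per document and a held-out evaluation rather than a single toy prediction.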
Pages: 303-312
Number of pages: 10