Latent semantic analysis for tagging activation states and identifiability in northwestern Mexican news outlets

Authors
Sanchez-Fernandez, Manuel-Alejandro [1]
Medina-Urrea, Alfonso [2]
Torres-Moreno, Juan-Manuel [3]
Affiliations
[1] Inst Humanidades & Ciencias Conducta, Ensenada, Baja California, Mexico
[2] Colegio Mexico, Ctr Estudios Linguist & Literarios, Mexico City, DF, Mexico
[3] Univ Avignon, Lab Informat Avignon, Avignon, France
Keywords
Automatic tagging; activation states; latent semantic analysis; noun phrases; computational pragmatics; authorship attribution
DOI
10.3233/JIFS-219235
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work studies the relationship between measures obtained from Latent Semantic Analysis (LSA), and from a variant known as SPAN, and the activation and identifiability states (informative states) of referents in noun phrases taken from Spanish-language journalistic notes published by news outlets in northwestern Mexico. The aim, and the challenge, is to find a strategy for labelling new/given information in discourse that is grounded in linguistic theory. The new/given distinction can be defined from different perspectives, which differ in the linguistic forms they take into account; this work focuses on full referential devices (n = 2,388). Pearson's r correlation tests, analysis of variance, graphical exploration of label clustering, and a classification experiment with random forests were performed. The experiment used two labelling schemes (noun phrases labelled with all 10 informative-state tags, and a binary labelling) and two bags-of-words for each noun phrase: the interior and the exterior. LSA combined with the interior bag-of-words was found to be usable for classifying certain informative states. The same measure also gave good results for the binary division, detecting which sentences introduce new referents into the discourse. Previous work applying a similar method to English noun phrases reached 80% accuracy (n = 478) in its classification exercise; our best test for Spanish reached 79%. No previous work has applied this method to Spanish, and this kind of experiment matters because Spanish has a more complex inflectional morphology than English.
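The abstract does not define the two bags-of-words in detail. As a rough illustration only, the sketch below assumes that the interior bag holds the words inside the noun phrase and the exterior bag holds the words in a small window around it. The function name `interior_exterior_bags`, the `window` parameter, and the example sentence are all hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def interior_exterior_bags(tokens, start, end, window=5):
    """Return (interior, exterior) bags-of-words for the phrase tokens[start:end].

    Assumption: "interior" = words inside the noun phrase, "exterior" = words
    within `window` tokens on either side. The paper may define them differently.
    """
    interior = Counter(t.lower() for t in tokens[start:end])
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    exterior = Counter(t.lower() for t in left + right)
    return interior, exterior


# Hypothetical example sentence (not from the corpus).
tokens = "El alcalde de Ensenada anunció un nuevo programa de becas".split()

# The noun phrase "un nuevo programa" occupies tokens[5:8].
interior, exterior = interior_exterior_bags(tokens, 5, 8, window=3)
print(interior)
print(exterior)
```

In a pipeline like the one the abstract describes, counts of this kind would presumably feed a term matrix that LSA then reduces before classification; that step is not shown here.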
Pages: 4463-4480
Page count: 17