Latent semantic analysis for tagging activation states and identifiability in northwestern Mexican news outlets

Cited by: 0

Authors:

Sanchez-Fernandez, Manuel-Alejandro [1]

Medina-Urrea, Alfonso [2]

Torres-Moreno, Juan-Manuel [3]

Affiliations:

[1] Inst Humanidades & Ciencias Conducta, Ensenada, Baja California, Mexico

[2] Colegio Mexico, Ctr Estudios Linguist & Literarios, Mexico City, DF, Mexico

[3] Univ Avignon, Lab Informat Avignon, Avignon, France

Source:

JOURNAL OF INTELLIGENT & FUZZY SYSTEMS | 2022 / Vol. 42 / No. 05

Keywords:

Automatic tagging; activation states; latent semantic analysis; noun phrases; computational pragmatics; AUTHORSHIP ATTRIBUTION;

DOI:

10.3233/JIFS-219235

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work studies the relationship between measures obtained from Latent Semantic Analysis (LSA), and from a variant known as SPAN, and the activation and identifiability states (informative states) of referents in noun phrases drawn from Spanish-language journalistic notes published by northwestern Mexican news outlets. The aim, and the challenge, is to find a strategy for labelling new / given information in discourse that is grounded in linguistic theory. The new / given distinction can be defined from different perspectives, which differ in the linguistic forms they take into account; this work focuses on full referential devices (n = 2,388). Pearson's r correlation tests, analysis of variance, graphical exploration of the clustering of labels, and a classification experiment with random forests are performed. The experiment used two labelling schemes, noun phrases labelled with all 10 informative-state tags and a binary labelling, and two bags of words for each noun phrase: the interior and the exterior. LSA combined with the inner bag of words was found to be usable for classifying certain informative states. The same measure showed good results for the binary division, detecting which sentences introduce new referents into the discourse. Previous work applying a similar method to English noun phrases reached 80% accuracy (n = 478) in its classification exercise; our best test for Spanish reached 79%. No previous work has applied this method to Spanish, and this kind of experiment is important because Spanish exhibits a more complex inflectional morphology than English.
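The interior / exterior bags of words and the Pearson correlation step mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not define the two bags, so the sketch takes the interior bag to be the tokens inside the noun phrase and the exterior bag to be the remaining tokens of the sentence. It also uses a plain bag-of-words cosine similarity as a stand-in measure (there is no SVD step, so this is not LSA or SPAN), and the function names `bags_of_words`, `cosine`, and `pearson_r`, as well as the example sentence, are hypothetical.

```python
from collections import Counter
from math import sqrt


def bags_of_words(tokens, start, end):
    """Split a tokenized sentence into two bags for the noun phrase
    occupying tokens[start:end] (assumed definition, see lead-in):
    interior = tokens inside the span, exterior = all other tokens."""
    interior = Counter(t.lower() for t in tokens[start:end])
    exterior = Counter(t.lower() for t in tokens[:start] + tokens[end:])
    return interior, exterior


def cosine(bag_a, bag_b):
    """Cosine similarity between two bags of words (Counter objects).
    Stand-in for an LSA-based similarity; no dimensionality reduction."""
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a.keys() & bag_b.keys())
    norm_a = sqrt(sum(v * v for v in bag_a.values()))
    norm_b = sqrt(sum(v * v for v in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson_r(xs, ys):
    """Pearson's r between two equal-length numeric sequences,
    e.g. a similarity measure and a 0/1 coding of new vs. given."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical example sentence; the noun phrase is tokens[0:4].
tokens = ["El", "alcalde", "de", "Ensenada", "anunció", "nuevas", "obras"]
interior, exterior = bags_of_words(tokens, 0, 4)

# Similarity of the interior bag to a (hypothetical) bag built from the
# preceding discourse; a higher value suggests previously mentioned material.
previous_discourse = Counter(["el", "alcalde", "visitó", "la", "ciudad"])
score = cosine(interior, previous_discourse)
```

In a fuller pipeline, one such score per noun phrase would be correlated with the informative-state labels via `pearson_r` and then passed as a feature to a classifier such as a random forest; that step is omitted here to keep the sketch dependency-free.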


Pages: 4463-4480

Number of pages: 17

