Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 19
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
National Endowment for the Humanities (USA);
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION;
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification number
G2 [Information and Knowledge Dissemination];
Discipline classification code
05 ; 0503 ;
Abstract
With global media content databases and online content now widely available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method uses open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with those extracted using methods from the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to tracking news topics across time and languages.
Pages: 285-305
Page count: 21