Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 19
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
U.S. National Endowment for the Humanities
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification (CLC) number
G2 [Information and Knowledge Dissemination]
Discipline classification code
05; 0503
Abstract
With global media content databases and online content now available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method uses open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with those extracted using methods from the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to track news topics across time and languages.
Pages: 285-305
Page count: 21