Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 19
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
National Endowment for the Humanities (USA);
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION;
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification number
G2 [Information and Knowledge Dissemination];
Discipline classification code
05 ; 0503 ;
Abstract
With global media content databases and online content now widely available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize the residual influence of language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with those extracted by methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to tracking news topics across time and languages.
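The two ideas named in the abstract, a shared embedding space across languages and a normalization of residual language influence, can be illustrated with a toy Python sketch. This is not the rectr implementation (rectr is an R package, and the abstract does not specify its algorithm). The 3-d vectors below are invented, the function names are hypothetical, and per-language mean-centering is a simple stand-in chosen for illustration, not the normalization mechanism the paper describes.

```python
from math import sqrt

# Invented 3-d vectors standing in for aligned word embeddings. The third
# coordinate is a deliberate language offset (+0.2 English, -0.2 German)
# that mimics residual language signal in an otherwise shared space.
EMBEDDINGS = {
    "en": {
        "election": [0.9, 0.1, 0.2], "vote": [0.8, 0.2, 0.2],
        "football": [0.1, 0.9, 0.2], "goal": [0.2, 0.8, 0.2],
    },
    "de": {
        "wahl": [0.9, 0.1, -0.2], "stimme": [0.8, 0.2, -0.2],
        "fussball": [0.1, 0.9, -0.2], "tor": [0.2, 0.8, -0.2],
    },
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def doc_vector(tokens, lang):
    """Represent a document as the mean of its known word vectors."""
    table = EMBEDDINGS[lang]
    return mean_vector([table[t] for t in tokens if t in table])


def center_by_language(doc_vecs, langs):
    """Subtract each language's mean document vector (assumed stand-in)."""
    centered = []
    for vec, lang in zip(doc_vecs, langs):
        same_lang = [v for v, l in zip(doc_vecs, langs) if l == lang]
        lang_mean = mean_vector(same_lang)
        centered.append([x - m for x, m in zip(vec, lang_mean)])
    return centered


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two toy documents per language: one on politics, one on sport.
docs = [
    ("en", ["election", "vote"]),
    ("en", ["football", "goal"]),
    ("de", ["wahl", "stimme"]),
    ("de", ["fussball", "tor"]),
]
langs = [lang for lang, _ in docs]
raw = [doc_vector(tokens, lang) for lang, tokens in docs]
centered = center_by_language(raw, langs)

# English politics vs. German politics, before and after centering.
before = cosine(raw[0], raw[2])
after = cosine(centered[0], centered[2])
# English politics vs. German sport, after centering.
cross_topic = cosine(centered[0], centered[3])
print(f"same topic, raw: {before:.3f}; centered: {after:.3f}")
print(f"different topic, centered: {cross_topic:.3f}")
```

In this toy setup, centering removes the language offset, so same-topic documents in different languages become more similar than they were in the raw space. A real pipeline would load pretrained aligned vectors and fit a topic or clustering model on the normalized document representations.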
Pages: 285-305
Number of pages: 21