Wikipedia as Multilingual Source of Comparable Corpora

Cited by: 0
Authors
Gamallo Otero, Pablo [1]
Gonzalez Lopez, Isaac [1]
Affiliations
[1] Univ Santiago de Compostela, Galiza, Spain
Keywords
DOI
Not available
Chinese Library Classification number
H [Language, Writing]
Discipline classification code
05
Abstract
This article describes an automatic method for building comparable corpora from Wikipedia, using Categories as topic restrictions. Our strategy relies on the fact that Wikipedia is a multilingual encyclopedia containing semi-structured information. Given two languages and a particular topic, our strategy builds a corpus with texts in the two selected languages, whose content is focused on the selected topic. Tools and corpora will be distributed under free licenses (General Public License and Creative Commons).
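The selection step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the `Article` record, the `build_comparable_corpus` function, the substring match on category names, and the toy data are all assumptions made for illustration, and a real pipeline would read articles and categories from a Wikipedia dump rather than from an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record for one Wikipedia article (illustrative only)."""
    lang: str                      # language code, e.g. "en"
    title: str
    text: str
    categories: set = field(default_factory=set)


def build_comparable_corpus(articles, lang_a, lang_b, topic_by_lang):
    """Collect texts in two languages whose categories mention the topic.

    topic_by_lang maps each language code to the topic word as it appears
    in that language's category names (an assumption of this sketch).
    """
    corpus = {lang_a: [], lang_b: []}
    for art in articles:
        if art.lang not in corpus:
            continue  # skip articles in languages that were not selected
        topic = topic_by_lang[art.lang].lower()
        # Category names act as the topic restriction.
        if any(topic in cat.lower() for cat in art.categories):
            corpus[art.lang].append(art.text)
    return corpus


# Toy data standing in for parsed Wikipedia articles.
articles = [
    Article("en", "Stonehenge", "Stonehenge is a prehistoric monument.",
            {"Archaeology of England"}),
    Article("es", "Atapuerca", "Atapuerca es un yacimiento.",
            {"Arqueología de España"}),
    Article("en", "Python", "Python is a language.",
            {"Programming languages"}),
    Article("fr", "Lascaux", "Lascaux est une grotte.",
            {"Archéologie en France"}),
]

corpus = build_comparable_corpus(
    articles, "en", "es", {"en": "archaeology", "es": "arqueología"}
)
print(corpus)
```

The resulting texts are comparable rather than parallel: they share a topic across the two languages but are not translations of each other.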
Pages: 21-25
Page count: 5