A Classification Model with Corpus Enrichment for Toponym Disambiguation

Cited by: 0
Authors
Priego Sanchez, Belem [1 ]
Somodevilla, Maria J. [1 ]
Guzman Cabrera, Rafael [2 ]
Pineda, Ivo H. [1 ]
Carrillo, Maya [1 ]
Affiliations
[1] Benemerita Univ Autonoma Puebla, FCC, Av San Claudio & 14 S, Puebla, Mexico
[2] Univ Guanajuato, DICIS, Salamanca, Mexico
Keywords
toponym disambiguation; geographic information retrieval; corpus; classification model; word sense disambiguation; Web
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an information-retrieval-based method for enriching a corpus using bootstrapping techniques. A manually validated supervised corpus is provided, and snippets are then retrieved from the Web to increase the size of this initial corpus. Although the technique has already been reported in the literature, the main objective of this work is to apply it to the specific task of GEO/NO-GEO toponym disambiguation. The disambiguation procedure is evaluated with a classification model, with favorable results.
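The bootstrapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, the features, or the acceptance rule, so the Naive Bayes model, the confidence threshold, the function names (`enrich`, `NaiveBayes`), and the toy sentences below are all assumptions made for illustration. Web retrieval is replaced by a hard-coded list of hypothetical snippets.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokens; a real system would use richer features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(tokenize(doc))
        self.vocab = {w for c in self.labels for w in self.word_counts[c]}
        return self

    def predict_proba(self, doc):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.labels:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (n_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize log scores into probabilities.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def enrich(seed, snippets, threshold=0.8, max_rounds=5):
    """Grow the labeled corpus with snippets the current model labels confidently."""
    corpus = list(seed)
    pool = list(snippets)
    for _ in range(max_rounds):
        model = NaiveBayes().fit([d for d, _ in corpus], [c for _, c in corpus])
        accepted, remaining = [], []
        for s in pool:
            probs = model.predict_proba(s)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                accepted.append((s, label))
            else:
                remaining.append(s)
        if not accepted:
            break  # nothing confident enough; stop bootstrapping
        corpus.extend(accepted)
        pool = remaining
    return corpus, pool


# Toy seed corpus standing in for the manually validated one (invented data).
seed = [
    ("flights to paris from the airport in france", "GEO"),
    ("the city of washington lies on the river", "GEO"),
    ("paris hilton attended the movie premiere", "NO-GEO"),
    ("denzel washington starred in the film", "NO-GEO"),
]
# Hypothetical snippets standing in for Web search results.
snippets = [
    "hotels near the airport in paris",
    "washington won an award for the film",
    "jordan",
]

corpus, pool = enrich(seed, snippets)
```

In this sketch a snippet joins the corpus only when the predicted class probability reaches the threshold; snippets with no known context words, such as the bare "jordan", stay in the unlabeled pool.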
Pages: 472-480
Page count: 9
Related papers
50 results in total
  • [31] Word sense disambiguation for untagged corpus: Application to Romanian language
    Serban, G
    Tatar, D
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 268 - 272
  • [32] Restrictions on constituent order based on corpus and intended for syntactic disambiguation
    Ibanez, M. Pilar Valverde
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2006, (37): : 105 - 112
  • [33] So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity with Text-Based Networks
    Spitz, Andreas
    Geiss, Johanna
    Gertz, Michael
    THIRD INTERNATIONAL ACM WORKSHOP ON MANAGING AND MINING ENRICHED GEO-SPATIAL DATA, 2016, : 7 - 12
  • [34] A tool for corpus analysis using partial disambiguation and bootstrapping of the lexicon
    Eberle, Kurt
    Heid, Ulrich
    Kountz, Manuel
    Eckart, Kerstin
    TEXT RESOURCES AND LEXICAL KNOWLEDGE, 2008, 8 : 145 - 157
  • [35] Corpus-driven Annotation Enrichment
    Kuhr, Felix
    Witten, Bjarne
    Moeller, Ralf
    2019 13TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2019, : 138 - 141
  • [36] England: enrichment of the corpus of Romanesque settlements
    Grandchamp, Pierre Garrigou
    BULLETIN MONUMENTAL, 2020, 178 (04): : 517 - 517
  • [37] Recognition and Disambiguation Chinese Toponym from Web Texts——Take the Names of Chinese Administrative Division above County for Example
    杜萍
    刘勇
    Remote Sensing Technology and Application (遥感技术与应用), 2011, (06) : 868 - 873
  • [38] A Classification Schema for Fast Disambiguation of Spatial Prepositions
    Dittrich, Andre
    Vasardani, Maria
    Winter, Stephan
    Baldwin, Timothy
    Liu, Fei
    PROCEEDINGS OF THE 6TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON GEOSTREAMING (IWGS) 2015, 2015, : 78 - 86
  • [39] Semi-supervised Word Sense Disambiguation Using the Web as Corpus
    Guzman-Cabrera, Rafael
    Rosso, Paolo
    Montes-y-Gomez, Manuel
    Villasenor-Pineda, Luis
    Pinto-Avendano, David
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2009, 5449 : 256 - +
  • [40] Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation
    Jimeno-Yepes, Antonio
    Aronson, Alan R.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (45): : 239 - 242