Bootstrapping lexical knowledge from unsegmented text using graph kernels

Cited by: 0
Authors
Hagiwara M. [1]
Ogawa Y. [2]
Toyama K. [2]
Affiliations
[1] Graduate School of Information Science, Nagoya University
Keywords
Bootstrapping; Graph kernel; Link analysis; Named entity extraction; Semantic category; Unsegmented text;
DOI
10.1527/tjsai.26.440
Abstract
Extraction of named entity classes and their relationships from large corpora often involves morphological analysis of target sentences and tends to suffer from out-of-vocabulary words. In this paper we propose a semantic category extraction algorithm called Monaka and its graph-based extension g-Monaka, both of which use character n-gram based patterns as context to directly extract semantically related instances from unsegmented Japanese text. These algorithms also use "bidirectional adjacent constraints," which state that reliable instances should be placed between reliable left and right context patterns, in order to improve proper segmentation. The Monaka algorithm uses iterative induction of instances and patterns, similarly to the bootstrapping algorithm Espresso. The g-Monaka algorithm further formalizes the adjacency relation of character n-grams as a directed graph and applies the von Neumann kernel and the Laplacian kernel so that the negative effect of semantic drift, i.e., the phenomenon of semantically unrelated general instances being extracted, is reduced. The experiments show that g-Monaka substantially increases the performance of semantic category acquisition compared to conventional methods, including distributional similarity, bootstrapping-based Espresso, and its graph-based extension g-Espresso, in terms of the F-value on the NE category task from unsegmented Japanese newspaper articles.
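The two ideas named in the abstract, character n-gram contexts adjacent to an instance and graph kernels over an adjacency matrix, can be illustrated with a minimal Python sketch. Nothing here is taken from the paper's implementation: the function names, the toy sentence, and the symmetric toy matrix are assumptions for illustration, and the kernel formulas are the standard textbook forms (von Neumann kernel, regularized Laplacian kernel), which may differ from the exact definitions used for g-Monaka.

```python
import numpy as np


def adjacent_contexts(text, instance, n=1):
    """Hypothetical helper: collect (left, right) character n-gram pairs
    that directly surround each occurrence of `instance` in unsegmented text.
    Occurrences too close to the text boundary are skipped."""
    pairs = []
    start = text.find(instance)
    while start != -1:
        end = start + len(instance)
        left = text[max(0, start - n):start]
        right = text[end:end + n]
        if len(left) == n and len(right) == n:
            pairs.append((left, right))
        start = text.find(instance, start + 1)
    return pairs


def von_neumann_kernel(A, beta):
    """Standard von Neumann kernel K = A (I - beta*A)^{-1},
    i.e. the sum over k >= 0 of beta^k A^(k+1).
    The series converges only if beta < 1 / spectral_radius(A)."""
    A = np.asarray(A, dtype=float)
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if beta * rho >= 1.0:
        raise ValueError("beta must be smaller than 1 / spectral radius of A")
    return A @ np.linalg.inv(np.eye(A.shape[0]) - beta * A)


def regularized_laplacian_kernel(A, beta):
    """Standard regularized Laplacian kernel R = (I + beta*L)^{-1},
    with L = D - A and D the diagonal matrix of row sums (assumes beta >= 0
    and a symmetric, non-negative A)."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + beta * L)


# Toy data (assumed, not from the paper): two place names share the same
# left and right one-character contexts in an unsegmented sentence.
text = "昨日、東京で会議。今日、大阪で会議。"
tokyo_contexts = adjacent_contexts(text, "東京", n=1)
osaka_contexts = adjacent_contexts(text, "大阪", n=1)

# Toy symmetric adjacency matrix over two nodes.
A_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
K_vn = von_neumann_kernel(A_toy, beta=0.5)
K_rl = regularized_laplacian_kernel(A_toy, beta=0.5)
```

In this sketch both instances are surrounded by the same (left, right) pair, which is the kind of evidence a bidirectional constraint can exploit. The Laplacian-based kernel subtracts node degree on the diagonal, which is a common way to discount highly connected, generic nodes and is one plausible reading of how semantic drift is reduced.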
Pages: 440-450
Number of pages: 10