Bootstrapping lexical knowledge from unsegmented text using graph kernels

Authors
Hagiwara M. [1]
Ogawa Y. [2]
Toyama K. [2]
Affiliations
[1] Graduate School of Information Science, Nagoya University
Keywords
Bootstrapping; Graph kernel; Link analysis; Named entity extraction; Semantic category; Unsegmented text
DOI
10.1527/tjsai.26.440
Abstract
Extraction of named entity classes and their relationships from large corpora often involves morphological analysis of target sentences and tends to suffer from out-of-vocabulary words. In this paper we propose a semantic category extraction algorithm called Monaka and its graph-based extension g-Monaka, both of which use character n-gram based patterns as context to directly extract semantically related instances from unsegmented Japanese text. These algorithms also use "bidirectional adjacent constraints," which state that reliable instances should be placed between reliable left and right context patterns, in order to improve proper segmentation. The Monaka algorithms use iterative induction of instances and patterns, similarly to the bootstrapping algorithm Espresso. The g-Monaka algorithm further formalizes the adjacency relation of character n-grams as a directed graph and applies the von Neumann kernel and the Laplacian kernel so that the negative effect of semantic drift, i.e., the phenomenon of semantically unrelated general instances being extracted, is reduced. The experiments show that g-Monaka substantially increases the performance of semantic category acquisition compared to conventional methods, including distributional similarity, bootstrapping-based Espresso, and its graph-based extension g-Espresso, in terms of F-value on the NE category task from unsegmented Japanese newspaper articles.
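As a rough illustration of the two graph kernels named in the abstract, the following minimal sketch computes a von Neumann kernel and a regularized Laplacian kernel on a tiny undirected toy graph. It is not the authors' implementation: the abstract does not specify how g-Monaka builds its graph, so the adjacency matrix `A`, the diffusion factor `beta`, the helper names, and the choice of the regularized form of the Laplacian kernel are all assumptions made for illustration. Both kernels are approximated with a truncated Neumann series, which converges here because `beta` times the spectral radius of the matrix is below 1.

```python
# Toy illustration only: the graph, beta, and all helper names are hypothetical.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_series(M, beta, terms=200):
    """Approximate sum_{n>=0} (beta*M)^n, which equals (I - beta*M)^{-1}
    when beta times the spectral radius of M is below 1."""
    n = len(M)
    step = [[beta * v for v in row] for row in M]
    total = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = matmul(power, step)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

def von_neumann_kernel(A, beta):
    """K = A * sum_{n>=0} (beta*A)^n = A (I - beta*A)^{-1}."""
    return matmul(A, neumann_series(A, beta))

def regularized_laplacian_kernel(A, beta):
    """R = sum_{n>=0} (beta*(-L))^n = (I + beta*L)^{-1}, with L = D - A."""
    n = len(A)
    degree = [sum(row) for row in A]
    neg_L = [[A[i][j] - (degree[i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return neumann_series(neg_L, beta)

# Assumed toy similarity graph: a path 0 - 1 - 2 - 3.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
beta = 0.1  # assumed diffusion factor, small enough for convergence

K = von_neumann_kernel(A, beta)
R = regularized_laplacian_kernel(A, beta)

# Rank all nodes by their Laplacian-kernel relatedness to a seed node.
seed = 0
ranking = sorted(range(len(A)), key=lambda j: R[seed][j], reverse=True)
```

In a bootstrapping setting, a row of the kernel matrix can be read as relatedness scores between a seed instance and every other node, so `ranking` orders candidates by closeness to the seed. The Laplacian form subtracts each node's degree on the diagonal, which is one way to discount highly connected, generic nodes.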
Pages: 440-450
Page count: 10