Bootstrapping lexical knowledge from unsegmented text using graph kernels

Authors
Hagiwara M. [1]
Ogawa Y. [2]
Toyama K. [2]
Affiliations
[1] Graduate School of Information Science, Nagoya University
Keywords
Bootstrapping; Graph kernel; Link analysis; Named entity extraction; Semantic category; Unsegmented text
DOI
10.1527/tjsai.26.440
Abstract
Extraction of named entity classes and their relationships from large corpora often involves morphological analysis of target sentences and tends to suffer from out-of-vocabulary words. In this paper we propose a semantic category extraction algorithm called Monaka and its graph-based extension g-Monaka, both of which use character n-gram based patterns as context to directly extract semantically related instances from unsegmented Japanese text. These algorithms also use "bidirectional adjacent constraints," which state that reliable instances should be placed between reliable left and right context patterns, in order to improve segmentation. The Monaka algorithms use iterative induction of instances and patterns, similarly to the bootstrapping algorithm Espresso. The g-Monaka algorithm further formalizes the adjacency relation of character n-grams as a directed graph and applies the von Neumann kernel and the Laplacian kernel to reduce the negative effect of semantic drift, i.e., the extraction of semantically unrelated general instances. The experiments show that g-Monaka substantially increases the performance of semantic category acquisition compared to conventional methods, including distributional similarity, bootstrapping-based Espresso, and its graph-based extension g-Espresso, in terms of F-value on the NE category task on unsegmented Japanese newspaper articles.
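The bidirectional adjacency idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `extract_between`, the `max_len` cutoff, the sample sentence, and the hand-picked context patterns are all assumptions made for illustration, and the scoring and iterative bootstrapping of the actual Monaka algorithms are omitted. The sketch only shows how requiring a trusted character pattern on both sides of a candidate can fix its boundaries in text that has no word delimiters.

```python
from collections import Counter


def extract_between(text, left_patterns, right_patterns, max_len=6):
    """Count substrings that sit directly between a trusted left pattern
    and a trusted right pattern (hypothetical helper, illustration only)."""
    counts = Counter()
    for left in left_patterns:
        start = text.find(left)
        while start != -1:
            begin = start + len(left)
            # Try every candidate length up to max_len; keep the candidate
            # only if some trusted right pattern follows it immediately.
            for length in range(1, max_len + 1):
                end = begin + length
                if end > len(text):
                    break
                if any(text.startswith(right, end) for right in right_patterns):
                    counts[text[begin:end]] += 1
            start = text.find(left, start + 1)
    return counts


# Assumed sample: unsegmented Japanese text with three place names.
text = "昨日は東京都で会議、明日は大阪府で会議、来週は京都府で会議"
left_patterns = ["日は", "週は"]   # assumed reliable left contexts
right_patterns = ["で会"]          # assumed reliable right context

found = extract_between(text, left_patterns, right_patterns)
print(sorted(found.items()))
```

With a left pattern alone, every prefix of the following text would be a possible candidate; the right-hand pattern is what pins down where each candidate ends.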
Pages: 440-450
Page count: 10