Bootstrapping lexical knowledge from unsegmented text using graph kernels

Cited by: 0

Authors:

Hagiwara M. [1]

Ogawa Y. [2]

Toyama K. [2]

Affiliations:

[1] Graduate School of Information Science, Nagoya University

Source:

Transactions of the Japanese Society for Artificial Intelligence | 2011, Vol. 26, No. 3

Keywords:

Bootstrapping; Graph kernel; Link analysis; Named entity extraction; Semantic category; Unsegmented text;

DOI:

10.1527/tjsai.26.440

Abstract:

Extraction of named entity classes and their relationships from large corpora often involves morphological analysis of the target sentences and tends to suffer from out-of-vocabulary words. In this paper we propose a semantic category extraction algorithm called Monaka and its graph-based extension g-Monaka, both of which use character n-gram based patterns as context to extract semantically related instances directly from unsegmented Japanese text. These algorithms also use "bidirectional adjacent constraints," which state that reliable instances should be placed between reliable left and right context patterns, in order to improve proper segmentation. The Monaka algorithms use iterative induction of instances and patterns, similarly to the bootstrapping algorithm Espresso. The g-Monaka algorithm further formalizes the adjacency relation of character n-grams as a directed graph and applies the von Neumann kernel and the Laplacian kernel so that the negative effect of semantic drift, i.e., the phenomenon of semantically unrelated general instances being extracted, is reduced. The experiments show that g-Monaka substantially increases the performance of semantic category acquisition compared to conventional methods, including distributional similarity, the bootstrapping-based Espresso, and its graph-based extension g-Espresso, in terms of F-value on the NE category task from unsegmented Japanese newspaper articles.
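
The abstract names two graph kernels without giving their formulas. As a rough illustration only, the sketch below computes the commonly used forms K = A * sum_k (beta*A)^k (von Neumann kernel) and R = sum_k (-beta*L)^k = (I + beta*L)^(-1) with L = D - A (regularized Laplacian kernel) on a small toy graph. These definitions, the symmetric toy adjacency matrix, the value of beta, and all function names are assumptions made for illustration; they are not taken from the paper, which builds a directed graph over character n-grams.

```python
# Hypothetical sketch: truncated-series forms of two graph kernels.
# All names and the toy graph are illustrative assumptions, not the paper's code.

def identity(n):
    """n x n identity matrix as nested lists of floats."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def neumann_series(m, beta, terms):
    """Return sum_{k=0}^{terms-1} (beta * m)^k.

    The series only converges when beta times the spectral radius of m
    is below 1; the caller is responsible for choosing beta accordingly.
    """
    n = len(m)
    scaled = [[beta * x for x in row] for row in m]
    total = identity(n)
    power = identity(n)
    for _ in range(1, terms):
        power = mat_mul(power, scaled)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

def von_neumann_kernel(adj, beta, terms=50):
    """K = A * sum_k (beta * A)^k, which approximates A (I - beta A)^{-1}."""
    return mat_mul(adj, neumann_series(adj, beta, terms))

def regularized_laplacian_kernel(adj, beta, terms=50):
    """R = sum_k (-beta * L)^k with L = D - A, approximating (I + beta L)^{-1}."""
    n = len(adj)
    # -L = A - D, where D is the diagonal matrix of node degrees.
    neg_lap = [[adj[i][j] - (sum(adj[i]) if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return neumann_series(neg_lap, beta, terms)

# Toy undirected graph: node 1 is a hub connected to nodes 0, 2 and 3.
toy_adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

if __name__ == "__main__":
    for name, kernel in (("von Neumann", von_neumann_kernel),
                         ("regularized Laplacian", regularized_laplacian_kernel)):
        print(name)
        for row in kernel(toy_adj, beta=0.1):
            print(["%.4f" % x for x in row])
```

In this formulation the von Neumann kernel adds up weighted walk counts, so highly connected nodes accumulate large scores, whereas the Laplacian-based kernel subtracts the degree on the diagonal and thereby discounts hubs. That difference is one plausible reading of why a Laplacian kernel would help against generic, highly connected instances, but the paper itself should be consulted for the exact construction.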

Pages: 440-450

Page count: 10
