Bootstrapping lexical knowledge from unsegmented text using graph kernels

Cited by: 0

Authors:

Hagiwara M. [1]

Ogawa Y. [2]

Toyama K. [2]

Affiliations:

[1] Graduate School of Information Science, Nagoya University

Source:

Transactions of the Japanese Society for Artificial Intelligence | 2011 / Vol. 26 / No. 03

Keywords:

Bootstrapping; Graph kernel; Link analysis; Named entity extraction; Semantic category; Unsegmented text;

DOI:

10.1527/tjsai.26.440

Abstract:

Extraction of named entity classes and their relationships from large corpora often involves morphological analysis of the target sentences and therefore tends to suffer from out-of-vocabulary words. In this paper we propose a semantic category extraction algorithm called Monaka and its graph-based extension g-Monaka, both of which use character n-gram based patterns as context to extract semantically related instances directly from unsegmented Japanese text. To improve segmentation, both algorithms also apply "bidirectional adjacent constraints," which state that reliable instances should be placed between reliable left and right context patterns. Like the bootstrapping algorithm Espresso, the Monaka algorithms iteratively induce instances and patterns. The g-Monaka algorithm further formalizes the adjacency relation of character n-grams as a directed graph and applies the von Neumann kernel and the Laplacian kernel to reduce the negative effect of semantic drift, i.e., the phenomenon in which semantically unrelated general instances are extracted. Experiments on the NE category task over unsegmented Japanese newspaper articles show that g-Monaka substantially improves the F-value of semantic category acquisition compared with conventional methods, including distributional similarity, the bootstrapping-based Espresso, and its graph-based extension g-Espresso.
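The abstract names two graph kernels. As a rough illustration only, the sketch below computes their textbook forms: the von Neumann kernel K = A(I − βA)^{-1} and the regularized Laplacian kernel R = (I + βL)^{-1} with L = D − A. It is not g-Monaka's own graph construction (the abstract describes a directed adjacency graph of character n-grams); instead it assumes a symmetric instance-similarity matrix A = M·Mᵀ built from a hypothetical instance-by-pattern count matrix M. The toy matrix, the β values, and all function names are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def von_neumann_kernel(a: Matrix, beta: float) -> Matrix:
    """K = A (I - beta A)^{-1}, i.e. sum over n >= 1 of beta^(n-1) A^n.

    The series form only converges for beta < 1 / spectral_radius(A).
    """
    n = len(a)
    i_minus_ba = [[(1.0 if i == j else 0.0) - beta * a[i][j] for j in range(n)]
                  for i in range(n)]
    return matmul(a, inverse(i_minus_ba))


def regularized_laplacian_kernel(a: Matrix, beta: float) -> Matrix:
    """R = (I + beta L)^{-1}, where L = D - A and D holds the row sums of A."""
    n = len(a)
    deg = [sum(row) for row in a]
    i_plus_bl = [[(1.0 if i == j else 0.0)
                  + beta * ((deg[i] if i == j else 0.0) - a[i][j])
                  for j in range(n)]
                 for i in range(n)]
    return inverse(i_plus_bl)


# Hypothetical toy data: rows = candidate instances, columns = context patterns.
M = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
A = matmul(M, transpose(M))  # A[i][j] = number of patterns shared by i and j
K = von_neumann_kernel(A, beta=0.1)
R = regularized_laplacian_kernel(A, beta=0.1)
```

In this toy matrix, instances 0 and 1 share two patterns while instance 2 shares only one with each of them, so both kernels score the pair (0, 1) above the pair (0, 2). The Gauss-Jordan helper keeps the sketch dependency-free; it is only meant for tiny matrices, and a real corpus would call for a sparse linear solver.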

Pages: 440-450

Number of pages: 10
