Data Acquisition and Information Extraction for Scientific Knowledge Base Building

Cited by: 2
Authors
Andruszkiewicz, Piotr [1]
Rybinski, Henryk [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Warsaw, Poland
DOI
10.1109/ICSC.2018.00045
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Here we present the process of data acquisition and information extraction for building a comprehensive and accurate scientific knowledge base that covers conferences, publications and scientists. We use two kinds of data sources. First, we gather data from structured and reliable, but incomplete and not always up-to-date, sources such as digital libraries. We then enrich the information extracted from those sources with unstructured data obtained from the Internet, filtering websites with an SVM classifier to identify potentially useful web pages. There are two potential sources of errors in the process of information enrichment. The first is the unstructured origin of the data; the second is the limited accuracy of the machine learning methods used for data acquisition and information extraction. We address both problems by proposing a new information extraction method and by using crowdsourcing to correct information. Our methods are currently used in a scientific platform, namely the Omega-psi(R) university knowledge base, which contains lists of researchers, publications, events, etc.
Pages: 256-259
Page count: 4