Improving Curated Web-Data Quality with Structured Harvesting and Assessment

Times cited: 19
Authors
Feeney, Kevin Chekov [1]
O'Sullivan, Declan [1, 2]
Tai, Wei [1, 3]
Brennan, Rob [3]
Affiliations
[1] Univ Dublin Trinity Coll, Sch Comp Sci & Stat, Dublin 2, Ireland
[2] Univ Dublin Trinity Coll, Dublin 2, Ireland
[3] Univ Dublin Trinity Coll, Knowledge & Data Engn Grp, Dublin 2, Ireland
Keywords
Data Curation; Data Quality; Digital Humanities; Linked Data; Web Data
DOI
10.4018/ijswis.2014040103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a semi-automated process, framework and set of tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who need not be knowledge engineers, with tools to collect and curate evolving linked datasets so that their quality is maintained over time. The framework comprises a novel process, workflow and architecture. A working implementation has been produced and applied first to the publication of an existing social-science dataset and then to the harvesting and curation of a related dataset from an unstructured data source. The framework is evaluated against data quality dimensions and measures previously developed to assess existing published datasets; this analysis shows that it addresses a broad range of real-world data quality concerns. Experimental results quantify the impact of the DaCura process and tools on data quality, using an assessment framework and methodology that combines automated and human data quality controls.
Pages: 35-62
Page count: 28