Rule-based information extraction for mechanical-electrical-plumbing-specific semantic web

Cited by: 30
Authors
Wu, Lang-Tao [1]
Lin, Jia-Rui [1]
Leng, Shuo [1]
Li, Jiu-Lin [2]
Hu, Zhen-Zhong [3]
Affiliations
[1] Tsinghua Univ, Dept Civil Engn, Beijing 100084, Peoples R China
[2] Beijing Urban Construction Grp Co Ltd, Beijing 100088, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Information extraction; MEP; Rule match; Named entity recognition; Relation extraction; Natural language understanding; Semantic web; MANAGEMENT; KNOWLEDGE; ONTOLOGY; OBJECTS;
DOI
10.1016/j.autcon.2021.104108
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Information extraction (IE), which aims to retrieve meaningful information from plain text, has been widely studied in general and professional domains to support downstream applications. However, owing to the lack of labeled data and the complexity of professional mechanical, electrical and plumbing (MEP) information, it is challenging to apply common deep learning IE methods to the MEP domain. To solve this problem, this paper proposes a rule-based approach for the MEP IE task, including a "snowball" strategy to collect large-scale MEP corpora, a suffix-based matching algorithm on text segments for named entity recognition (NER), and a dependency-path-based matching algorithm on the dependency tree for relationship extraction (RE). Two ideas for RE, called "meta linking" and "path filtering", are also proposed to discover as many out-of-pattern entities and relationships as possible. To verify the feasibility of the proposed approach, 65 MB of MEP corpora were collected as input, and an MEP semantic web consisting of 15,978 entities and 65,110 relationship triples was established, with accuracies of 81% for entities and 75% for relationship triples. A comparison experiment between classical deep learning models and the proposed rule-based approach was carried out, showing that the extraction precision of the proposed method is 37% and 49% higher than that of the selected deep learning NER and RE models, respectively.
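To make the idea of suffix-based matching on text segments concrete, the following is a minimal sketch and not the authors' algorithm. It assumes English text, a small hypothetical suffix lexicon (`SUFFIX_TYPES`), a hypothetical stop-word list (`STOP_WORDS`), and a made-up rule that an entity is a head word with a known suffix plus up to two preceding modifiers; the abstract specifies none of these details.

```python
import re

# Hypothetical suffix lexicon (head-word suffix -> entity type); illustrative only.
SUFFIX_TYPES = {"pump": "Equipment", "valve": "Component", "duct": "Component"}
# Hypothetical stop words that end the leftward extension of an entity.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "through", "feeds"}


def extract_entities(text, max_modifiers=2):
    """Return (entity text, entity type) pairs found by suffix matching."""
    entities = []
    # Split at punctuation so a match never crosses a segment boundary.
    for segment in re.split(r"[.,;:!?]", text):
        tokens = segment.split()
        for i, token in enumerate(tokens):
            word = token.lower()
            # Look for a head word whose ending is in the suffix lexicon.
            etype = next(
                (t for s, t in SUFFIX_TYPES.items() if word.endswith(s)), None
            )
            if etype is None:
                continue
            # Extend left over modifiers until a stop word or the window limit.
            start = i
            while (
                start > 0
                and i - start < max_modifiers
                and tokens[start - 1].lower() not in STOP_WORDS
            ):
                start -= 1
            entities.append((" ".join(tokens[start : i + 1]), etype))
    return entities


if __name__ == "__main__":
    sentence = "The chilled water pump feeds the supply duct through a check valve."
    for name, etype in extract_entities(sentence):
        print(f"{etype}: {name}")
```

A rule of this kind needs no labeled training data, which is the motivation the abstract gives for preferring rules over deep learning models in the MEP domain. Its coverage depends entirely on the quality of the suffix lexicon.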
Pages: 14
Related papers
50 in total
  • [31] Rule-based information extraction from patients' clinical data
    Mykowiecka, Agnieszka
    Marciniak, Malgorzata
    Kupsc, Anna
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (05) : 923 - 936
  • [32] Intelligent Tourism Information Consumption: A Push Semantic Rule-based System
    Lamsfus, Carlos
    Martin, David
    Alzua-Sorzabal, Aurkene
    Lopez-de-Ipina, Diego
    Torres-Manzanera, Emilio
    ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243 : 823 - 832
  • [33] Information extraction for the semantic web
    Baumgartner, R
    Eiter, T
    Gottlob, G
    Herzog, M
    Koch, C
    REASONING WEB, 2005, 3564 : 275 - 289
  • [34] Semantic web framework for rule-based generation of knowledge and simulation of manufacturing systems
    Rabe, Markus
    Gocev, Pavel
    ENTERPRISE INTEROPERABILITY III: NEW CHALLENGES AND INDUSTRIAL APPROACHES, 2008, : 397 - 409
  • [35] Semantic Rule-Based Equipment Diagnostics
    Mehdi, Gulnar
    Kharlamov, E.
    Savkovic, Ognjen
    Xiao, G.
    Kalayci, E. Guzel
    Brandt, S.
    Horrocks, I.
    Roshchin, Mikhail
    Runkler, Thomas
    SEMANTIC WEB - ISWC 2017, PT II, 2017, 10588 : 314 - 333
  • [36] PROVA: Rule-based Java-scripting for a bioinformatics semantic web
    Kozlenkov, A
    Schroeder, M
    DATA INTEGRATION IN THE LIFE SCIENCES, PROCEEDINGS, 2004, 2994 : 17 - 30
  • [37] A Rule Based Personalized Location Information System for the Semantic Web
    Viktoratos, Iosif
    Tsadiras, Athanasios
    Bassiliades, Nick
    E-COMMERCE AND WEB TECHNOLOGIES, EC-WEB 2013, 2013, 152 : 27 - 38
  • [38] A Web-Based Approach for Traceability in Rule-Based Business Information Systems
    Rutledge, Lloyd
    Berghuis, Brent
    Lim, Kelvin
    Soerokromo, Mark
    BUSINESS MODELING AND SOFTWARE DESIGN, BMSD 2023, 2023, 483 : 308 - 318
  • [39] Rule-Based Extraction of Family History Information from Clinical Notes
    Almeida, Joao Rafael
    Matos, Sergio
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 670 - 675
  • [40] UIMA Ruta: Rapid development of rule-based information extraction applications
    Kluegl, Peter
    Toepfer, Martin
    Beck, Philip-Daniel
    Fette, Georg
    Puppe, Frank
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (01) : 1 - 40