Rule-based information extraction for mechanical-electrical-plumbing-specific semantic web

Cited by: 30
Authors
Wu, Lang-Tao [1]
Lin, Jia-Rui [1]
Leng, Shuo [1]
Li, Jiu-Lin [2]
Hu, Zhen-Zhong [3]
Affiliations
[1] Tsinghua Univ, Dept Civil Engn, Beijing 100084, Peoples R China
[2] Beijing Urban Construction Grp Co Ltd, Beijing 100088, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Information extraction; MEP; Rule match; Named entity recognition; Relation extraction; Natural language understanding; Semantic web; MANAGEMENT; KNOWLEDGE; ONTOLOGY; OBJECTS;
DOI
10.1016/j.autcon.2021.104108
Chinese Library Classification number
TU [Building Science];
Discipline classification code
0813;
Abstract
Information extraction (IE), which aims to retrieve meaningful information from plain text, has been widely studied in general and professional domains to support downstream applications. However, due to the lack of labeled data and the complexity of professional mechanical, electrical and plumbing (MEP) information, it is challenging to apply current common deep learning IE methods to the MEP domain. To solve this problem, this paper proposes a rule-based approach for the MEP IE task, including a "snowball" strategy to collect large-scale MEP corpora, a suffix-based matching algorithm on text segments for named entity recognition (NER), and a dependency-path-based matching algorithm on the dependency tree for relationship extraction (RE). Two ideas for RE, called "meta linking" and "path filtering", are also proposed to discover as many out-of-pattern entities and relationships as possible. To verify the feasibility of the proposed approach, 65 MB of MEP corpora were collected as input, and an MEP semantic web consisting of 15,978 entities and 65,110 relationship triples was established, with accuracies of 81% for entities and 75% for relationship triples. A comparison experiment between classical deep learning models and the proposed rule-based approach was carried out, showing that our method outperforms the selected deep learning NER and RE models by 37% and 49%, respectively, in extraction precision.
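The abstract does not spell out the suffix-based matching rules, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that an entity name ends in a known head word (the suffix) and is extended leftward over preceding modifier words. The suffix lexicon, the entity labels, the stop-word list, the `MAX_MODIFIERS` limit, and the function name `match_entities` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical suffix lexicon: head words that typically end an MEP entity name,
# mapped to an assumed entity label.
SUFFIXES = {
    "pump": "Equipment",
    "valve": "Component",
    "duct": "Component",
    "pipe": "Component",
}

# Hypothetical stop words that terminate the leftward expansion of an entity.
STOP_WORDS = {"the", "a", "an", "and", "to", "is", "of", "by", "with"}

# Assumed upper bound on the number of modifier words kept before the suffix.
MAX_MODIFIERS = 3


def match_entities(segment: str) -> List[Tuple[str, str]]:
    """Return (entity text, label) pairs found in one text segment."""
    # Naive tokenization: lowercase, drop commas and periods, split on whitespace.
    tokens = segment.lower().replace(",", " ").replace(".", " ").split()
    entities = []
    for i, token in enumerate(tokens):
        label = SUFFIXES.get(token)
        if label is None:
            continue
        # Extend the span leftward over modifiers until a stop word, another
        # suffix word, the segment start, or the modifier limit is reached.
        start = i
        while (
            start > 0
            and i - start < MAX_MODIFIERS
            and tokens[start - 1] not in STOP_WORDS
            and tokens[start - 1] not in SUFFIXES
        ):
            start -= 1
        entities.append((" ".join(tokens[start : i + 1]), label))
    return entities


if __name__ == "__main__":
    text = "The chilled water pump is connected to the supply pipe by a check valve."
    for name, label in match_entities(text):
        print(f"{label}: {name}")
```

A sketch this simple over-extends whenever a non-modifier word directly precedes the suffix, which is one reason a practical rule set would need richer segment boundaries than a stop-word list.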
Pages: 14