Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques

Cited by: 4
Authors
Qinjun Qiu
Zhong Xie
Liang Wu
Liufeng Tao
Affiliations
[1] China University of Geosciences, School of Geography and Information Engineering
[2] National Engineering Research Center of Geographic Information System
Source
Earth Science Informatics | 2020 / Vol. 13
Keywords
Geoscience document; Knowledge graph; Geological text mining; Natural language processing
DOI
Not available
Abstract
A large amount of georeferenced quantitative data about rocks and geoscience surveys is buried in geological documents and remains unused. Data analytics and information extraction offer opportunities to use these data to better understand ore-forming processes and to enhance geoscience knowledge. Extracting spatiotemporal and semantic information from a set of geological documents makes it possible to build a rich representation of the geoscience knowledge recorded in unstructured text written in Chinese. This paper presents a workflow for spatiotemporal and semantic information extraction: a geological document analysis approach that uses automated techniques for browsing and searching relevant geological content. The workflow applies spatial and temporal gazetteer matching, pattern-based rules, and spatiotemporal relationship extraction to identify and label terms in geological text documents. It represents contextual information as a knowledge graph, extracts a set of relevant tables and figures, and retrieves a list of relevant documents by using geological topic information. Text mining techniques are used to facilitate the analysis of geological knowledge and to show how text analysis can speed up the assessment of a massive number of documents. Furthermore, autogenerated keyword suggestions derived from extracted keyword associations are used to reduce document search effort. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of incorporating text mining and NLP techniques into geoscience.
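As a rough illustration of the kind of pipeline the abstract describes (gazetteer matching, a pattern-based rule, and a knowledge-graph representation), the sketch below may help. It is not the authors' implementation: the gazetteer entries, the `AGE_RULE` pattern, the `has_age` relation name, and the English sample sentence are all hypothetical, and the paper itself works on Chinese text.

```python
import re

# Hypothetical toy gazetteers; the paper's actual gazetteers are not given here.
SPATIAL_GAZETTEER = {"Daye", "Hubei"}
TEMPORAL_GAZETTEER = {"Triassic", "Jurassic"}

# Hypothetical pattern-based rule: "<rock> of <period> age".
AGE_RULE = re.compile(r"(\w+) of (\w+) age")


def match_gazetteer(text, gazetteer, label):
    """Return (term, label, start) for each gazetteer term found, in text order."""
    hits = []
    for term in gazetteer:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            hits.append((term, label, m.start()))
    return sorted(hits, key=lambda hit: hit[2])


def extract_triples(text):
    """Apply the rule and keep matches whose period is in the temporal gazetteer."""
    triples = []
    for m in AGE_RULE.finditer(text):
        subject, period = m.group(1), m.group(2)
        if period in TEMPORAL_GAZETTEER:
            triples.append((subject, "has_age", period))
    return triples


def build_graph(triples):
    """Store triples as an adjacency mapping: subject -> [(relation, object), ...]."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


text = "Limestone of Triassic age crops out near Daye in Hubei."
places = match_gazetteer(text, SPATIAL_GAZETTEER, "LOC")
periods = match_gazetteer(text, TEMPORAL_GAZETTEER, "TIME")
graph = build_graph(extract_triples(text))
```

Here `places` holds the labelled spatial terms, `periods` the temporal ones, and `graph` links the rock type to its age. A real system would need word segmentation for Chinese and far larger gazetteers and rule sets.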
Pages: 1393-1410
Page count: 17