Usage of the Term Big Data in Biomedical Publications: A Text Mining Approach

Citations: 0
Authors
van Altena, Allard J. [1 ]
Moerland, Perry D. [1 ]
Zwinderman, Aeilko H. [1 ]
Delgado Olabarriaga, Silvia [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam UMC, Dept Clin Epidemiol Biostat & Bioinformat, Meibergdreef 9, NL-1105 AZ Amsterdam, Netherlands
Keywords
Big Data; Big Data Aspects; hype; biomedical literature; text mining; Lasso Regression
DOI
10.3390/bdcc3010013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we attempt to assess the value of the term Big Data when used by researchers in their publications. For this purpose, we systematically collected a corpus of biomedical publications that use and do not use the term Big Data. These documents were used as input to a machine learning classifier to determine how well they can be separated into two groups and to determine the most distinguishing classification features. We generated 100 classifiers that could correctly distinguish between Big Data and non-Big Data documents with an area under the Receiver Operating Characteristic (ROC) curve of 0.96. The differences between the two groups were characterized by terms specific to Big Data themes (such as 'computational', 'mining', and 'challenges') and also by terms that indicate the research field, such as 'genomics'. The ROC curves when plotted for various time intervals showed no difference over time. We conclude that there is a detectable and stable difference between publications that use the term Big Data and those that do not. Furthermore, the use of the term Big Data within a publication seems to indicate a distinct type of research in the biomedical field. Therefore, we conclude that value can be attributed to the term Big Data when used in a publication and this value has not changed over time.
Pages: 1 / 12
Number of pages: 12