A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis

Cited by: 53
Authors
Hassan, Saeed-Ul [1]
Safder, Iqra [1]
Akram, Anam [1]
Kamiran, Faisal [1]
Affiliations
[1] Informat Technol Univ, 346-B,Ferozepur Rd, Lahore 54700, Pakistan
Keywords
Knowledge flows; Machine learning; Citation context classification; Influential citations; Citation analysis; INFORMATION-SCIENCE; PATENT CITATIONS; INSTITUTIONS; SPECIALTY; DIFFUSION; SPACE; US;
DOI
10.1007/s11192-018-2767-x
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
We measure the knowledge flows between countries by analysing publication and citation data, arguing that not all citations are equally important. Therefore, in contrast to existing techniques that use absolute citation counts to quantify knowledge flows between different entities, our model employs a citation context analysis technique, using a machine-learning approach to distinguish between important and non-important citations. We use 14 novel features (context-based, cue-word-based and text-based) to train Support Vector Machine (SVM) and Random Forest classifiers on an annotated dataset of 20,527 publications downloaded from the Association for Computational Linguistics anthology (http://allenai.org/data.html). Our machine-learning models outperform existing state-of-the-art citation context approaches, with the SVM model reaching up to 61% and the Random Forest model up to a very encouraging 90% Precision-Recall Area Under the Curve under 10-fold cross-validation. Finally, we present a case study in which we deploy our method on datasets of PLoS ONE full-text publications in the field of Computer and Information Sciences. Our results show that a significant volume of the knowledge flowing from the United States, based on important citations, is consumed by the international scientific community. Of the total knowledge flow from China, we find that a relatively small proportion (only 4.11%) falls into the category of knowledge flow based on important citations, while The Netherlands and Germany show the highest proportions of knowledge flows based on important citations, at 9.06% and 7.35%, respectively. Among the institutions, interestingly, the findings show that at the University of Malaya more than 10% of the knowledge produced falls into the important category. We believe that such analyses are helpful for understanding the dynamics of the relevant knowledge flows across nations and institutions.
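The per-country proportions quoted in the abstract can be read as the share of a country's outgoing knowledge flow that rests on citations classified as important. The sketch below illustrates that arithmetic only; it is not the authors' implementation. The toy records, the function name `important_flow_share`, and the choice to skip same-country citations are all assumptions made for illustration, and in a real pipeline the `is_important` flag would come from a trained classifier.

```python
from collections import defaultdict

# Hypothetical toy records: (cited_country, citing_country, is_important).
# Knowledge is assumed to flow from the cited country to the citing country.
citations = [
    ("US", "DE", True),
    ("US", "CN", False),
    ("US", "NL", False),
    ("US", "PK", True),
    ("CN", "US", False),
    ("CN", "DE", False),
    ("CN", "NL", True),
    ("CN", "US", False),
]


def important_flow_share(records):
    """Map each source country to the fraction of its outgoing
    citations that are flagged as important."""
    total = defaultdict(int)
    important = defaultdict(int)
    for cited, citing, is_important in records:
        if cited == citing:
            # Assumption: same-country citations are not counted as a flow.
            continue
        total[cited] += 1
        if is_important:
            important[cited] += 1
    return {country: important[country] / total[country] for country in total}


shares = important_flow_share(citations)
for country, share in sorted(shares.items()):
    print(f"{country}: {share:.2%}")
```

Counting raw citations, as here, is the simplest possible weighting; the paper may normalise flows differently, so the function should be treated as a starting point.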
Pages: 973-996
Number of pages: 24
Related papers
50 in total
  • [31] Context-Based News Headlines Analysis Using Machine Learning Approach
    Rahman, Shadikur
    Hossain, Syeda Sumbul
    Islam, Saiful
    Chowdhury, Mazharul Islam
    Rafiq, Fatama Binta
    Badruzzaman, Khalid Been Md
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT II, 2019, 11684 : 167 - 178
  • [32] A machine-learning approach to automated knowledge-base building for remote sensing image analysis with GIS data
    Huang, XQ
    Jensen, JR
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 1997, 63 (10): : 1185 - 1194
  • [33] Secret Key Classification Based on Electromagnetic Analysis and Feature Extraction Using Machine-Learning Approach
    Mukhtar, Naila
    Kong, Yinan
    FUTURE NETWORK SYSTEMS AND SECURITY, FNSS 2018, 2018, 878 : 80 - 92
  • [34] Investigation of stratum corneum cell morphology and content using novel machine-learning image analysis
    Tohgasaki, Takeshi
    Aihara, Saki
    Ikeda, Mariko
    Takahashi, Minako
    Eto, Masaya
    Kudo, Riki
    Taira, Hiroshi
    Kido, Ai
    Kondo, Shinya
    Ishiwatari, Shioji
    SKIN RESEARCH AND TECHNOLOGY, 2024, 30 (02)
  • [35] Mental Health Predictive Analysis Using Machine-Learning Techniques
    Jain, Vanshika
    Kumari, Ritika
    Bansal, Poonam
    Dev, Amita
    SMART TRENDS IN COMPUTING AND COMMUNICATIONS, VOL 4, SMARTCOM 2024, 2024, 948 : 103 - 115
  • [36] Predicting the chemical reactivity of organic materials using a machine-learning approach
    Lee, Byungju
    Yoo, Jaekyun
    Kang, Kisuk
    CHEMICAL SCIENCE, 2020, 11 (30) : 7813 - 7822
  • [37] Detection of Colchicum autumnale in drone images, using a machine-learning approach
    Lukas Petrich
    Georg Lohrmann
    Matthias Neumann
    Fabio Martin
    Andreas Frey
    Albert Stoll
    Volker Schmidt
    Precision Agriculture, 2020, 21 : 1291 - 1303
  • [38] Prediction of bacterial associations with plants using a supervised machine-learning approach
    Manuel Martinez-Garcia, Pedro
    Lopez-Solanilla, Emilia
    Ramos, Cayo
    Rodriguez-Palenzuela, Pablo
    ENVIRONMENTAL MICROBIOLOGY, 2016, 18 (12) : 4847 - 4861
  • [39] Predicting obesity and smoking using medication data: A machine-learning approach
    Ali, Sitwat
    Na, Renhua
    Waterhouse, Mary
    Jordan, Susan J.
    Olsen, Catherine M.
    Whiteman, David C.
    Neale, Rachel E.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 (01) : 91 - 99
  • [40] Predicting obesity and smoking using medication data: a machine-learning approach
    Ali, Sitwat
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2021, 50