Predicting innovative firms using web mining and deep learning

Cited by: 28
Authors
Kinne, Jan [1, 2, 3]
Lenz, David [3, 4]
Affiliations
[1] ZEW Ctr European Econ Res, Dept Econ Innovat & Ind Dynam, Mannheim, Germany
[2] Univ Salzburg, Dept Geoinformat Z GIS, Salzburg, Austria
[3] Istari Ai, Mannheim, Germany
[4] Justus Liebig Univ, Dept Econometr & Stat, Giessen, Germany
Source
PLOS ONE | 2021 / Volume 16 / Issue 04
Keywords
PATENT STATISTICS; NEURAL-NETWORKS;
DOI
10.1371/journal.pone.0249071
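Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];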
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Evidence-based STI (science, technology, and innovation) policy making requires accurate indicators of innovation in order to promote economic growth. However, traditional indicators from patents and questionnaire-based surveys often lack coverage, granularity, and timeliness, and may involve high data collection costs, especially when conducted at a large scale. Consequently, they struggle to provide policy makers and scientists with the full picture of the current state of the innovation system. In this paper, we propose a first approach to generating web-based innovation indicators, which may have the potential to overcome some of the shortcomings of traditional indicators. Specifically, we develop a method to identify product innovator firms at a large scale and at very low cost. We use traditional firm-level indicators from a questionnaire-based innovation survey (German Community Innovation Survey) to train an artificial neural network classification model on labelled (product innovator/no product innovator) web texts of surveyed firms. Subsequently, we apply this classification model to the web texts of hundreds of thousands of firms in Germany to predict whether they are product innovators or not. We then compare these predictions to firm-level patent statistics, survey extrapolation benchmark data, and regional innovation indicators. The results show that our approach produces reliable predictions and has the potential to be a valuable and highly cost-efficient addition to the existing set of innovation indicators, especially due to its coverage and regional granularity.
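The train-then-apply workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not specify the network architecture or text features, so the sketch swaps in a single sigmoid unit (the simplest possible neural network) over bag-of-words counts, and the example texts, labels, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (website text, label), 1 = product innovator, 0 = not.
# Real labels would come from a survey; these are invented for illustration.
LABELLED = [
    ("we launched a new patented sensor product", 1),
    ("our new product line introduces novel technology", 1),
    ("family bakery serving fresh bread daily", 0),
    ("local plumbing repair and maintenance service", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(texts):
    """Map every word seen in the training texts to a feature index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, vocab, epochs=200, lr=0.5):
    """Fit one sigmoid unit with stochastic gradient descent on log loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [(vectorize(text, vocab), label) for text, label in samples]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Estimated probability that an unlabelled web text is an innovator's."""
    x = vectorize(text, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


if __name__ == "__main__":
    vocab = build_vocab([text for text, _ in LABELLED])
    weights, bias = train(LABELLED, vocab)
    # Apply the trained model to unlabelled (hypothetical) firm web texts.
    for text in ["new product with novel technology", "fresh bread bakery"]:
        print(text, "->", round(predict_proba(text, vocab, weights, bias), 3))
```

The two stages mirror the abstract: `train` fits the classifier on the labelled sample, and `predict_proba` is then applied to texts that carry no label. A model trained on four sentences says nothing about real firms; the sketch only shows the shape of the pipeline.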
Pages: 18
Related papers
50 in total
  • [31] Data Mining and Deep Learning for Predicting the Displacement of "Step-like" Landslides
    Miao, Fasheng
    Xie, Xiaoxu
    Wu, Yiping
    Zhao, Fancheng
    SENSORS, 2022, 22 (02)
  • [32] Mining the web for learning the ontology
    Aoun, Bassam M.
    Khair, Marie
    ICSOFT 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES, VOL ISDM/WSEHST/DC, 2007, : 189 - 192
  • [33] Web mining: Machine learning for Web applications
    Chen, HC
    Chau, M
    ANNUAL REVIEW OF INFORMATION SCIENCE AND TECHNOLOGY, 2004, 38 : 289 - 329
  • [34] Predicting bankruptcy of firms using earnings call data and transfer learning
    Siddiqui, Hafeez Ur Rehman
    Sainz de Abajo, Beatriz
    de la Torre Diez, Isabel
    Rustam, Furqan
    Raza, Amjad
    Atta, Sajjad
    Ashraf, Imran
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [35] Mining the deep Web for company information
    Ojala, M
    ONLINE, 2002, 26 (05): : 73 - 75
  • [36] A user-friendly R Shiny web app for predicting cancer genetic dependencies using deep learning
    Kasper, Michael J.
    Wang, Li-Ju
    Ning, Michael
    Huang, Yufei
    Chiu, Yu-Chiao
    CANCER RESEARCH, 2024, 84 (06)
  • [37] Web Table Retrieval using Multimodal Deep Learning
    Shraga, Roee
    Roitman, Haggai
    Feigenblat, Guy
    Cannim, Mustafa
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1399 - 1408
  • [38] Efficient Deep Web Crawling Using Reinforcement Learning
    Jiang, Lu
    Wu, Zhaohui
    Feng, Qian
    Liu, Jun
    Zheng, Qinghua
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I, PROCEEDINGS, 2010, 6118 : 428 - +
  • [39] Web attack detection using deep learning models
    Eunaicy, J. I. Christy
    Suguna, S.
    MATERIALS TODAY-PROCEEDINGS, 2022, 62 : 4806 - 4813
  • [40] Web Application Attacks Detection Using Deep Learning
    Montes, Nicolas
    Betarte, Gustavo
    Martinez, Rodrigo
    Pardo, Alvaro
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2021, 2021, 12702 : 227 - 236