Large Scale Financial Filing Analysis on HPCC Systems

Cited by: 3
Authors
Murray, Matthias [1]
Chala, Arjuna [2]
Xu, Lili [2]
Dev, Roger [3]
Affiliations
[1] New Coll Florida, LexisNexis Risk Solut, Sarasota, FL 34243 USA
[2] LexisNexis Risk Solut, Atlanta, GA USA
[3] LexisNexis Risk Solut, Denver, CO USA
Keywords
SEC; Sentiment Analysis; Natural Language Processing; HPCC Systems;
DOI
10.1109/BigData50022.2020.9378388
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Insights from public companies' financial filings are necessary for securities analysts and investors to make sound investment decisions. Synthesizing salient facts from such filings is a complex language task, especially as data volume grows at an overwhelming pace. To reduce the manual labor in this process, we propose a financial filing analysis pipeline that automatically scrapes financial filings, generates embeddings of the contextual data, and performs sentiment analysis in order to predict the future performance of the underlying companies. The pipeline is built on top of the Big Data processing platform HPCC Systems so that large amounts of financial filings can be processed in a scalable and timely manner. By applying word embedding and machine learning models to a large amount of SEC financial filings, our pipeline is able to process 20 GB of XBRL files (5,000 filing documents for more than 3,500 companies) into 50,000 sentence embeddings within 5 minutes and transform the same data into TF-IDF embeddings in about 8 minutes. To test sentiment analysis, we randomly sampled and manually labeled 5,000 SEC filings. The sentiment analysis suggested that the usefulness of stock price as a metric is specific to each industry and the overall market, but that it is usable as long as the scope of inquiry is sufficiently narrow. Additionally, although our model was trained on only 5,000 manually labeled filings with unigrams and reached a final loss of 0.09, the results of the sentiment analysis exhibited discriminatory power exceeding naive label selection through random or biased choice, suggesting that there is efficacy in using Natural Language Processing to analyze SEC filings.
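As a rough illustration of the unigram TF-IDF step mentioned in the abstract, the sketch below computes TF-IDF weights for a handful of text snippets in plain Python. It is not the authors' implementation, which runs on HPCC Systems; the function names `tokenize` and `tfidf`, the tokenization rule, and the unsmoothed `log(N / df)` weighting are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into unigram tokens (hypothetical rule)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {term: (count / total) * idf[term] for term, count in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    # Made-up snippets standing in for filing text.
    sample = ["Revenue increased", "Revenue decreased"]
    for vector in tfidf(sample):
        print(vector)
```

A term that appears in every document, such as "revenue" in the sample, receives a weight of zero under this weighting, so the distinguishing words carry the signal passed to a downstream classifier.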
Pages: 4429 - 4436
Page count: 8