Large Scale Financial Filing Analysis on HPCC Systems

Cited by: 3
Authors
Murray, Matthias [1]
Chala, Arjuna [2]
Xu, Lili [2]
Dev, Roger [3]
Affiliations
[1] New Coll Florida, LexisNexis Risk Solut, Sarasota, FL 34243 USA
[2] LexisNexis Risk Solut, Atlanta, GA USA
[3] LexisNexis Risk Solut, Denver, CO USA
Keywords
SEC; Sentiment Analysis; Natural Language Processing; HPCC Systems
DOI
10.1109/BigData50022.2020.9378388
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Insights from public companies' financial filings are necessary for securities analysts and investors to make sound investment decisions. Synthesizing the salient facts from such filings is a complex language task, especially as the volume of data grows at an overwhelming pace. To reduce the human labor involved, we propose a financial filing analysis pipeline that automatically scrapes financial filings, generates embeddings of the contextual data, and performs sentiment analysis in order to predict the future performance of the underlying companies. The pipeline is built on the Big Data processing platform HPCC Systems so that large numbers of financial filings can be processed in a scalable and timely manner. Applying word embedding and machine learning models to a large collection of SEC financial filings, the pipeline processes 20 GB of XBRL files (5,000 filing documents for more than 3,500 companies) into 50,000 sentence embeddings within 5 minutes and transforms the same data into TF-IDF embeddings in about 8 minutes. To test sentiment analysis, we randomly sampled and manually labeled 5,000 SEC filings. The sentiment analysis suggests that the usefulness of stock price as a metric is specific to each industry and to the overall market, but that it is usable as long as the scope of inquiry is sufficiently narrow. Additionally, although our model was trained on only 5,000 manually labeled filings with unigrams, reaching a final loss of 0.09, its results exhibited discriminatory power exceeding that of naive label selection through random or biased choice, suggesting that Natural Language Processing is effective for analyzing SEC filings.
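As a rough illustration of the TF-IDF step mentioned in the abstract, the sketch below computes unigram TF-IDF vectors in plain Python. It is not the authors' implementation: the pipeline described runs on HPCC Systems, and the abstract does not specify the tokenization or weighting variant used. The `tokenize` and `tfidf` helpers, the unsmoothed `ln(N / df)` weighting, and the three sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic unigrams (assumed scheme)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf(docs):
    """Return (vocabulary, vectors) for a list of raw strings.

    TF  = term count / document length
    IDF = ln(N / document frequency), with no smoothing (one common variant)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical sentences standing in for filing text.
docs = [
    "Revenue increased and margins improved",
    "Revenue decreased and losses widened",
    "Margins improved despite higher costs",
]
vocab, vectors = tfidf(docs)
```

Each vector has one entry per vocabulary term, so the output could feed a downstream classifier such as the sentiment model the abstract describes. Terms that occur in every document receive a weight of zero under this unsmoothed variant.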
Pages: 4429-4436
Number of pages: 8