Large Scale Financial Filing Analysis on HPCC Systems

Cited by: 3
Authors
Murray, Matthias [1]
Chala, Arjuna [2]
Xu, Lili [2]
Dev, Roger [3]
Affiliations
[1] New Coll Florida, LexisNexis Risk Solut, Sarasota, FL 34243 USA
[2] LexisNexis Risk Solut, Atlanta, GA USA
[3] LexisNexis Risk Solut, Denver, CO USA
Keywords
SEC; Sentiment Analysis; Natural Language Processing; HPCC Systems
DOI
10.1109/BigData50022.2020.9378388
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Insights from public companies' financial filings are necessary for securities analysts and investors to make the right investment decisions. Synthesizing salient facts from such filings is a complex language task, especially now that the data volume is growing at an overwhelming pace. To ease human labor in this process, our work proposes a financial filing analysis pipeline which automatically scrapes financial filings, generates embeddings of the contextual data, and performs sentiment analysis in order to predict the future performance of the underlying companies. The pipeline is built on top of the Big Data processing platform HPCC Systems so that large amounts of financial filings can be processed in a scalable and timely manner. By applying word embedding and machine learning models to a large amount of SEC financial filings, our pipeline is able to process 20 GB of XBRL files - 5,000 filing documents for more than 3,500 companies - into 50,000 sentence embeddings within 5 minutes and transform the same data to TF-IDF embeddings in about 8 minutes. To test sentiment analysis, we randomly sampled and manually labeled 5,000 SEC filings. The sentiment analysis suggested that the usefulness of stock price as a metric is specific to each industry and to the overall market, but that it is usable as long as the scope of inquiry is sufficiently narrow. Additionally, although our model is trained on only 5,000 manually labeled filings with unigrams and a final loss of 0.09, the results of the sentiment analysis exhibited discriminatory power exceeding naive label selection through random or biased choice, suggesting that there is efficacy in using Natural Language Processing to analyze SEC filings.
Pages: 4429-4436
Page count: 8