Large Scale Financial Filing Analysis on HPCC Systems

Cited by: 3
Authors
Murray, Matthias [1]
Chala, Arjuna [2]
Xu, Lili [2]
Dev, Roger [3]
Affiliations
[1] New Coll Florida, LexisNexis Risk Solut, Sarasota, FL 34243 USA
[2] LexisNexis Risk Solut, Atlanta, GA USA
[3] LexisNexis Risk Solut, Denver, CO USA
Keywords
SEC; Sentiment Analysis; Natural Language Processing; HPCC Systems
DOI
10.1109/BigData50022.2020.9378388
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Insights from public companies' financial filings are necessary for securities analysts and investors to make sound investment decisions. Synthesizing salient facts from such filings is a complex language task, especially as the data volume grows at an overwhelming pace. To reduce the human labor in this process, we propose a financial filing analysis pipeline that automatically scrapes financial filings, generates embeddings of the contextual data, and performs sentiment analysis in order to predict the future performance of the underlying companies. The pipeline is built on the HPCC Systems big data processing platform so that large volumes of financial filings can be processed in a scalable and timely manner. Applying word embedding and machine learning models to a large set of SEC financial filings, the pipeline processes 20 GB of XBRL files (5,000 filing documents for more than 3,500 companies) into 50,000 sentence embeddings within 5 minutes and transforms the same data into TF-IDF embeddings in about 8 minutes. To test the sentiment analysis, we randomly sampled and manually labeled 5,000 SEC filings. The results suggest that the usefulness of stock price as a metric is specific to each industry and to the overall market, but that it is usable as long as the scope of inquiry is sufficiently narrow. Additionally, although our model is trained on only 5,000 manually labeled filings with unigrams, reaching a final loss of 0.09, the sentiment analysis exhibited discriminatory power exceeding naive label selection through random or biased choice, suggesting that Natural Language Processing can be effective for analyzing SEC filings.
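As an illustration of the TF-IDF embedding step mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the actual pipeline runs on HPCC Systems, and the abstract does not give its exact weighting formula. The function names `tokenize` and `tfidf_vectors`, the letters-only unigram tokenizer, and the smoothed IDF formula are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase unigram tokens (letters only; an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    TF is the term count divided by the document length.
    IDF uses a smoothed form, log((1 + N) / (1 + df)) + 1, chosen here
    for illustration; the paper's pipeline may weight terms differently.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {
        term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        for term in vocabulary
    }

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, vectors
```

Calling `tfidf_vectors` on a list of filing sentences yields one fixed-length vector per sentence, which could then be fed to a downstream classifier. Terms that appear in fewer documents receive a larger weight than terms shared across many documents.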
Pages: 4429-4436
Page count: 8