A Supervised Machine Learning Based Approach for Automatically Extracting High-Level Threat Intelligence from Unstructured Sources

Cited by: 31
Authors
Ghazi, Yumna [1]
Anwar, Zahid [1]
Mumtaz, Rafia [1]
Saleem, Shahzad [1]
Tahir, Ali [1]
Affiliations
[1] NUST, SEECS, Dept Comp, Islamabad, Pakistan
Keywords
Cyber Threat Intelligence; Natural Language Processing; Tactics, Techniques and Procedures (TTPs); STIX; Indicators of Compromise
DOI
10.1109/FIT.2018.00030
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The last few years have seen a radical shift in the cyber defense paradigm from reactive to proactive, and this change is marked by the steadily increasing trend of Cyber Threat Intelligence (CTI) sharing. Currently, there are numerous Open Source Intelligence (OSINT) sources providing periodically updated threat feeds that are fed into various analytical solutions. At this point, there is an excessive amount of data being produced from such sources, both structured (STIX, OpenIOC, etc.) as well as unstructured (blacklists, etc.). However, more often than not, the level of detail required for making informed security decisions is missing from threat feeds, since most indicators are atomic in nature, like IPs and hashes, which are usually rather volatile. These feeds distinctly lack strategic threat information, like attack patterns and techniques that truly represent the behavior of an attacker or an exploit. Moreover, there is a lot of duplication in threat information and no single place where one could explore the entirety of a threat, hence requiring hundreds of man hours for sifting through numerous sources - trying to discern signal from noise - to find all the credible information on a threat. We have made use of natural language processing to extract threat feeds from unstructured cyber threat information sources with approximately 70% precision, providing comprehensive threat reports in standards like STIX, which is a widely accepted industry standard that represents CTI. The automation of an otherwise tedious manual task would ensure the timely gathering and sharing of relevant CTI that would give organizations the edge to be able to proactively defend against known as well as unknown threats.
Pages: 129-134
Page count: 6