AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model

Cited by: 0
Authors
Chen, Minghao [1 ,2 ]
Zhu, Kaijie [3 ,4 ]
Lu, Bin [1 ,2 ]
Li, Ding [1 ,2 ]
Yuan, Qingjun [1 ,2 ]
Zhu, Yuefei [1 ,2 ]
Affiliations
[1] Minist Educ, Key Lab Cyberspace Secur, Zhengzhou 450001, Peoples R China
[2] Henan Key Lab Network Cryptog Technol, Zhengzhou 450001, Peoples R China
[3] Henan Key Lab Cyberspace Situat Awareness, Zhengzhou 450001, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
Keywords
Cyber threat intelligence (CTI); Attack technique extraction; Prompt engineering; Large language model (LLM); Advanced persistent threat (APT)
DOI
10.1016/j.cose.2024.104213
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Cyber Threat Intelligence (CTI) reports contain a wealth of intelligence on cyber-attack campaigns, which greatly helps security analysts infer attack trends and strengthen their defenses. However, because report content and writing styles vary widely, intelligence extraction today still relies mostly on time-consuming manual effort. Moreover, existing automatic methods generally neglect the importance of background knowledge and produce inexact extraction results. These problems prevent the effective utilization and sharing of intelligence from CTI reports. In this paper, we focus on the automatic extraction of attack technique (AT) intelligence, which reveals patterns of attack behaviors and hardly changes over time. We propose AECR, a novel automatic AT extraction pipeline for CTI reports. AECR explores the feasibility of extracting AT intelligence with a fine-tuned large language model (LLM). In particular, we endow the selected LLM with enhanced domain-specific knowledge to improve its comprehension of AT-relevant content and alleviate the hallucination problem. Experimental results demonstrate that AECR outperforms state-of-the-art methods by a wide margin at a reasonable time cost, improving accuracy, precision, recall, and F1-score by 108%, 37.2%, 22.4%, and 67.5%, respectively. To the best of our knowledge, AECR is the first method to perform AT extraction based on a fine-tuned LLM.
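The abstract describes the pipeline only at a high level. As a rough illustration of the extraction step, the sketch below prompts a fine-tuned causal LLM with a CTI report excerpt and keeps only well-formed technique identifiers from its output. It assumes, as is common in AT extraction work, that techniques are expressed as MITRE ATT&CK IDs; the checkpoint path, prompt wording, and regex filter are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of LLM-based attack technique (AT) extraction.
# Assumes a Hugging Face-style fine-tuned checkpoint; the model path,
# prompt, and post-filter below are hypothetical, not from the paper.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/fine-tuned-llm"  # placeholder for the fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

PROMPT = (
    "You are a cyber threat intelligence analyst. Using your knowledge of "
    "MITRE ATT&CK, list the technique IDs (e.g., T1566) described in the "
    "following report excerpt.\n\nExcerpt: {excerpt}\nTechniques:"
)

def extract_techniques(excerpt: str) -> list[str]:
    """Return ATT&CK technique IDs the model attributes to the excerpt."""
    inputs = tokenizer(PROMPT.format(excerpt=excerpt), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only well-formed IDs (e.g., T1566 or T1566.001) to limit
    # hallucinated output, mirroring the paper's concern with hallucination.
    return sorted(set(re.findall(r"T\d{4}(?:\.\d{3})?", text)))

print(extract_techniques(
    "The actors sent spearphishing emails with malicious attachments."
))
```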
Pages: 15
Related Papers
50 records in total
  • [1] Classification of Interventional Radiology Reports into Technique Categories with a Fine-Tuned Large Language Model
    Yasaka, Koichiro
    Nomura, Takuto
    Kamohara, Jun
    Hirakawa, Hiroshi
    Kubo, Takatoshi
    Kiryu, Shigeru
    Abe, Osamu
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2024
  • [2] CentralBankRoBERTa: A fine-tuned large language model for central bank communications
    Pfeifer, Moritz
    Marohl, Vincent P.
    JOURNAL OF FINANCE AND DATA SCIENCE, 2023, 9
  • [3] Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks
    Luo, Ling
    Ning, Jinzhong
    Zhao, Yingwen
    Wang, Zhijun
    Ding, Zeyuan
    Chen, Peng
    Fu, Weiru
    Han, Qinyu
    Xu, Guangtao
    Qiu, Yunzhi
    Pan, Dinghao
    Li, Jiru
    Li, Hao
    Feng, Wenduo
    Tu, Senbo
    Liu, Yuqi
    Yang, Zhihao
    Wang, Jian
    Sun, Yuanyuan
    Lin, Hongfei
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (9): 1865-1874
  • [4] Exploring Generalizability of a fine-tuned Large Language Model for Impression Generation in PET Reports
    Yousefirizi, F.
    Wang, L.
    Gowdy, C.
    Shariftabrizi, A.
    Harsini, S.
    Ahamed, S.
    Sabouri, M.
    Mollaheydar, E.
    Rahmim, A.
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2024, 51: S785-S785
  • [5] EpilepsyLLM: Domain-Specific Large Language Model Fine-tuned with Epilepsy Medical Knowledge
    Zhao, Xuyang
    Zhao, Qibin
    Tanaka, Toshihisa
    arXiv
  • [6] A fine-tuned large language model based molecular dynamics agent for code generation to obtain material thermodynamic parameters
    Shi, Zhuofan
    Xin, Chunxiao
    Huo, Tong
    Jiang, Yuntao
    Wu, Bowen
    Chen, Xingyue
    Qin, Wei
    Ma, Xinjian
    Huang, Gang
    Wang, Zhenyu
    Jing, Xiang
    SCIENTIFIC REPORTS, 15 (1)
  • [7] Website Category Classification Using Fine-tuned BERT Language Model
    Demirkiran, Ferhat
    Cayir, Aykut
    Unal, Ugur
    Dag, Hasan
    2020 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2020: 333-336
  • [8] Arabic sarcasm detection: An enhanced fine-tuned language model approach
    Galal, Mohamed A.
    Yousef, Ahmed Hassan
    Zayed, Hala H.
    Medhat, Walaa
    AIN SHAMS ENGINEERING JOURNAL, 2024, 15 (06)
  • [9] MIRA-ChatGLM: A Fine-Tuned Large Language Model for Intelligent Risk Assessment in Coal Mining
    Sun, Yi
    Zhang, Chao
    Wang, Chen
    Han, Ying
    APPLIED SCIENCES-BASEL, 2024, 14 (24)
  • [10] Extracting structured data from organic synthesis procedures using a fine-tuned large language model
    Ai, Qianxiang
    Meng, Fanwang
    Shi, Jiale
    Pelkie, Brenden
    Coley, Connor W.
    DIGITAL DISCOVERY, 2024, 3 (9): 1822-1831