AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model

Cited: 0
Authors
Chen, Minghao [1 ,2 ]
Zhu, Kaijie [3 ,4 ]
Lu, Bin [1 ,2 ]
Li, Ding [1 ,2 ]
Yuan, Qingjun [1 ,2 ]
Zhu, Yuefei [1 ,2 ]
Affiliations
[1] Minist Educ, Key Lab Cyberspace Secur, Zhengzhou 450001, Peoples R China
[2] Henan Key Lab Network Cryptog Technol, Zhengzhou 450001, Peoples R China
[3] Henan Key Lab Cyberspace Situat Awareness, Zhengzhou 450001, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
Keywords
Cyber threat intelligence (CTI); Attack technique extraction; Prompt engineering; Large language model (LLM); Advanced persistent threat (APT)
DOI
10.1016/j.cose.2024.104213
Chinese Library Classification (CLC)
TP [automation and computer technology]
Discipline code
0812
Abstract
Cyber Threat Intelligence (CTI) reports contain a wealth of intelligence on cyber-attack campaigns, which greatly helps security analysts infer attack trends and enhance their defenses. However, due to the diversity of report content and writing styles, current intelligence extraction mostly relies on time-consuming manual effort. Moreover, existing automatic methods generally neglect the importance of background knowledge and produce inexact extraction results. These problems prevent the effective utilization and sharing of intelligence from CTI reports. In this paper, we focus on the automatic extraction of attack technique (AT) intelligence, which reveals patterns of attack behaviors and changes little over time. We propose AECR, a novel automatic AT extraction pipeline for CTI reports. AECR explores the feasibility of extracting AT intelligence with a fine-tuned large language model (LLM). In particular, we endow the selected LLM with enhanced domain-specific knowledge to improve its comprehension of AT-relevant content and alleviate the hallucination problem. Experimental results demonstrate that AECR outperforms state-of-the-art methods by a wide margin at a reasonable time cost, improving accuracy, precision, recall, and F1-score by 108%, 37.2%, 22.4%, and 67.5%, respectively. To the best of our knowledge, AECR is the first method to perform AT extraction based on a fine-tuned LLM.
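The abstract describes the pipeline only at a high level: fine-tune an open LLM on AT-labelled report text, inject domain knowledge to curb hallucination, then prompt for extraction. The sketch below illustrates one way such a pipeline could be wired up, assuming LoRA fine-tuning with the Hugging Face transformers, peft, and datasets libraries; the base model, prompt wording, and toy training pairs are illustrative assumptions, not the authors' released code or data.

```python
# Minimal sketch of an AECR-style attack-technique extractor (not the authors'
# released code). Assumptions: LoRA fine-tuning via Hugging Face transformers/peft,
# a small open base model, a toy dataset, and an illustrative knowledge-bearing prompt.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed; the paper may use another

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Parameter-efficient fine-tuning keeps the "reasonable time cost" the abstract claims.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, target_modules=["q_proj", "v_proj"]))

# Hypothetical training pairs: CTI report sentences labelled with ATT&CK technique IDs.
pairs = [
    {"text": "The actor emailed a weaponized DOCX attachment to employees.",
     "label": "T1566.001 Spearphishing Attachment"},
    {"text": "PowerShell was invoked to decode and launch the second stage.",
     "label": "T1059.001 PowerShell"},
]

# Domain knowledge is injected directly into the prompt to ground the model's output.
PROMPT = ("You are a CTI analyst. MITRE ATT&CK catalogues adversary techniques, "
          "e.g. T1566 Phishing, T1059 Command and Scripting Interpreter.\n"
          "Sentence: {text}\nTechnique:")

def to_features(ex):
    full = PROMPT.format(text=ex["text"]) + " " + ex["label"] + tokenizer.eos_token
    enc = tokenizer(full, truncation=True, max_length=256, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # causal-LM objective over the full sequence
    return enc

train_ds = Dataset.from_list(pairs).map(to_features, remove_columns=["text", "label"])

Trainer(model=model,
        args=TrainingArguments(output_dir="aecr-lora", num_train_epochs=3,
                               per_device_train_batch_size=1, learning_rate=2e-4),
        train_dataset=train_ds).train()

# Inference: the same knowledge-bearing prompt steers extraction on unseen text.
query = tokenizer(PROMPT.format(text="Credentials were dumped from LSASS memory."),
                  return_tensors="pt")
print(tokenizer.decode(model.generate(**query, max_new_tokens=16)[0],
                       skip_special_tokens=True))
```

Embedding ATT&CK snippets in the prompt at both training and inference time is one simple way to realize the "enhanced domain-specific knowledge" the abstract mentions; the paper's actual knowledge-injection mechanism may differ.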
Pages: 15
Related papers (50 in total; items [31]-[40] shown)
  • [31] TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification
    Zhou, Yuxiang
    Liao, Lejian
    Gao, Yang
    Wang, Rui
    Huang, Heyan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (01) : 380 - 393
  • [32] Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models
    Foley, Myles
    Rawat, Ambrish
    Lee, Taesung
    Hou, Yufang
    Picco, Gabriele
    Zizzo, Giulio
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7423 - 7442
  • [33] Enhancing Solution Diversity in Arithmetic Problems using Fine-Tuned AI Language Model
    Lee, Chang-Yu
    Lai, I-Wei
    2024 11TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN, ICCE-TAIWAN 2024, 2024, : 515 - 516
  • [34] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
CLINICAL NEURORADIOLOGY, 2024
  • [35] Comparing Fine-Tuned Transformers and Large Language Models for Sales Call Classification: A Case Study
    Eisenstadt, Roy
    Asi, Abedelkader
    Ronen, Royi
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5240 - 5241
  • [36] Fine-tuned large language models can generate expert-level echocardiography reports
    Sowa, Achille
    Avram, Robert
EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, 6 (01): 5 - 6
  • [37] RankMean: Module-Level Importance Score for Merging Fine-tuned Large Language Models
    Perin, Gabriel J.
    Chen, Xuxi
    Liu, Shusen
    Kailkhura, Bhavya
    Wang, Zhangyang
    Gallagher, Brian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1776 - 1782
  • [38] Mining Insights from Large-Scale Corpora Using Fine-Tuned Language Models
    Palakodety, Shriphani
    KhudaBukhsh, Ashiqur R.
    Carbonell, Jaime G.
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1890 - 1897
  • [39] Fine-Tuned Large Language Model for Extracting Patients on Pretreatment for Lung Cancer from a Picture Archiving and Communication System Based on Radiological Reports
    Yasaka, Koichiro
    Kanzawa, Jun
    Kanemaru, Noriko
    Koshino, Saori
    Abe, Osamu
JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2025, 38 (01): 327 - 334
  • [40] ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
    Li, Yunxiang
    Li, Zihan
    Zhang, Kai
    Dan, Ruilong
    Jiang, Steve
    Zhang, You
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (06)