AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model

Cited by: 0
Authors
Chen, Minghao [1 ,2 ]
Zhu, Kaijie [3 ,4 ]
Lu, Bin [1 ,2 ]
Li, Ding [1 ,2 ]
Yuan, Qingjun [1 ,2 ]
Zhu, Yuefei [1 ,2 ]
Affiliations
[1] Minist Educ, Key Lab Cyberspace Secur, Zhengzhou 450001, Peoples R China
[2] Henan Key Lab Network Cryptog Technol, Zhengzhou 450001, Peoples R China
[3] Henan Key Lab Cyberspace Situat Awareness, Zhengzhou 450001, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
Keywords
Cyber threat intelligence (CTI); Attack technique extraction; Prompt engineering; Large language model (LLM); Advanced persistent threat (APT);
DOI
10.1016/j.cose.2024.104213
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline code
0812;
Abstract
Cyber Threat Intelligence (CTI) reports contain a wealth of intelligence on cyber-attack campaigns, helping security analysts infer attack trends and strengthen their defenses. However, because report content and writing styles vary widely, intelligence extraction today relies mostly on time-consuming manual effort. Moreover, existing automatic methods generally neglect the importance of background knowledge and produce inexact extraction results. These problems prevent the effective utilization and sharing of intelligence from CTI reports. In this paper, we focus on the automatic extraction of attack technique (AT) intelligence, which reveals patterns of attack behavior and hardly changes over time. We propose AECR, a novel automatic AT extraction pipeline for CTI reports. AECR explores the feasibility of extracting AT intelligence with a fine-tuned large language model (LLM). In particular, we endow the selected LLM with enhanced domain-specific knowledge to improve its comprehension of AT-relevant content and to alleviate hallucination. Experimental results demonstrate that AECR outperforms state-of-the-art methods by a wide margin at a reasonable time cost, improving accuracy, precision, recall, and F1-score by 108%, 37.2%, 22.4%, and 67.5%, respectively. To the best of our knowledge, AECR is the first to perform AT extraction based on a fine-tuned LLM.
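The abstract only sketches the approach (a fine-tuned LLM queried with domain-knowledge-enriched prompts); the snippet below is a minimal illustrative sketch of that general idea, not the authors' implementation. The checkpoint path, prompt template, and helper function are placeholders assumed for illustration.

```python
# Minimal sketch: querying a (hypothetical) fine-tuned LLM to extract attack
# techniques from a CTI report excerpt. All names and the prompt wording are
# illustrative assumptions, not AECR's actual pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "path/to/fine-tuned-at-extractor"  # hypothetical fine-tuned checkpoint

# Prompt that injects domain knowledge (candidate MITRE ATT&CK technique names)
# ahead of the report excerpt, mirroring the "enhanced domain knowledge" idea.
PROMPT_TEMPLATE = (
    "You are a threat-intelligence analyst. Candidate attack techniques:\n"
    "{candidates}\n\n"
    "Report excerpt:\n{excerpt}\n\n"
    "List the MITRE ATT&CK technique IDs described in the excerpt:"
)

def extract_techniques(excerpt: str, candidates: list[str]) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    prompt = PROMPT_TEMPLATE.format(
        candidates="\n".join(candidates), excerpt=excerpt
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens (the model's answer).
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```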
Pages: 15
Related papers
50 in total
  • [21] Automated classification of brain MRI reports using fine-tuned large language models
    Kanzawa, Jun
    Yasaka, Koichiro
    Fujita, Nana
    Fujiwara, Shin
    Abe, Osamu
    NEURORADIOLOGY, 2024, 66 (12) : 2177 - 2183
  • [22] Generating Software Tests for Mobile Applications Using Fine-Tuned Large Language Models
    Hoffmann, Jacob
    Frister, Demian
    PROCEEDINGS OF THE 2024 IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST, AST 2024, 2024, : 76 - 77
  • [23] Efficacy of Fine-Tuned Large Language Model in CT Protocol Assignment as Clinical Decision-Supporting System
    Kanemaru, Noriko
    Yasaka, Koichiro
    Okimoto, Naomasa
    Sato, Mai
    Nomura, Takuto
    Morita, Yuichi
    Katayama, Akira
    Kiryu, Shigeru
    Abe, Osamu
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2025,
  • [24] The Impact of AUTOGEN and Similar Fine-Tuned Large Language Models on the Integrity of Scholarly Writing
    Resnik, David B.
    Hosseini, Mohammad
    AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): : 50 - 52
  • [25] Comparative Analysis of Generic and Fine-Tuned Large Language Models for Conversational Agent Systems
    Villa, Laura
    Carneros-Prado, David
    Dobrescu, Cosmin C.
    Sanchez-Miguel, Adrian
    Cubero, Guillermo
    Hervas, Ramon
    ROBOTICS, 2024, 13 (05)
  • [26] Understanding language-elicited EEG data by predicting it from a fine-tuned language model
    Schwartz, Dan
    Mitchell, Tom
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 43 - 57
  • [27] Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues
    Li, Chuyuan
    Huber, Patrick
    Xiao, Wen
    Amblard, Maxime
    Braud, Chloe
    Carenini, Giuseppe
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 2562 - 2579
  • [28] CD-LLMCARS: Cross Domain Fine-Tuned Large Language Model for Context-Aware Recommender Systems
    Cheema, Adeel Ashraf
    Sarfraz, Muhammad Shahzad
    Habib, Usman
    Zaman, Qamar Uz
    Boonchieng, Ekkarat
    IEEE OPEN JOURNAL OF THE COMPUTER SOCIETY, 2025, 6 : 49 - 59
  • [29] NM-GPT: Advancing Nuclear Medicine Report Processing Through a Specialized Fine-tuned Large Language Model
    Lyu, Zhiliang
    Zeng, Fang
    Guo, Ning
    Li, Xiang
    Li, Quanzheng
    JOURNAL OF NUCLEAR MEDICINE, 2024, 65
  • [30] Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders
    Vaid, Akhil
    Landi, Isotta
    Nadkarni, Girish
    Nabeel, Ismail
    LANCET DIGITAL HEALTH, 2023, 5 (12): : E855 - E858