AECR: Automatic attack technique intelligence extraction based on fine-tuned large language model

Cited: 0
Authors
Chen, Minghao [1 ,2 ]
Zhu, Kaijie [3 ,4 ]
Lu, Bin [1 ,2 ]
Li, Ding [1 ,2 ]
Yuan, Qingjun [1 ,2 ]
Zhu, Yuefei [1 ,2 ]
Affiliations
[1] Minist Educ, Key Lab Cyberspace Secur, Zhengzhou 450001, Peoples R China
[2] Henan Key Lab Network Cryptog Technol, Zhengzhou 450001, Peoples R China
[3] Henan Key Lab Cyberspace Situat Awareness, Zhengzhou 450001, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
Keywords
Cyber threat intelligence (CTI); Attack technique extraction; Prompt engineering; Large language model (LLM); Advanced persistent threat (APT);
DOI
10.1016/j.cose.2024.104213
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Cyber Threat Intelligence (CTI) reports contain rich intelligence on cyber-attack campaigns, which greatly helps security analysts infer attack trends and strengthen their defenses. However, due to the diversity of report content and writing styles, intelligence extraction today still relies largely on time-consuming manual effort. Moreover, existing automatic methods generally neglect the importance of background knowledge and produce inexact extraction results. These problems prevent the effective utilization and sharing of intelligence from CTI reports. In this paper, we focus on the automatic extraction of attack technique (AT) intelligence, which reveals patterns of attack behavior and hardly changes over time. We propose AECR, a novel automatic AT extraction pipeline for CTI reports. AECR explores the feasibility of extracting AT intelligence with a fine-tuned large language model (LLM). In particular, we endow the selected LLM with enhanced domain-specific knowledge to improve its comprehension of AT-relevant content and alleviate the hallucination problem. Experimental results demonstrate that AECR outperforms state-of-the-art methods by a wide margin at a reasonable time cost, improving accuracy, precision, recall, and F1-score by 108%, 37.2%, 22.4%, and 67.5%, respectively. To the best of our knowledge, AECR is the first approach to perform AT extraction based on a fine-tuned LLM.
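The abstract describes, but does not show, the core mechanism: fine-tuning an LLM on domain-specific supervision so that it maps CTI report text to attack techniques. The sketch below illustrates one plausible realization of that pattern using LoRA adapters via the Hugging Face transformers, peft, and datasets libraries. The base model name, prompt template, and training pairs are illustrative assumptions, not AECR's actual implementation, which is detailed only in the full paper.

```python
# A minimal sketch of fine-tuning an LLM for attack technique (AT) extraction,
# in the spirit of the pipeline the abstract describes. Everything concrete here
# (base model, prompt wording, supervision pairs) is a hypothetical placeholder.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # assumed base model; not named in the abstract

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama-style models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA freezes the base weights and trains small low-rank adapters, a standard
# low-cost way to inject domain-specific knowledge into a pre-trained LLM.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# Hypothetical supervision: CTI sentences labeled with MITRE ATT&CK technique IDs.
examples = [
    {"text": "The actor sent spearphishing emails with malicious attachments.",
     "label": "T1566.001"},
    {"text": "PowerShell was used to download the second-stage payload.",
     "label": "T1059.001"},
]

PROMPT = ("Extract the MITRE ATT&CK technique from the report sentence.\n"
          "Sentence: {text}\nTechnique: {label}")

def tokenize(ex):
    enc = tokenizer(PROMPT.format(**ex), truncation=True, max_length=256,
                    padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # causal LM objective: predict next token
    return enc

train_ds = Dataset.from_list(examples).map(tokenize, remove_columns=["text", "label"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="aecr-lora-sketch",
                           per_device_train_batch_size=1,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=train_ds,
)
trainer.train()
```

In a setup like this, only the adapter weights are updated, which keeps fine-tuning tractable while still specializing the model on AT-relevant vocabulary; whether AECR uses LoRA, full fine-tuning, or another parameter-efficient scheme is not stated in this record.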
Pages: 15
Related Papers
50 records in total
  • [41] A scientific-article key-insight extraction system based on multi-actor of fine-tuned open-source large language models
    Song, Zihan
    Hwang, Gyo-Yeob
    Zhang, Xin
    Huang, Shan
    Park, Byung-Kwon
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [42] An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study
    Serapio, Adrian
    Chaudhari, Gunvant
    Savage, Cody
    Lee, Yoo Jin
    Vella, Maya
    Sridhar, Shravan
    Schroeder, Jamie Lee
    Liu, Jonathan
    Yala, Adam
    Sohn, Jae Ho
    BMC MEDICAL IMAGING, 2024, 24 (01)
  • [43] Assessment of fine-tuned large language models for real-world chemistry and material science applications
    Van Herck, Joren
    Gil, Maria Victoria
    Jablonka, Kevin Maik
    Abrudan, Alex
    Anker, Andy S.
    Asgari, Mehrdad
    Blaiszik, Ben
    Buffo, Antonio
    Choudhury, Leander
    Corminboeuf, Clemence
    Daglar, Hilal
    Elahi, Amir Mohammad
    Foster, Ian T.
    Garcia, Susana
    Garvin, Matthew
    Godin, Guillaume
    Good, Lydia L.
    Gu, Jianan
    Xiao Hu, Noemie
    Jin, Xin
    Junkers, Tanja
    Keskin, Seda
    Knowles, Tuomas P. J.
    Laplaza, Ruben
    Lessona, Michele
    Majumdar, Sauradeep
    Mashhadimoslem, Hossein
    Mcintosh, Ruaraidh D.
    Moosavi, Seyed Mohamad
    Mourino, Beatriz
    Nerli, Francesca
    Pevida, Covadonga
    Poudineh, Neda
    Rajabi-Kochi, Mahyar
    Saar, Kadi L.
    Hooriabad Saboor, Fahimeh
    Sagharichiha, Morteza
    Schmidt, K. J.
    Shi, Jiale
    Simone, Elena
    Svatunek, Dennis
    Taddei, Marco
    Tetko, Igor
    Tolnai, Domonkos
    Vahdatifar, Sahar
    Whitmer, Jonathan
    Wieland, D. C. Florian
    Willumeit-Roemer, Regine
    Zuttel, Andreas
    Smit, Berend
    CHEMICAL SCIENCE, 2025, 16 (02): 670-684
  • [44] Melanoma identification and classification model based on fine-tuned convolutional neural network
    Almufareh, Maram F.
    Tariq, Noshina
    Humayun, Mamoona
    Khan, Farrukh Aslam
    DIGITAL HEALTH, 2024, 10
  • [45] Enhancing Zero-Shot Crypto Sentiment With Fine-Tuned Language Model and Prompt Engineering
    Wahidur, Rahman S. M.
    Tashdeed, Ishmam
    Kaur, Manjit
    Lee, Heung-No
    IEEE ACCESS, 2024, 12: 10146-10159
  • [46] Development of Fine-Tuned Retrieval Augmented Language Model specialized to manual books on machine tools
    Cho, Seongwoo
    Park, Jongsu
    Um, Jumyung
    IFAC PAPERSONLINE, 2024, 58 (19): 187-192
  • [47] Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection
    Kotitsas, Sotiris
    Kounoudis, Panagiotis
    Koutli, Eleni
    Papageorgiou, Haris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 2540-2554
  • [48] A deep dive into automated sexism detection using fine-tuned deep learning and large language models
    Vetagiri, Advaitha
    Pakray, Partha
    Das, Amitava
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 145
  • [49] Heterogeneous data-based information retrieval using a fine-tuned pre-trained BERT language model
    Shaik, Amjan
    Saxena, Surabhi
    Gupta, Manisha
    Parveen, Nikhat
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (21): 59537-59559
  • [50] An Intelligent Fine-Tuned Forecasting Technique for Covid-19 Prediction Using Neuralprophet Model
    Khurana, Savita
    Sharma, Gaurav
    Miglani, Neha
    Singh, Aman
    Alharbi, Abdullah
    Alosaimi, Wael
    Alyami, Hashem
    Goyal, Nitin
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (01): 629-649