Retrieval-Augmented Generation for Identifying ATT&CK Technique
Abstract
Cyber Threat Intelligence (CTI) analysis faces significant challenges due to the scale and complexity of threat data. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) offer promising solutions; however, existing approaches often struggle with limited accuracy and hallucination. We propose an enhanced RAG framework that incorporates fine-tuned BERT embeddings for semantic retrieval and technique annotation, coupled with structured prompt generation to guide LLMs toward more precise and context-aware threat analysis. Compared with traditional encoder-only architectures, our framework substantially improves both accuracy and efficiency. Experiments conducted on the MITRE ATT&CK database and recent open-source threat reports demonstrate that our model achieves an F1-score of 0.93, outperforming state-of-the-art baselines including GPT-4 and LLaMA-3. These results highlight the potential of advanced RAG architectures to enable scalable, accurate, and trustworthy automated CTI analysis.
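The retrieval-plus-structured-prompt idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each ATT&CK technique description has already been encoded into a vector by some fine-tuned BERT model. The three short vectors in `TECHNIQUE_EMBEDDINGS` are hypothetical toy values, and the helper names (`cosine`, `retrieve`, `build_prompt`) and the prompt wording are illustrative assumptions. Only the technique IDs and names are real ATT&CK entries.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy vectors standing in for fine-tuned BERT embeddings of
# MITRE ATT&CK technique descriptions (real embeddings have hundreds of dims).
TECHNIQUE_EMBEDDINGS: Dict[str, List[float]] = {
    "T1566 Phishing": [0.9, 0.1, 0.0],
    "T1059 Command and Scripting Interpreter": [0.1, 0.9, 0.2],
    "T1486 Data Encrypted for Impact": [0.0, 0.2, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Rank techniques by similarity to the report embedding; keep the top k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in TECHNIQUE_EMBEDDINGS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(report_text: str, candidates: List[Tuple[str, float]]) -> str:
    """Assemble a structured prompt that restricts the LLM to retrieved candidates."""
    lines = [
        "You are a CTI analyst. Map the report to MITRE ATT&CK techniques.",
        "Candidate techniques (retrieved):",
    ]
    for name, score in candidates:
        lines.append(f"- {name} (similarity={score:.2f})")
    lines.append("Report:")
    lines.append(report_text)
    lines.append("Answer with technique IDs chosen only from the candidates.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical embedding of a report describing malicious PowerShell use.
    report_vec = [0.2, 0.8, 0.3]
    top = retrieve(report_vec, k=2)
    print(build_prompt("Attacker ran encoded PowerShell commands.", top))
```

Constraining the prompt to a short, ranked candidate list is one plausible way to reduce hallucinated technique IDs. The generated answer can then be checked against that list before it is accepted.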
Sheng-Shan Chen, Kai-Siang Cao, Chung-Kuan Chen, Chin-Yu Sun, "Retrieval-Augmented Generation for Identifying ATT&CK Technique," Communications of the CCISA, vol. 31, no. 3, pp. 20-39, Aug. 2025.
Published by Chinese Cryptology and Information Security Association (CCISA), Taiwan, R.O.C.
CCCISA Editorial Office
E-mail: ccisa.editor@gmail.com