Extracting Actionable Cyber Threat Intelligence with Large Language Models

Categories: Seminar Series

Oct 5th 11:30-12:30 WWH 335

Actionable cyber threat intelligence is vital for effective defense. In practice, indicators of compromise (such as IP addresses or domains) are used to alert defenders to potentially malicious activity. However, such alerts lack the context defenders need to take effective action. For example, given an alert concerning an IP address, a defender wants to know what type of malicious activity (e.g., a phishing website vs. C2) has been observed in association with it, and what technical manifestations (e.g., modification of a particular registry key) have been observed from the attacker using this IP address. Such context helps defenders prioritize alerts and quickly determine whether a system has been compromised. Much of this context exists in real-time information-sharing systems such as messaging apps and forums. This paper describes an approach to extracting technical manifestations of attacks from tweets. We compare our results with the performance of OpenAI's GPT-3.5-Turbo and text-embedding-ada-002 models, and we have created an open-source project to make our tools and data available to the cybersecurity research community.

Moumita Das Purba is a Ph.D. candidate in the Department of Software and Information Systems at the University of North Carolina at Charlotte, working under the supervision of Dr. Bill Chu. Broadly, her research focuses on text mining of unstructured cybersecurity text. She is particularly interested in developing AI systems based on large language models and knowledge graphs that extract critical cybersecurity information from noisy text and explain the outcomes. She leverages ML/DL techniques to extract malicious technical manifestations and map them to the MITRE ATT&CK framework.