Advancing Cybersecurity through Explainable AI
Jiahao Yu, Department of Computer Science, Northwestern University
March 25, 2026, 11:30–12:30, WWH 335
There is growing momentum—from industry, government, and academia—to use AI for automating cybersecurity tasks. Yet practitioners remain skeptical: while 87% of security leaders expect AI to enhance their roles, only 9% believe it will replace significant parts of them. This gap stems from two fundamental barriers: limited capability and lack of trust. In this talk, I present my research on addressing these barriers through explainable AI. I first introduce StateMask, a method that automatically identifies critical decision steps in AI agent trajectories, enabling security professionals to understand why an AI-generated patch succeeded or failed. A user study with 41 experienced developers shows that 89% find our explanations aligned with their reasoning. I then present GPO, which leverages these explanations to synthesize high-quality training data without expensive expert annotation, thereby improving model capability. GPO-trained open-source models achieve performance competitive with leading commercial models on vulnerability patching, and its extension, EntroPO, ranks 1st on SWE-Bench Lite among all open-weight models. I conclude by discussing future directions toward building AI systems that are robust to imperfect data, trusted by security professionals, and capable of tackling real-world cybersecurity challenges.
Jiahao Yu is a final-year Ph.D. candidate in Computer Science at Northwestern University, advised by Professor Xinyu Xing. His research sits at the intersection of AI and cybersecurity, focusing on using large language models and reinforcement learning to advance security capabilities. His work on LLM-Fuzzer (USENIX Security 2024), the first fuzzing framework for discovering LLM jailbreak vulnerabilities, has been downloaded over 154,000 times and adopted by Microsoft, OpenAI, Meta, Anthropic, and ByteDance. As a core member of Team 42-b3yond-6ug in the DARPA AI Cyber Challenge (AIxCC), he contributed to winning the semi-final and securing $3 million in prizes; the team's fuzzing system discovered the most zero-day vulnerabilities among all finalists. His PatchAgent work (USENIX Security 2025) was selected as CSAW 2025 Runner-up in Technical Impact, and EntroPO, the extension of his GPO framework (NeurIPS 2025), ranks 1st on SWE-Bench Lite among open-weight models. Jiahao has published at venues including USENIX Security, NeurIPS, ICML, and ICSE, with work featured in WIRED and MIT Technology Review. He serves as a program committee member for USENIX Security and CCS, and was recognized as an ICLR 2025 Notable Reviewer.