Towards Crowd Sourced Artifact Curation for Cyberattacks Through a Learner Centered AI Co-Pilot
A learner centered AI co-pilot to help crowd source end to end cyberattack records with behavioral data for research and workforce development.

About
Creating robust and comprehensive cybersecurity solutions has become increasingly costly and time consuming because the list of vulnerabilities to consider keeps expanding and new attacks are discovered continuously. Security research, on the other hand, often focuses on a single attack, or even on subcomponents of one, with a narrow scope. These constraints severely limit opportunities to create holistic, cross disciplinary cybersecurity solutions. This project aims to develop a learner centered co-pilot tool that leverages advances in artificial intelligence (AI) to produce attack scenarios and capture the related data, including end to end attack interactions between the red team attacker and the cyber systems. The resulting high quality, structured attack artifact repository will be a valuable resource for the cybersecurity research community, especially for testing and validating security solutions.
This project adopts large language models (LLMs) to support cybersecurity research. Through an LLM adaptation approach, the red team co-pilot will incorporate techniques such as prompt engineering, reasoning, parameter efficient fine tuning, and few shot learning to guide users as they emulate attack scenarios. The project will develop a curator friendly methodology that enables crowd sourced aggregation of high quality cyberattack artifacts, together with the associated attack behaviors and system settings, once the tool is deployed in the research community. The captured dataset covers both functional and behavioral aspects of attacks, such as tactics, techniques, and procedures. A successful research outcome, including the tools generated, can help facilitate security benchmarking, AI based penetration testing, adversarial modeling, and research reproducibility. In addition, the red team co-pilot is a useful tool for cybersecurity education and workforce development, since it offers an accessible, adaptive, reusable, and learner centric platform for users to emulate attacks and gain cyber defense experience.
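To make the idea of a structured attack artifact more concrete, the sketch below shows one way such a record could be represented. It is only an illustration: the class names, field names, and example values are assumptions made here and do not describe the project's actual schema or repository format.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class AttackStep:
    """One red team action and the observed system response (hypothetical schema)."""
    tactic: str        # high level goal of the step, e.g. "Reconnaissance"
    technique: str     # how the goal was pursued, e.g. "port scan"
    command: str       # the procedure: what the attacker actually ran
    observation: str   # what the target system returned


@dataclass
class AttackArtifact:
    """An end to end attack record: system settings plus ordered steps (hypothetical)."""
    scenario: str
    system_settings: dict
    steps: list = field(default_factory=list)

    def add_step(self, step: AttackStep) -> None:
        self.steps.append(step)

    def tactics(self) -> list:
        """Return the distinct tactics in the order they first appear."""
        seen = []
        for step in self.steps:
            if step.tactic not in seen:
                seen.append(step.tactic)
        return seen

    def to_json(self) -> str:
        """Serialize the whole record, nested steps included, for sharing."""
        return json.dumps(asdict(self), indent=2)


# Illustrative usage with made-up values.
artifact = AttackArtifact(scenario="demo web server", system_settings={"os": "linux"})
artifact.add_step(AttackStep("Reconnaissance", "port scan", "nmap -sV 10.0.0.5", "80/tcp open http"))
artifact.add_step(AttackStep("Reconnaissance", "directory brute force", "gobuster dir -u http://10.0.0.5 -w words.txt", "/admin found"))
artifact.add_step(AttackStep("Initial Access", "default credentials", "login admin:admin", "session established"))
print(artifact.tactics())
```

Keeping the command and the observation together in each step is one way to record both the functional side of an attack (what was run) and the behavioral side (the order and intent of the actions) in a single shareable record.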
NSF Award Summary
NSF Award No.: #2344237
Recipient Organization: Rochester Institute of Technology
Project Period: 06/01/2024 to 05/31/2027
PD PI: Michael Zuzak
Co PI: Shanchieh J. Yang
Publications
PDF links are provided for each item below.
- R. Fayyazi, S. Hoyos Trueba, M. Zuzak, S. J. Yang, “ProveRAG: Provenance Driven Vulnerability Analysis with Automated Retrieval Augmented LLMs,” IEEE Access, 13 (2025): 212815 to 212826. [PDF]
- K. Nakano, R. Fayyazi, S. J. Yang, M. Zuzak, “Guided Reasoning in LLM Driven Penetration Testing Using Structured Attack Trees,” COLM, 2025. [PDF]
- R. Fayyazi, M. Zuzak, S. J. Yang, “LLM Embedding Based Attribution (LEA): Quantifying Source Contributions to Generative Model Response for Vulnerability Analysis,” arXiv, 2025. [PDF]
Open Source Tooling and Teaching Resources
Research Tools
- Structured Task Tree Reasoning: Co-pilot reasoning pipeline implementing structured task tree based guidance for penetration testing workflows.
- Probabilistic Structured Task Guidance: Co-pilot implementation with probabilistic structured reasoning for guided penetration testing.
- LLM Embedding Based Attribution (LEA): Attribution tooling for provenance aware vulnerability analysis using embedding space evidence.
- ProveRAG: Provenance driven vulnerability analysis with automated retrieval augmented generation.
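As a rough illustration of what structured task tree guidance can look like, the sketch below models a penetration test as a tree of tasks and selects the next unfinished leaf in depth first order. It is a minimal sketch under stated assumptions: `TaskNode`, `next_task`, and the example task names are hypothetical and are not taken from the released tools, which may organize and traverse the tree differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskNode:
    """A node in a hypothetical penetration testing task tree."""
    name: str
    done: bool = False
    children: list = field(default_factory=list)

    def add(self, child: "TaskNode") -> "TaskNode":
        """Attach a child task and return it so calls can be chained."""
        self.children.append(child)
        return child


def next_task(node: TaskNode) -> Optional[TaskNode]:
    """Return the first unfinished leaf in depth first order, or None.

    A node marked done is skipped together with its subtree, and a parent
    whose children are all finished is treated as having nothing left to do.
    """
    if node.done:
        return None
    if not node.children:
        return node
    for child in node.children:
        found = next_task(child)
        if found is not None:
            return found
    return None


# Illustrative usage with made-up task names.
root = TaskNode("compromise host")
recon = root.add(TaskNode("reconnaissance"))
recon.add(TaskNode("port scan", done=True))
recon.add(TaskNode("service enumeration"))
root.add(TaskNode("initial access"))

task = next_task(root)
print(task.name if task else "all tasks complete")
```

In a co-pilot setting, the selected task could be placed into the prompt so that the model's suggestion stays focused on one concrete step at a time; that wiring is an assumption here and is left out of the sketch.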
Teaching Resources
Generative AI in Cybersecurity: A hands on course for researchers, engineers, and security professionals who want to understand, adapt, and deploy large language model (LLM) technology in modern cyber defense operations.
- Open Source Slides and Lab Assignments: Course materials including lecture slides and laboratory exercises.
RenCTF: A gamified, team based platform designed to teach penetration testing skills. RenCTF provides an engaging experience for participants and supports hands on learning activities.
- RIT News Coverage: Overview of the RenCTF platform and its use for peer training.
- Open Source RenCTF Code Base: Public release of the RenCTF platform implementation.
Student Researchers
- Reza Fayyazi: PhD Student Researcher
- Katsuaki Nakano: PhD Student Researcher
- Matthew Heller: MS Student Researcher
- Ali Stambayev: MS Student Researcher
- Renaaron Ellis: Undergraduate Student Researcher
- Stella Hoyos Trueba: Undergraduate Student Researcher
- Christopher Nokes: Undergraduate Student Researcher
Project Figures and Experimental Results
RenCTF Workforce Development Platform
RenCTF is a gamified, team based platform that supports interactive security workforce development activities, including weekly Capture the Flag and Hack The Box challenges, and an exhibit augmented with the learner centered co-pilot.




Co-Pilot Reasoning Pipeline and Performance


Tool Results and Analyses


Contact
- Michael Zuzak, Assistant Professor, Department of Computer Engineering, Rochester Institute of Technology: mjzeec@rit.edu
- Shanchieh J. Yang, Professor and Inaugural David and Cathleen Reisenauer Family Director, Institute for Informatics and Applied Technology, Gonzaga University: yangj@gonzaga.edu
Acknowledgment

This material is based upon work supported by the National Science Foundation under NSF Award No. #2344237.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
