AI Security Resources
Hand-picked papers, talks, courses, and communities. Refreshed periodically; older entries kept for reference where they remain canonical.
Foundational Papers
- Universal and Transferable Adversarial Attacks on Aligned Language Models (GCG)Zou et al., 2023 #attacks
- Are Aligned Neural Networks Adversarially Aligned?Carlini et al., 2023 #attacks
- Sleeper Agents: Training Deceptively Aligned LLMsHubinger et al., 2024 #alignment
- Constitutional AI: Harmlessness from AI FeedbackBai et al., 2022 #alignment
- Training Language Models to Follow Instructions (InstructGPT)Ouyang et al., 2022 #alignment
- Privacy Risks of General-Purpose Language ModelsPan et al., 2020 #privacy
- Extracting Training Data from Large Language ModelsCarlini et al., 2021 #privacy
- Membership Inference Attacks Against ML ModelsShokri et al., 2017 #privacy
Operational Reading
- OWASP LLM Top 10OWASP Project #framework
- MITRE ATLASMITRE #framework
- NIST AI Risk Management FrameworkNIST #framework
- Anthropic Acceptable Use PolicyAnthropic #policy
- OpenAI Usage PoliciesOpenAI #policy
Tools & Frameworks
- garak — LLM vulnerability scannerNVIDIA #tool
- PyRIT — Python Risk Identification ToolMicrosoft #tool
- promptfoo — eval + red-team frameworkpromptfoo #tool
- NeMo GuardrailsNVIDIA #tool
- Guardrails AIGuardrails AI #tool
- Adversarial Robustness ToolboxIBM Trusted AI #tool
- Langfuse — LLM observabilityLangfuse #tool
Talks & Videos
- AI Village @ DEF CON (annual)AI Village #talk
- Generative Red Team @ DEF CON 31 — full results paperAI Village + Humane Intel #talk
- Black Hat USA AI track sessionsBlack Hat #talk
- USENIX Security AI papers (annual)USENIX #talk
Certifications & Training
- HackTheBox AI Red Teamer PathHackTheBox #training
- OffSec OSCP / OSEPOffSec #training
- SANS AI Security curriculumSANS #training
- Coursera — AI Security & Ethics specializationsCoursera #training
Communities
- AI Village DiscordAI Village #community
- MLSecOps communityProtect AI #community
- OWASP LLM Top 10 SlackOWASP #community
- AI Alignment ForumMIRI / FHI #community