AI Security Resources

Hand-picked papers, talks, courses, and communities. Refreshed periodically; older entries kept for reference where they remain canonical.

Foundational Papers

Universal and Transferable Adversarial Attacks on Aligned Language Models (GCG)
Zou et al., 2023 #attacks
Are Aligned Neural Networks Adversarially Aligned?
Carlini et al., 2023 #attacks
Sleeper Agents: Training Deceptively Aligned LLMs
Hubinger et al., 2024 #alignment
Constitutional AI: Harmlessness from AI Feedback
Bai et al., 2022 #alignment
Training Language Models to Follow Instructions (InstructGPT)
Ouyang et al., 2022 #alignment
Privacy Risks of General-Purpose Language Models
Pan et al., 2020 #privacy
Extracting Training Data from Large Language Models
Carlini et al., 2021 #privacy
Membership Inference Attacks Against ML Models
Shokri et al., 2017 #privacy

Operational Reading

OWASP LLM Top 10
OWASP Project #framework
MITRE ATLAS
MITRE #framework
NIST AI Risk Management Framework
NIST #framework
Anthropic Acceptable Use Policy
Anthropic #policy
OpenAI Usage Policies
OpenAI #policy

Tools & Frameworks

garak — LLM vulnerability scanner
NVIDIA #tool
PyRIT — Python Risk Identification Tool
Microsoft #tool
promptfoo — eval + red-team framework
promptfoo #tool
NeMo Guardrails
NVIDIA #tool
Guardrails AI
Guardrails AI #tool
Adversarial Robustness Toolbox
IBM Trusted AI #tool
Langfuse — LLM observability
Langfuse #tool

Talks & Videos

AI Village @ DEF CON (annual)
AI Village #talk
Generative Red Team @ DEF CON 31 — full results paper
AI Village + Humane Intel #talk
Black Hat USA AI track sessions
Black Hat #talk
USENIX Security AI papers (annual)
USENIX #talk

Certifications & Training

HackTheBox AI Red Teamer Path
HackTheBox #training
OffSec OSCP / OSEP
OffSec #training
SANS AI Security curriculum
SANS #training
Coursera — AI Security & Ethics specializations
Coursera #training

Communities

AI Village Discord
AI Village #community
MLSecOps community
Protect AI #community
OWASP LLM Top 10 Slack
OWASP #community
AI Alignment Forum
MIRI / FHI #community

Newsletters

tldrsec — Weekly security newsletter
#newsletter
Risky.Biz — infosec podcast/newsletter
#newsletter
Embedded.ai — Helen Toner on AI policy
#newsletter
The Batch — Andrew Ng's weekly AI newsletter
#newsletter