Practitioner-grade analysis of offensive AI security. Prompt injection, model jailbreaks, agent and tool-use exploitation, AI red team techniques, and adversarial ML — distilled from primary sources, not press releases.
A practitioner's breakdown of how LLM prompt injection payloads are constructed, why the threat class changes when agents can invoke tools, and what defenders actually need to change.
Unit 42 documented 12 prompt injection attacks in production with 22 distinct delivery techniques. Here's how attackers build payloads that reach the model — and what red teamers should actually be testing.
Three essential questions for anyone building, securing, or red-teaming LLM applications — covering the distinction between jailbreaks and prompt injection, direct vs. indirect attack vectors, and mitigations that have proven effective in practice.
A technical breakdown of real prompt injection examples — direct, indirect, multimodal, and RAG-poisoning attacks — with conditions, payloads, and what actually defends against them.
A technical breakdown of LLM jailbreak attack classes — many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks — plus an honest look at which defenses actually stop them and which don't.
Direct and indirect prompt injection are fundamentally different attacks with different attack surfaces, threat actors, and mitigations. Understanding which one you're defending against determines where you spend your defensive budget.
A technical guide to jailbreak AI attacks — from manual prompt exploits to automated adversarial suffixes — covering the major technique families, transferability, and what defenses actually work.
How automated jailbreak LLM techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and what that means for red team practice.
A technical breakdown of LLM prompt injection — direct, indirect, and agent-targeting variants — grounded in real-world attack patterns observed in production and defensive controls that survive adversarial pressure.
Model extraction and model inversion both threaten model confidentiality, but they target different aspects of the model and require different defense architectures. Extraction recovers the model itself; inversion recovers the training data it memorized.
A practitioner's breakdown of prompt hacking — the three attack families (injection, leaking, jailbreaking), how each works mechanically, and what defenses hold up under real adversarial pressure.
A practitioner's breakdown of prompt injection attacks — direct, indirect, and multi-modal — covering the HouYi framework, real CVEs, and mitigations that hold up under adversarial pressure.
A technical breakdown of LLM bypass techniques — adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion — with practitioner-level implications for red teams and production defenders.
Three active attack classes — IRIS self-refinement, Crescendo multi-turn escalation, and classic prompt-engineering patterns — consistently breach GPT-4 safety guardrails. Here's how each works and what belongs in your engagement toolkit.
AI Sec is part of a 26-site editorial network covering adversarial ML, AI governance, defensive tooling, and ops engineering — all open access.
Offensive AI security — prompt injection, jailbreaks, agent exploitation, red team writeups — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.