Offensive AI security — prompt injection, jailbreaks, agent exploitation, red team writeups.
Practitioner-grade analysis of offensive AI security. Prompt injection, model jailbreaks, agent and tool-use exploitation, AI red team techniques, and adversarial ML — distilled from primary sources, not press releases.
Prompt Hacking: A Practitioner's Taxonomy of LLM Attack Classes
Prompt hacking covers three distinct attack classes against LLMs: direct injection, indirect injection, and jailbreaking. Here is how each works, what distinguishes them, and what actually stops them.
The Adversarial ML Attack Taxonomy: A Red Teamer's Reference
A working taxonomy of attacks against ML systems — evasion, poisoning, privacy, and abuse — mapped to attacker knowledge and capability, grounded in the NIST AML report and the tools that actually run each attack.
AI Red Team Engagement Methodology: Scoping to Reporting
The full lifecycle of an LLM red team engagement — scoping and rules of engagement, threat modeling, the test plan by attack class, the tooling that runs it, evidence capture, and a report a model team will actually act on.
The Audit Gap: Why Red-Teaming Can't Certify Governance Claims
A new position paper by Seth and Sankarapu formalizes the structural mismatch between what AI governance frameworks require evaluators to verify and what behavioral assurance methods can epistemically support, and considers the implications for anyone writing safety reports.
// All entries
-
Prompt Injection in 2025: OpenAI vs. Broken Defenses
OpenAI's November 2025 advisory on prompt injection arrived the same week a 14-researcher arXiv paper showed that adaptive attacks achieve >90% success against published defenses. CVE-2024-5184 (CVSS 9.1) shows what the absence of any defense looks like in production.
-
LLM Prompt Injection: From Instruction Override to Agent Takeover
A practitioner's breakdown of how LLM prompt injection payloads are constructed, why the threat class changes when agents can invoke tools, and what defenders actually need to change.
-
Prompt Injection Delivery: Real Techniques and Payload Methods
Unit 42 documented 12 prompt injection attacks in production with 22 distinct delivery techniques. Here's how attackers build payloads that reach the model — and what red teamers should actually be testing.
-
LLM Security FAQ: Prompt Injection, Jailbreaking, and Defenses
Three essential questions for anyone building, securing, or red-teaming LLM applications — covering the distinction between jailbreaks and prompt injection, direct vs. indirect attack vectors, and proven defensive mitigations.
-
Prompt Injection Examples: A Practitioner's Attack Library
A technical breakdown of real prompt injection examples — direct, indirect, multimodal, and RAG-poisoning attacks — with conditions, payloads, and what actually defends against them.
-
Agent Tool-Use Exfiltration: When Indirect Injection Does Damage
Why agentic LLM systems convert injection bugs into data exfiltration, financial loss, and remote code execution — with concrete attack chains and the capability-restriction patterns that contain blast radius.
-
AI Red Teaming Hub: Your Guide to Offensive AI Security
The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling, organized for practitioners.
-
Direct vs. Indirect Prompt Injection: Threats and Defenses
Direct and indirect prompt injection are fundamentally different attacks with different attack surfaces, threat actors, and mitigations. Understanding which one you're defending against determines where you spend your defensive budget.
-
Indirect Prompt Injection in RAG Pipelines: Patterns and Defenses
How retrieval-augmented generation surfaces become injection vectors, with concrete attack patterns from production RAG systems and the chunking, sanitization, and provenance controls that actually help.
-
Jailbreak AI: How Attackers Break Safety Alignment and Defenses
A technical guide to AI jailbreak attacks, from manual prompt exploits to automated adversarial suffixes, covering the major technique families, transferability, and which defenses actually work.
Trusted by researchers across the AI security community
AI Sec is part of a 26-site editorial network covering adversarial ML, AI governance, defensive tooling, and ops engineering — all open access.