All posts

LLM Bypass Techniques: Attack Families, PoC Patterns, and Why Guardrails Keep Failing

A practitioner map of LLM bypass technique families — prompt injection, jailbreak personas, encoding obfuscation, RAG poisoning, and agent-specific attacks — with PoC patterns and what current research says about defense gaps.
June 12, 2026
AI Red Team: Methodology, Tooling, and the Attack Surface That Actually Matters

A practitioner's guide to AI red teaming — what makes LLM attack surface different from traditional app testing, the techniques that reliably produce results, and the open-source tools worth deploying.
June 4, 2026
Prompt Hacking: A Practitioner's Taxonomy of LLM Attack Classes

Prompt hacking covers three distinct attack classes against LLMs: direct injection, indirect injection, and jailbreaking. Here is how each works, what distinguishes them, and what actually stops them.
June 1, 2026
The Adversarial ML Attack Taxonomy: A Red Teamer's Reference

A working taxonomy of attacks against ML systems — evasion, poisoning, privacy, and abuse — mapped to attacker knowledge and capability, grounded in the NIST AML report and the tools that actually run each attack.
May 22, 2026
AI Red Team Engagement Methodology: Scoping to Reporting

The full lifecycle of an LLM red team engagement — scoping and rules of engagement, threat modeling, the test plan by attack class, the tooling that runs it, evidence capture, and a report a model team will actually act on.
May 22, 2026
The Audit Gap: Why Red-Teaming Can't Certify Governance Claims

A new position paper by Seth and Sankarapu formalizes the structural mismatch between what AI governance frameworks require evaluators to verify and what behavioral assurance methods can epistemically support, and the implications for anyone writing safety reports.
May 15, 2026
Prompt Injection in 2025: OpenAI vs. Broken Defenses

OpenAI's November 2025 advisory on prompt injection arrived the same week a 14-researcher arXiv paper showed adaptive attacks achieve >90% success against published defenses. CVE-2024-5184 (CVSS 9.1) shows what having no defense looks like in production.
May 15, 2026
LLM Prompt Injection: From Instruction Override to Agent Takeover

A practitioner's breakdown of how LLM prompt injection payloads are constructed, why the threat class changes when agents can invoke tools, and what defenders actually need to change.
May 13, 2026
Prompt Injection Delivery: Real Techniques and Payload Methods

Unit 42 documented 12 prompt injection attacks in production with 22 distinct delivery techniques. Here's how attackers build payloads that reach the model — and what red teamers should actually be testing.
May 13, 2026
LLM Security FAQ: Prompt Injection, Jailbreaking, and Defenses

Three essential questions for anyone building, securing, or red-teaming LLM applications — covering the distinction between jailbreaks and prompt injection, direct vs. indirect attack vectors, and proven defensive mitigations.
May 11, 2026
Prompt Injection Examples: A Practitioner's Attack Library

A technical breakdown of real prompt injection examples — direct, indirect, multimodal, and RAG-poisoning attacks — with conditions, payloads, and what actually defends against them.
May 11, 2026
Agent Tool-Use Exfiltration: When Indirect Injection Does Damage

Why agentic LLM systems convert injection bugs into data exfiltration, financial loss, and remote code execution — with concrete attack chains and the capability-restriction patterns that contain blast radius.
May 10, 2026
AI Red Teaming Hub: Your Guide to Offensive AI Security

The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling, organized for practitioners.
May 10, 2026
Direct vs. Indirect Prompt Injection: Threats and Defenses

Direct and indirect prompt injection are fundamentally different attacks with different attack surfaces, threat actors, and mitigations. Understanding which one you're defending against determines where you spend your defensive budget.
May 10, 2026
Indirect Prompt Injection in RAG Pipelines: Patterns and Defenses

How retrieval-augmented generation surfaces become injection vectors, with concrete attack patterns from production RAG systems and the chunking, sanitization, and provenance controls that actually help.
May 10, 2026
Jailbreak AI: How Attackers Break Safety Alignment and Defenses

A technical guide to jailbreak AI attacks — from manual prompt exploits to automated adversarial suffixes — covering the major technique families, transferability, and what defenses actually work.
May 10, 2026
Jailbreak LLM: Automated Attacks and the Transfer Problem

How automated jailbreak LLM techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and what that means for red team practice.
May 10, 2026
LLM Bypass: How Attackers Circumvent Safety Alignment by Layer

A technical breakdown of LLM bypass techniques — adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion — with practitioner-level implications for red teams and production defenders.
May 10, 2026
LLM Jailbreak: Attack Taxonomy, Techniques, and Defense Reality

A technical breakdown of LLM jailbreak attack classes (many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks), plus an honest look at which defenses actually stop them and which don't.
May 10, 2026
LLM Prompt Injection: Taxonomy, Real Patterns, and Defenses

A technical breakdown of LLM prompt injection — direct, indirect, and agent-targeting variants — grounded in real-world attack patterns observed in production and defensive controls that survive adversarial pressure.
May 10, 2026
Model Extraction vs. Model Inversion: Two Confidentiality Attacks

Model extraction and model inversion both threaten model confidentiality, but they target different aspects of the model and require different defense architectures. Extraction recovers the model itself; inversion recovers the training data it memorized.
May 10, 2026
Prompt Hacking: Taxonomy, Techniques, and What Works on LLMs

A practitioner's breakdown of prompt hacking — the three attack families (injection, leaking, jailbreaking), how each works mechanically, and what defenses hold up under real adversarial pressure.
May 10, 2026
Prompt Injection Attack Compendium (2026 Edition)

A practitioner's pillar reference on prompt injection attacks against LLM systems — direct and indirect variants, real-world payloads, detection signals, and defense trade-offs.
May 10, 2026
Prompt Injection Attack: Techniques, Variants, and Defenses

A practitioner's breakdown of prompt injection attacks — direct, indirect, and multi-modal — covering the HouYi framework, real CVEs, and mitigations that hold up under adversarial pressure.
May 10, 2026
Prompt Injection Detection Signals in Production LLM Systems

The observable signals that indicate a prompt injection attempt or success in a live LLM application — input classifiers, output classifiers, canary tokens, tool-use anomalies, and how to combine them.
May 10, 2026
GPT-4 Jailbreak Techniques: A Red Teamer's Technical Reference

Three active attack classes — IRIS self-refinement, Crescendo multi-turn escalation, and classic prompt-engineering patterns — consistently breach GPT-4 safety guardrails. Here is how each works and what belongs in your engagement toolkit.
May 9, 2026
LLM Security: A Practitioner's Map of the Attack Surface

What LLM security actually means in 2026 — the attack classes red teamers test, the controls that hold up under fire, and the frameworks that map the territory.
May 8, 2026
Why Your Prompt Injection Guardrails Fail: Bypass Classes

Vendor 'AI guardrails' detect 80% of textbook payloads and 30% of real ones. Here's how attackers actually bypass them — and what your detection layer is missing.
May 6, 2026
AI Jailbreak: How LLM Safety Bypasses Actually Work

An AI jailbreak is any input that makes an aligned language model violate its own safety policy. We walk through the technique families that actually work, why defenses keep failing, and what to test for in 2026.
May 5, 2026
ChatGPT Jailbreak Prompt Taxonomy: Classes, Rates, and Defenses

A research-grounded breakdown of ChatGPT jailbreak prompt categories — DAN, privilege escalation, persona injection, and multi-turn escalation — plus what the empirical success-rate data actually says and where current defenses fail.
May 4, 2026
OSCP and CEH in 2026: What Carries Over to AI Red Teaming

A Reddit offer to teach OSCP and CEH fundamentals for free raises a question every traditional pentester should answer: which of those skills transfer when the target is an LLM system?
May 2, 2026
FlashRT Cuts the GPU Bill on Long-Context Injection Attacks

A new optimization-based red-teaming framework claims 2–7x speedup and 2–4x lower memory than nanoGCG against 32K-context LLMs, putting GCG-class attacks back inside the budget of academic and small-team red teams.
May 2, 2026
What this site is for

AI Sec covers offensive AI security from a working practitioner's perspective. Here's what we publish, what we don't, and how to read it.
May 1, 2026