AI Sec
Isometric vector illustration of interconnected tech tools for AI red teaming and security analysis
hub

AI Red Teaming Hub: Your Guide to Offensive AI Security

The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling

By AI Sec Editorial · · 7 min read

AI red teaming is not a checklist exercise. It is a sustained, adversarial practice — probing the boundaries of deployed LLM systems with the same creativity and persistence that real attackers bring. That framing shapes everything we publish on aisec.blog: technical depth over vendor marketing, reproducible attack patterns over speculation, and honest coverage of what current defenses do and do not stop.

This page collects the most useful writing on aisec.blog into a single indexed resource. Use it as a map when you’re starting on a new engagement, building an AI red team program, or looking to go deeper on a specific attack class. We update it as the site grows.

The attack surface for modern LLM deployments spans five broad categories. Prompt injection — both direct and indirect — remains the highest-volume class and the one most likely to appear in a production engagement. Jailbreaks are distinct from injection; they target alignment training rather than context hijacking, and require different methodology. Adversarial ML covers optimization-based attacks, knowledge corruption, membership inference, and model extraction — approaches that require ML expertise but are increasingly accessible via open tooling. Agent exploitation extends all of the above into agentic pipelines where a compromised LLM can act: reading files, calling APIs, browsing the web. Supply chain attacks target model artifacts themselves, not running inference.

If you’re coming from traditional penetration testing and mapping these to the PTES or OWASP, the closest analogs are injection (OWASP A03), broken access control (A01), and insecure design (A04) — but the primitives are different and the tooling has almost no overlap.


Foundations

Start here if you’re mapping the overall attack surface or need a reference that covers multiple classes at once.

LLM Security: A Practitioner’s Map of the Attack Surface The broadest entry point. Covers the five major attack categories, situates them against OWASP LLM Top 10 and MITRE ATLAS, and flags where standard AppSec methodology transfers and where it doesn’t. Read this first.

What this site is for Mission statement and scope. Explains what we publish (technical writeups, adversarial ML applied, red team methodology) and what we don’t.

OSCP and CEH in 2026: What Carries Over to AI Red Teaming For practitioners with traditional pentesting backgrounds: which core skills and frameworks transfer when the target is an LLM system, and what you need to add.


Prompt Injection

The primary attack class for LLM-integrated applications. Covers both direct (user-controlled input) and indirect (retrieved content) injection.

Why Your Prompt Injection Guardrails Fail: A Practitioner’s Tour of Bypass Classes Vendor guardrail benchmarks look good on textbook payloads; real-world bypass rates are far lower. This post maps the bypass technique landscape: role manipulation, encoding evasion, delimiter injection, multi-turn escalation, and retrieval poisoning. Required reading before evaluating any detection layer.


Jailbreaks

Attacks that target alignment training rather than context injection. Different threat actor, different methodology, different mitigations.

GPT-4 Jailbreak Techniques: A Red Teamer’s Technical Reference Three attack classes with documented success rates: IRIS self-refinement, Crescendo multi-turn escalation, and structured prompt-engineering patterns. Grounded in peer-reviewed research (EMNLP 2024, USENIX Security 2025). Covers what each technique exploits in the model’s training and what defenses it bypasses.


Tooling & Optimization-Based Attacks

Reproducible attack tools and frameworks — the things you actually run in an engagement.

FlashRT cuts the GPU bill on long-context prompt injection attacks GCG-class optimization attacks against 32K-context LLMs previously required multi-A100 cluster infrastructure. FlashRT cuts memory requirements by up to 4x and runtime by 2–7x, putting this class of attack back inside academic and small-team budgets.

FlashRT: Optimization-Based LLM Red-Teaming Without the 264 GB GPU Bill Companion piece covering FlashRT’s memory efficiency mechanism and benchmark results across prompt injection and knowledge corruption attack types.


Cross-Site Reading

aisec.blog publishes the offensive side. GuardML publishes the corresponding defensive patterns. When we cover a new attack class, GuardML covers what to build in response. Reading both gives you the full picture for an engagement that requires both red and blue deliverables.

Subscribe

AI Sec — in your inbox

Offensive AI security — prompt injection, jailbreaks, agent exploitation, red team writeups. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments