Tag

#adversarial-ml

15 posts tagged adversarial-ml.

red-team

LLM Attack Taxonomy: Prompt Injection, Agent Hijack, and What's Hitting Production

A practitioner's map of LLM attack classes — from direct prompt injection and jailbreaks to indirect injection, RAG poisoning, and agent tool-call abuse — organized by OWASP 2025 and MITRE ATLAS.
June 21, 2026
jailbreak

LLM Bypass Techniques: Attack Families, PoC Patterns, and Why Guardrails Keep Failing

A practitioner's map of LLM bypass technique families: prompt injection, jailbreak personas, encoding obfuscation, RAG poisoning, and agent-specific
June 12, 2026
red-team

The Adversarial ML Attack Taxonomy: A Red Teamer's Reference

A working taxonomy of attacks against ML systems — evasion, poisoning, privacy, and abuse — mapped to attacker knowledge and capability, grounded in the
May 22, 2026
red-team

The Audit Gap: Why Red-Teaming Can't Certify Governance Claims

A new position paper by Seth and Sankarapu formalizes the structural mismatch between what AI governance frameworks require evaluators to verify and what
May 15, 2026
prompt-injection

Prompt Injection Examples: A Practitioner's Attack Library

A technical breakdown of real prompt injection examples — direct, indirect, multimodal, and RAG-poisoning attacks — with conditions, payloads, and what
May 11, 2026
hub

AI Red Teaming Hub: Your Guide to Offensive AI Security

The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling
May 10, 2026
jailbreak

Jailbreak AI: How Attackers Break Safety Alignment and Defenses

A technical guide to AI jailbreak attacks, from manual prompt exploits to automated adversarial suffixes, covering the major technique families
May 10, 2026
jailbreak

Jailbreak LLM: Automated Attacks and the Transfer Problem

How automated LLM jailbreak techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and
May 10, 2026
jailbreak

LLM Bypass: How Attackers Circumvent Safety Alignment by Layer

A technical breakdown of LLM bypass techniques — adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion — with
May 10, 2026
jailbreak

LLM Jailbreak: Attack Taxonomy, Techniques, and Defense Reality

A technical breakdown of LLM jailbreak attack classes — many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks — plus an honest look at
May 10, 2026
prompt-injection

Prompt Hacking: Taxonomy, Techniques, and What Works on LLMs

A practitioner's breakdown of prompt hacking — the three attack families (injection, leaking, jailbreaking), how each works mechanically, and what
May 10, 2026
prompt-injection

Prompt Injection Attack: Techniques, Variants, and Defenses

A practitioner's breakdown of prompt injection attacks (direct, indirect, and multimodal), covering the HouYi framework, real CVEs, and mitigations that
May 10, 2026
jailbreak

AI Jailbreak: How LLM Safety Bypasses Actually Work

An AI jailbreak is any input that makes an aligned language model violate its own safety policy. We walk through the technique families that actually
May 5, 2026
jailbreak

ChatGPT Jailbreak Prompt Taxonomy: Classes, Rates, and Defenses

A research-grounded breakdown of ChatGPT jailbreak prompt categories — DAN, privilege escalation, persona injection, and multi-turn escalation — plus what
May 4, 2026
red-team

OSCP and CEH in 2026: What Carries Over to AI Red Teaming

A Reddit offer to teach OSCP and CEH fundamentals for free surfaces a question every traditional pentester should answer: which of those skills transfer
May 2, 2026