Tag
#jailbreak
11 posts tagged jailbreak.
- prompt-injection
Prompt Hacking: A Practitioner's Taxonomy of LLM Attack Classes
Prompt hacking covers three distinct attack classes against LLMs: direct injection, indirect injection, and jailbreaking. Here is how each works, what distinguishes them, and what actually stops them.
- primer
LLM Security FAQ: Prompt Injection, Jailbreaking, and Defenses
Three essential questions for anyone building, securing, or red-teaming LLM applications — covering the distinction between jailbreaks and prompt injection, direct vs. indirect attack vectors, and proven defensive mitigations.
- hub
AI Red Teaming Hub: Your Guide to Offensive AI Security
The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling, organized for practitioners.
- jailbreak
Jailbreak AI: How Attackers Break Safety Alignment and Defenses
A technical guide to AI jailbreak attacks, from manual prompt exploits to automated adversarial suffixes, covering the major technique families, transferability, and which defenses actually work.
- jailbreak
Jailbreak LLM: Automated Attacks and the Transfer Problem
How automated LLM jailbreak techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and what that means for red team practice.
- jailbreak
LLM Bypass: How Attackers Circumvent Safety Alignment by Layer
A technical breakdown of LLM bypass techniques — adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion — with practitioner-level implications for red teams and production defenders.
- jailbreak
LLM Jailbreak: Attack Taxonomy, Techniques, and Defense Reality
A technical breakdown of LLM jailbreak attack classes (many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks), plus an honest look at which defenses actually stop them and which don't.
- prompt-injection
Prompt Hacking: Taxonomy, Techniques, and What Works on LLMs
A practitioner's breakdown of prompt hacking — the three attack families (injection, leaking, jailbreaking), how each works mechanically, and what defenses hold up under real adversarial pressure.
- jailbreak
GPT-4 Jailbreak Techniques: A Red Teamer's Technical Reference
Three active attack classes — IRIS self-refinement, Crescendo multi-turn escalation, and classic prompt-engineering patterns — consistently breach GPT-4 safety guardrails. Here is how each works and what belongs in your engagement toolkit.
- jailbreak
AI Jailbreak: How LLM Safety Bypasses Actually Work
An AI jailbreak is any input that makes an aligned language model violate its own safety policy. We walk through the technique families that actually work, why defenses keep failing, and what to test for in 2026.
- jailbreak
ChatGPT Jailbreak Prompt Taxonomy: Classes, Rates, and Defenses
A research-grounded breakdown of ChatGPT jailbreak prompt categories — DAN, privilege escalation, persona injection, and multi-turn escalation — plus what the empirical success-rate data actually says and where current defenses fail.