Tag
#llm-security
28 posts tagged llm-security.
- red-team
LLM Attack Taxonomy: Prompt Injection, Agent Hijack, and What's Hitting Production
A practitioner's map of LLM attack classes — from direct prompt injection and jailbreaks to indirect injection, RAG poisoning, and agent tool-call abuse — organized by OWASP 2025 and MITRE ATLAS.
- prompt-injection
Prompt Injection Examples: Attack Payloads by Class
Concrete prompt injection examples across five attack classes — direct override, system-prompt leak, indirect RAG poisoning, agent tool-call hijack, and multimodal smuggling — with PoC payloads and defender actions.
- red-team
AI Red Team: Methodology, Tooling, and the Attack Surface That Actually Matters
A practitioner's guide to AI red teaming — what makes LLM attack surface different from traditional app testing, the techniques that reliably produce
- prompt-injection
Prompt Hacking: A Practitioner's Taxonomy of LLM Attack Classes
Prompt hacking covers three distinct attack classes against LLMs: direct injection, indirect injection, and jailbreaking.
- red-team
AI Red Team Engagement Methodology: Scoping to Reporting
The full lifecycle of an LLM red team engagement — scoping and rules of engagement, threat modeling, the test plan by attack class, the tooling that runs
- prompt-injection
Prompt Injection in 2025: OpenAI vs. Broken Defenses
OpenAI's November 2025 advisory on prompt injection arrived the same week a 14-researcher arXiv paper showed adaptive attacks achieve >90% success against
- prompt-injection
Prompt Injection Delivery: Real Techniques and Payload Methods
Unit 42 documented 12 prompt injection attacks in production with 22 distinct delivery techniques. Here's how attackers build payloads that reach the
- primer
LLM Security FAQ: Prompt Injection, Jailbreaking, and Defenses
Three essential questions for anyone building, securing, or red-teaming LLM applications — covering the distinction between jailbreaks and prompt
- prompt-injection
Prompt Injection Examples: A Practitioner's Attack Library
A technical breakdown of real prompt injection examples — direct, indirect, multimodal, and RAG-poisoning attacks — with conditions, payloads, and what
- Spoke
Agent Tool-Use Exfiltration: When Indirect Injection Does Damage
Why agentic LLM systems convert injection bugs into data exfiltration, financial loss, and remote code execution — with concrete attack chains and the
- hub
AI Red Teaming Hub: Your Guide to Offensive AI Security
The central resource index for offensive AI security on aisec.blog — prompt injection, jailbreaks, adversarial ML, red team methodology, and tooling
- primer
Direct vs. Indirect Prompt Injection: Threats and Defenses
Direct and indirect prompt injection are fundamentally different attacks with different attack surfaces, threat actors, and mitigations.
- Spoke
Indirect Prompt Injection in RAG Pipelines: Patterns and Defenses
How retrieval-augmented generation surfaces become injection vectors, with concrete attack patterns from production RAG systems and the chunking
- jailbreak
Jailbreak AI: How Attackers Break Safety Alignment and Defenses
A technical guide to jailbreak AI attacks — from manual prompt exploits to automated adversarial suffixes — covering the major technique families
- jailbreak
Jailbreak LLM: Automated Attacks and the Transfer Problem
How automated jailbreak LLM techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and
- jailbreak
LLM Bypass: How Attackers Circumvent Safety Alignment by Layer
A technical breakdown of LLM bypass techniques — adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion — with
- jailbreak
LLM Jailbreak: Attack Taxonomy, Techniques, and Defense Reality
A technical breakdown of LLM jailbreak attack classes — many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks — plus an honest look at
- prompt-injection
LLM Prompt Injection: Taxonomy, Real Patterns, and Defenses
A technical breakdown of LLM prompt injection — direct, indirect, and agent-targeting variants — grounded in real-world attack patterns observed in
- primer
Model Extraction vs. Model Inversion: Two Confidentiality Attacks
Model extraction and model inversion both threaten model confidentiality, but they target different aspects of the model and require different defense
- prompt-injection
Prompt Hacking: Taxonomy, Techniques, and What Works on LLMs
A practitioner's breakdown of prompt hacking — the three attack families (injection, leaking, jailbreaking), how each works mechanically, and what
- Pillar
Prompt Injection Attack Compendium (2026 Edition)
A practitioner's pillar reference on prompt injection attacks against LLM systems — direct and indirect variants, real-world payloads, detection signals
- prompt-injection
Prompt Injection Attack: Techniques, Variants, and Defenses
A practitioner's breakdown of prompt injection attacks — direct, indirect, and multi-modal — covering the HouYi framework, real CVEs, and mitigations that
- Spoke
Prompt Injection Detection Signals in Production LLM Systems
The observable signals that indicate a prompt injection attempt or success in a live LLM application — input classifiers, output classifiers, canary
- jailbreak
GPT-4 Jailbreak Techniques: A Red Teamer's Technical Reference
Three active attack classes — IRIS self-refinement, Crescendo multi-turn escalation, and classic prompt-engineering patterns — consistently breach GPT-4
- red-team
LLM Security: A Practitioner's Map of the Attack Surface
What LLM security actually means in 2026 — the attack classes red teamers test, the controls that hold up under fire, and the frameworks that map the territory.
- red-team
Why Your Prompt Injection Guardrails Fail: Bypass Classes
Vendor 'AI guardrails' detect 80% of textbook payloads and 30% of real ones. Here's how attackers actually bypass them — and what your detection layer is
- jailbreak
AI Jailbreak: How LLM Safety Bypasses Actually Work
An AI jailbreak is any input that makes an aligned language model violate its own safety policy. We walk through the technique families that actually
- jailbreak
ChatGPT Jailbreak Prompt Taxonomy: Classes, Rates, and Defenses
A research-grounded breakdown of ChatGPT jailbreak prompt categories — DAN, privilege escalation, persona injection, and multi-turn escalation — plus what