Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #llm-security 28
- #red-team 28
- #prompt-injection 26
- #adversarial-ml 15
- #jailbreak 14
- #agent-security 12
- #indirect-injection 6
- #tooling 4
- #spoke 3
- #attack-techniques 2
- #attack-vectors 2
- #membership-inference 2
- #model-extraction 2
- #prompt-engineering 2
- #rag 2
- #agents 1
- #ai-red-team 1
- #alignment 1
- #application-security 1
- #automated-attacks 1
- #behavioral-evaluation 1
- #bypass-techniques 1
- #ceh 1
- #chatgpt 1
- #defense 1
- #detection 1
- #evasion 1
- #faq 1
- #garak 1
- #gcg 1
- #governance 1
- #gpt-4 1
- #guardrails 1
- #hub 1
- #interpretability 1
- #knowledge-corruption 1
- #llm-bypass 1
- #llm-monitoring 1
- #long-context 1
- #methodology 1
- #model-inversion 1
- #model-theft 1
- #oscp 1
- #owasp 1
- #owasp-llm01 1
- #payload-construction 1
- #payload-delivery 1
- #pillar 1
- #poisoning 1
- #pyrit 1
- #reporting 1
- #scoping 1
- #taxonomy 1
- #threat-modeling 1
- #tool-use 1
- #training-data-privacy 1
Categories
prompt-injection 10 posts
- Prompt Injection Examples: Attack Payloads by Class
  Concrete prompt injection examples across five attack classes (direct override, system-prompt leak, indirect RAG poisoning, agent tool-call hijack, and multimodal smuggling), with PoC payloads and defender actions.
- Prompt Hacking: A Practitioner's Taxonomy of LLM Attack Classes
  Prompt hacking covers three distinct attack classes against LLMs: direct injection, indirect injection, and jailbreaking.
- Prompt Injection in 2025: OpenAI vs. Broken Defenses
  OpenAI's November 2025 advisory on prompt injection arrived the same week a 14-researcher arXiv paper showed adaptive attacks achieve >90% success against…
- LLM Prompt Injection: From Instruction Override to Agent Takeover
  A practitioner's breakdown of how LLM prompt injection payloads are constructed, why the threat class changes when agents can invoke tools, and what…
- Prompt Injection Delivery: Real Techniques and Payload Methods
  Unit 42 documented 12 prompt injection attacks in production with 22 distinct delivery techniques. Here's how attackers build payloads that reach the…
- Prompt Injection Examples: A Practitioner's Attack Library
  A technical breakdown of real prompt injection examples (direct, indirect, multimodal, and RAG-poisoning attacks), with conditions, payloads, and what…
jailbreak 8 posts
- LLM Bypass Techniques: Attack Families, PoC Patterns, and Why Guardrails Keep Failing
  A practitioner map of LLM bypass technique families: prompt injection, jailbreak personas, encoding obfuscation, RAG poisoning, and agent-specific…
- Jailbreak AI: How Attackers Break Safety Alignment and Defenses
  A technical guide to jailbreak AI attacks, from manual prompt exploits to automated adversarial suffixes, covering the major technique families…
- Jailbreak LLM: Automated Attacks and the Transfer Problem
  How automated jailbreak LLM techniques like TAP use attacker LLMs to iteratively crack target models, why success transfers across model families, and…
- LLM Bypass: How Attackers Circumvent Safety Alignment by Layer
  A technical breakdown of LLM bypass techniques (adversarial suffixes, shallow alignment exploits, fine-tuning attacks, and guardrail evasion), with…
- LLM Jailbreak: Attack Taxonomy, Techniques, and Defense Reality
  A technical breakdown of LLM jailbreak attack classes (many-shot, Crescendo multi-turn escalation, roleplay, and encoding tricks), plus an honest look at…
- GPT-4 Jailbreak Techniques: A Red Teamer's Technical Reference
  Three active attack classes (IRIS self-refinement, Crescendo multi-turn escalation, and classic prompt-engineering patterns) consistently breach GPT-4…
red-team 8 posts
- LLM Attack Taxonomy: Prompt Injection, Agent Hijack, and What's Hitting Production
  A practitioner's map of LLM attack classes (from direct prompt injection and jailbreaks to indirect injection, RAG poisoning, and agent tool-call abuse), organized by OWASP 2025 and MITRE ATLAS.
- AI Red Team: Methodology, Tooling, and the Attack Surface That Actually Matters
  A practitioner's guide to AI red teaming: what makes the LLM attack surface different from traditional app testing, the techniques that reliably produce…
- The Adversarial ML Attack Taxonomy: A Red Teamer's Reference
  A working taxonomy of attacks against ML systems (evasion, poisoning, privacy, and abuse), mapped to attacker knowledge and capability, grounded in the…
- AI Red Team Engagement Methodology: Scoping to Reporting
  The full lifecycle of an LLM red team engagement: scoping and rules of engagement, threat modeling, the test plan by attack class, the tooling that runs…
- The Audit Gap: Why Red-Teaming Can't Certify Governance Claims
  A new position paper by Seth and Sankarapu formalizes the structural mismatch between what AI governance frameworks require evaluators to verify and what…
- LLM Security: A Practitioner's Map of the Attack Surface
  What LLM security actually means in 2026: the attack classes red teamers test, the controls that hold up under fire, and the frameworks that map the territory.
primer 3 posts
- LLM Security FAQ: Prompt Injection, Jailbreaking, and Defenses
  Three essential questions for anyone building, securing, or red-teaming LLM applications, covering the distinction between jailbreaks and prompt…
- Direct vs. Indirect Prompt Injection: Threats and Defenses
  Direct and indirect prompt injection are fundamentally different attacks with different attack surfaces, threat actors, and mitigations.
- Model Extraction vs. Model Inversion: Two Confidentiality Attacks
  Model extraction and model inversion both threaten model confidentiality, but they target different aspects of the model and require different defense…
spoke 3 posts
- Agent Tool-Use Exfiltration: When Indirect Injection Does Damage
  Why agentic LLM systems convert injection bugs into data exfiltration, financial loss, and remote code execution, with concrete attack chains and the…
- Indirect Prompt Injection in RAG Pipelines: Patterns and Defenses
  How retrieval-augmented generation surfaces become injection vectors, with concrete attack patterns from production RAG systems and the chunking…
- Prompt Injection Detection Signals in Production LLM Systems
  The observable signals that indicate a prompt injection attempt or success in a live LLM application: input classifiers, output classifiers, canary…