AI Red Team Gym
A gamified sandbox for security researchers and AI red teamers. Practice writing adversarial prompts against 12 synthetic target models — from easy extractions to expert-level Unicode tricks. Public scoreboard. No login required.
Free tier: 20 attempts/day per IP · Paid tier ($19/mo): 500/day + new monthly challenges.
Challenges
Free
- 20 attempts/day per IP
- All 12 challenges
- Public scoreboard
- No login required
Pro
- 500 attempts/day
- New monthly challenges (v2)
- Priority scoreboard ranking
- Stripe-managed billing · cancel anytime
Redirects to Stripe Checkout. Cancel anytime.
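
The daily caps on the two tiers can be pictured as a simple per-day counter. The sketch below is illustrative only and is not the service's actual code: `DailyAttemptLimiter`, `try_consume`, and the in-memory storage are hypothetical names and choices, and keying Pro usage the same way as Free usage (by a client key such as an IP) is an assumption, since the page only states "per IP" for the free tier.

```python
from collections import defaultdict
from datetime import date

# Daily caps taken from the tier descriptions above.
DAILY_LIMITS = {"free": 20, "pro": 500}


class DailyAttemptLimiter:
    """Hypothetical in-memory counter keyed by (client key, calendar day)."""

    def __init__(self) -> None:
        self._counts: dict[tuple[str, date], int] = defaultdict(int)

    def try_consume(self, key: str, tier: str, today: date) -> bool:
        """Record one attempt; return False if today's cap is already used up."""
        bucket = (key, today)
        if self._counts[bucket] >= DAILY_LIMITS[tier]:
            return False
        self._counts[bucket] += 1
        return True

    def remaining(self, key: str, tier: str, today: date) -> int:
        """Attempts left today for this key (read-only, does not create entries)."""
        return DAILY_LIMITS[tier] - self._counts.get((key, today), 0)
```

Because the bucket includes the calendar day, the count resets naturally when the date changes; a real deployment would also need shared storage and a defined time zone for the day boundary.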
Scoreboard
Top 20 by total wins. Emails partially masked.
| # | Player | Wins | Attempts |
|---|---|---|---|
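
The page does not say how emails are masked, so the following is only one plausible scheme, written as a minimal sketch: keep the first character of the local part and the full domain. The function name `mask_email` and the `***` placeholder are assumptions made for illustration.

```python
def mask_email(email: str) -> str:
    """Partially mask an email for public display (hypothetical scheme)."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognisable address: reveal nothing.
        return "***"
    # Keep only the first character of the local part; the domain stays visible.
    return f"{local[0]}***@{domain}"
```

With this scheme, `mask_email("alice@example.com")` yields `a***@example.com`.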
How it works
Each challenge gives you a target bot's description and difficulty. Expert challenges require deep knowledge of adversarial ML.
Write an adversarial prompt that bypasses the target's guardrails. The server evaluates it against a win condition — no real LLM calls in v1.
Add your email (optional) to track wins on the public scoreboard. Efficiency matters: among players with the same number of wins, the one with fewer attempts ranks higher.
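
The ranking rule just described (more wins first, then fewer attempts as the tie-breaker) can be expressed as a two-part sort key. This is a sketch under assumptions: the `Player` record and `rank_players` function are hypothetical, and the default limit of 20 mirrors the "Top 20" scoreboard.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str      # display name (hypothetical field)
    wins: int
    attempts: int


def rank_players(players: list[Player], limit: int = 20) -> list[Player]:
    """Order by wins descending, then attempts ascending, and keep the top `limit`."""
    return sorted(players, key=lambda p: (-p.wins, p.attempts))[:limit]
```

For example, two players with 5 wins each are ordered by attempts, so the one with 12 attempts is listed above the one with 40.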
v1 uses a deterministic regex-based mock (no LLM API calls). v2 will run select challenges against real Haiku via Workers AI.
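
To make the "deterministic regex-based mock" concrete, here is a minimal sketch of what such an evaluator could look like. It is not the gym's actual code: the `Challenge` type, the `demo-extraction` slug, the pattern itself, and the choice to match directly against the submitted prompt are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    slug: str
    win_pattern: re.Pattern  # the win condition, expressed as a compiled regex


# Hypothetical challenge table; the real challenges and patterns are not published here.
CHALLENGES = {
    "demo-extraction": Challenge(
        slug="demo-extraction",
        win_pattern=re.compile(
            r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE
        ),
    ),
}


def evaluate_attempt(slug: str, prompt: str) -> bool:
    """Return True if the prompt satisfies the challenge's win condition.

    No model is called: the same prompt always produces the same result.
    """
    challenge = CHALLENGES[slug]
    return challenge.win_pattern.search(prompt) is not None
```

Because the check is a pure function of the prompt, results are reproducible and cheap to compute, which is the trade-off a mock like this makes against realism.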