Let autonomous AI agents hack you
so the real ones can't.
Fully autonomous agentic pentesting framework. Attacks AI/LLM apps, web apps, npm packages, and source code. Blind PoC verification to minimize false positives.
npx pwnkit-cli
Or read the documentation
Shell-first. Minimal tools. Real exploits.
Web Apps
SQLi, IDOR, SSTI, XSS, auth bypass, SSRF — 35+ XBOW flags
AI/LLM Apps
Prompt injection, jailbreaks, PII leakage, MCP tool abuse
npm Packages
Supply chain attacks, malware, CVEs, typosquatting
Source Code
White-box mode reads code before attacking
35+ flags across 5 benchmarks.
35+ flags on XBOW (104 Docker CTF challenges). 10/10 on AI/LLM security. Validated across 5 benchmark suites. Playwright for XSS. White-box mode for source-aware scanning. Every finding independently re-exploited to kill false positives.
SQLi, IDOR, SSTI, XSS, auth bypass, RCE
Prompt injection, jailbreaks, PII leakage
33 network/CVE tasks (Log4Shell, Heartbleed)
510 LLM safety behaviors
30 packages — first npm security benchmark
Cracked impossible challenges with source access
Run it yourself: pnpm bench --agentic
View benchmark source
Just give it a target.
pwnkit-cli express                            # Audit an npm package
pwnkit-cli ./my-repo                          # Review source code
pwnkit-cli https://api.com/chat               # Scan an LLM API
pwnkit-cli https://example.com --mode web     # Pentest a web app
pwnkit-cli dashboard                          # Local mission control
pwnkit-cli findings list --severity critical  # Triage across scans
Auto-detects target type. No subcommands needed for most targets.
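How the auto-detection works is not described here. As a rough illustration only, the sketch below classifies a target string the way the examples above suggest: an http(s) URL is a web or LLM target, a GitHub URL or a local path is source code, and anything else is treated as an npm package name. The function name, the category labels, and the rules themselves are assumptions, not pwnkit's actual logic.

```python
import os
from urllib.parse import urlparse


def guess_target_type(target: str) -> str:
    """Hypothetical heuristic; not pwnkit's real detection code."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https"):
        # Assumption: a GitHub repository URL means "review this source",
        # while any other URL is a running web app or LLM API.
        if parsed.netloc == "github.com":
            return "source"
        return "url"
    # Explicit path prefixes, or anything that exists on disk, count as source.
    if target.startswith(("./", "../", "/")) or os.path.exists(target):
        return "source"
    # Everything else is assumed to be an npm package name.
    return "npm-package"
```

Telling a plain web app from an LLM endpoint cannot be done from the string alone, which is presumably why the web example passes `--mode web` explicitly.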
Why pwnkit
Zero config
No YAML. No Python. Just npx pwnkit-cli and you're running.
Blind verification
Every finding is independently re-exploited. Can't reproduce it? Killed as a false positive.
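The idea can be sketched in a few lines. This is a minimal illustration, assuming "blind" means the verifier receives only the proof of concept and nothing from the agent that produced it. The `Finding` type and the `reproduce` callback are hypothetical names, not part of pwnkit's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    title: str
    poc: str  # proof-of-concept payload or request (hypothetical field)


def verify_findings(
    findings: Iterable[Finding],
    reproduce: Callable[[str], bool],
) -> List[Finding]:
    """Keep only findings whose PoC reproduces when replayed independently."""
    verified = []
    for finding in findings:
        try:
            # The verifier sees the PoC only, not the original reasoning.
            reproduced = reproduce(finding.poc)
        except Exception:
            # A PoC that crashes the replay is treated as not reproduced.
            reproduced = False
        if reproduced:
            verified.append(finding)
    return verified
```

In this sketch a finding that fails to reproduce is simply dropped, matching the "killed as a false positive" behavior described above.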
Bring your own AI
Your API key, or use Claude Code CLI / Codex CLI with your subscription. Any model, any provider.
CLI runs. Dashboard triages.
pwnkit-cli
The execution surface. Run scans, audits, and reviews from your terminal or CI. Resume interrupted scans. Replay attack chains. Export SARIF, JSON, Markdown, or HTML.
$ pwnkit-cli scan --target https://api.com
$ pwnkit-cli audit express --depth deep
$ pwnkit-cli review ./my-repo --diff-base main
pwnkit-cli dashboard
The operator surface. Local web UI for finding triage, evidence review, scan provenance, and human sign-off. Launch scans, manage the verification queue, and track finding families across runs.
$ pwnkit-cli dashboard
→ http://127.0.0.1:48123
How it compares
| Feature | pwnkit | promptfoo (acquired by OpenAI) | garak | nuclei | Semgrep |
|---|---|---|---|---|---|
| Autonomous multi-agent | Agentic pipeline | — | — | — | — |
| Verification (fewer false positives) | Re-exploits | — | — | — | — |
| AI/LLM app scanning | ✓ | ✓ | ✓ | — | — |
| npm package audit | ✓ | — | — | — | Rules |
| Source code review | AI-powered | — | — | — | Rules |
| AI attack coverage | 30+ agentic | Partial | Partial | — | — |
| Zero config | npx | YAML | Python | Templates | Config |
| Independent | ✓ | Acquired | ✓ | ✓ | VC-backed |
| Open source | Apache-2.0 | OpenAI-owned | OSS | MIT | LGPL |
Findings in GitHub's Security tab.
name: Security Pentest
on: [push, pull_request]
jobs:
  pwnkit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pwnkit
        uses: peaktwilight/pwnkit/action@v1
        with:
          target: ${{ secrets.STAGING_API_URL }}
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: pwnkit-report/report.sarif
Dogfooding
pwnkit reviews its own source code
pwnkit runs pwnkit review . on its own repository. The same agentic pipeline that found 7 CVEs — pointed at itself. If it finds something, you'll see it here.
Set it up on your repo in 2 minutes:
1. Add to your GitHub Actions workflow:
   - run: npx pwnkit-cli review . --format json > pwnkit-report.json
2. Add the badge to your README:
   [](https://pwnkit.com)
Built from real security research
7 CVEs found in packages with 40M+ weekly downloads.
Stop guessing.
Start proving.
pwnkit-cli https://api.example.com/chat
pwnkit-cli express
pwnkit-cli ./my-repo
pwnkit-cli https://github.com/org/repo