pwnkit is an open-source agentic framework for autonomous security research. It uses AI agents in a research-then-verify pipeline to find and prove vulnerabilities in AI/LLM apps, npm packages, and source code.

How does pwnkit eliminate false positives?

pwnkit's Verify agent independently re-exploits every finding. If it can't reproduce the vulnerability, the finding is killed as a false positive. Only confirmed vulnerabilities with working proof-of-concept code make it into the final report. The local dashboard provides a triage workbench for operators to review evidence, manage finding families, and control the verification workflow.
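The gating rule fits in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not pwnkit's code. The `Finding` and `VerifiedFinding` shapes, the `verifyAll` helper, and the `reproduce` callback (a stand-in for the Verify agent's independent re-exploit attempt) are hypothetical names.

```typescript
// Hypothetical shapes for illustration; not pwnkit's actual types.
interface Finding {
  id: string;
  title: string;
  poc: string; // proof-of-concept produced during the research phase
}

interface VerifiedFinding extends Finding {
  reproducedPoc: string; // PoC that worked on the independent re-run
}

// Stand-in for the Verify agent: resolves to a working PoC, or null
// if the vulnerability could not be reproduced.
type Reproduce = (f: Finding) => Promise<string | null>;

// Keep only findings that were independently reproduced. Everything else
// is treated as a false positive and never reaches the report.
async function verifyAll(
  findings: Finding[],
  reproduce: Reproduce,
): Promise<{ confirmed: VerifiedFinding[]; killed: Finding[] }> {
  const confirmed: VerifiedFinding[] = [];
  const killed: Finding[] = [];
  for (const f of findings) {
    const poc = await reproduce(f);
    if (poc !== null) {
      confirmed.push({ ...f, reproducedPoc: poc });
    } else {
      killed.push(f);
    }
  }
  return { confirmed, killed };
}
```

The design choice is that a finding with no reproduced PoC is dropped rather than downgraded.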

How much does pwnkit cost?

pwnkit is free and open source (Apache 2.0 license). It's an agentic harness: bring your own API key, or use it with Claude Code CLI or Codex CLI through your existing subscription. pwnkit orchestrates the pipeline; your tools power the AI.

What can pwnkit scan?

pwnkit scans AI/LLM apps, traditional web apps, npm packages, and source code repositories. It includes resumable scans, finding triage with deduplication, deterministic replay, a local verification dashboard, diff-aware PR review, and autonomous orchestration workers.
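Deduplication across scans generally means collapsing findings that share a stable fingerprint into one family. The sketch below assumes a fingerprint built from vulnerability class, endpoint, and parameter. Those fields, `fingerprint`, and `groupIntoFamilies` are hypothetical, and pwnkit's real fingerprinting may use different inputs.

```typescript
// Hypothetical per-scan finding record; not pwnkit's schema.
interface RawFinding {
  scanId: string;
  vulnClass: string; // e.g. "sqli", "ssrf"
  endpoint: string;  // e.g. "/api/users"
  param?: string;    // affected parameter, if any
}

// Build a key that identifies "the same bug" across scans.
// Assumption: vuln class is case-insensitive and a trailing slash on the
// endpoint is insignificant. Paths are otherwise compared as-is.
function fingerprint(f: RawFinding): string {
  const endpoint = f.endpoint.replace(/\/+$/, "");
  return [f.vulnClass.toLowerCase(), endpoint, f.param ?? ""].join("|");
}

// Group findings into families keyed by fingerprint.
function groupIntoFamilies(findings: RawFinding[]): Map<string, RawFinding[]> {
  const families = new Map<string, RawFinding[]>();
  for (const f of findings) {
    const key = fingerprint(f);
    const family = families.get(key) ?? [];
    family.push(f);
    families.set(key, family);
  }
  return families;
}
```

Each family keeps every raw occurrence, so an operator can still see which scans reported it.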

Let autonomous AI agents hack you
so the real ones can't.

The leading open-source AI pentest agent.

91.3% 

on XBOW · 95 of 104 challenges · white-box best-of-N aggregate

87.5% (91/104) black-box · both numbers reported separately

npx pwnkit-cli

Terminal

$ npx pwnkit-cli https://demo.app/api/chat

pwnkit-cli v0.3.4

Target: https://demo.app/api/chat

▸ Discover Found 3 endpoints, system prompt extracted

▸ Attack Ran 47 test cases across 6 categories

▸ Verify Confirmed 4 of 7 findings (3 false positives eliminated)

▸ Report Written to ./pwnkit-report/

4 verified vulnerabilities found.

Published benchmarks

XBOW

91.3% / 87.5%

Cybench

80%

npm-bench

F1 0.444

Shell-first. Minimal tools. Real exploits.

Web Apps

SQLi, IDOR, SSTI, XSS, auth bypass, SSRF — 35+ XBOW flags

AI/LLM Apps

Prompt injection, jailbreaks, PII leakage, MCP tool abuse

npm Packages

Supply chain attacks, malware, CVEs, typosquatting

Source Code

White-box mode reads code before attacking

91.3% on standard XBOW.

95 of 104 challenges, a white-box (`--repo` source access) best-of-N aggregate across configurations. Black-box mode alone is 91/104 = 87.5%; both numbers are reported separately, with no methodology blending. Plus 8/10 on Cybench (first run).

System	XBOW score	Maintained?	Comparable?	Notes
BoxPwnr (best-of-N)	97.1%	Yes	No	Best-of-N across 10+ model+solver configs
Shannon	96.15%	Yes	No	Modified hint-free fork + white-box source access
KinoSec	92.3%	Yes	No	Proprietary, closed source
XBOW (own agent)	85%	Yes	No	Built by XBOW for their own benchmark
pwnkit (white-box best-of-N)	91.3%	Yes	Yes	95/104 · same model + tools · `--repo` source access · aggregate across `features=none`/`experimental`/`all` · open source
pwnkit (black-box)	87.5%	Yes	Yes	91/104 · single model, single command, standard benchmark · open source
Cyber-AutoAgent	85%	Archived Nov 2025	Yes	Repo archived 2025-11-29; no longer maintained
BoxPwnr (single config)	~80-82%	Yes	Yes	Apples-to-apples single-config baseline
deadend-cli	~80%	Yes	Yes	Open source agent
MAPTA	76.9%	Yes	Yes	Academic agent (arXiv:2508.20816)

Comparable = standard 104-challenge XBOW with methodology stated explicitly in the row. Source access, best-of-N aggregation, modified forks, and closed-source constraints are called out directly so black-box and white-box results are not silently blended.
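The gap between the two headline numbers comes down to how solves are counted. Under a best-of-N aggregate, a challenge counts as solved if any configuration solved it. A single run counts only what that one configuration solved. The sketch below shows that counting rule. The configuration names mirror the `features=none`/`experimental`/`all` settings in the table, but the per-challenge results are toy data, not pwnkit's actual runs. `bestOfN` and `score` are hypothetical helpers.

```typescript
// results[config] = set of challenge IDs that configuration solved.
type Results = Record<string, Set<string>>;

// Best-of-N: a challenge counts if ANY configuration solved it (set union).
function bestOfN(results: Results): Set<string> {
  const union = new Set<string>();
  Object.values(results).forEach((solved) => solved.forEach((id) => union.add(id)));
  return union;
}

function score(solved: Set<string>, total: number): string {
  return `${solved.size}/${total} = ${((100 * solved.size) / total).toFixed(1)}%`;
}

// Toy data for illustration only: 4 challenges, 3 configurations.
const total = 4;
const results: Results = {
  "features=none": new Set(["c1", "c2"]),
  "features=experimental": new Set(["c2", "c3"]),
  "features=all": new Set(["c1"]),
};

console.log(score(results["features=none"], total)); // prints "2/4 = 50.0%"
console.log(score(bestOfN(results), total));         // prints "3/4 = 75.0%"
```

With the real totals, the same `score` arithmetic gives 95/104 = 91.3% and 91/104 = 87.5%, matching the two headline numbers.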

Cybench — first run

10-challenge subset across web, crypto, pwn, reverse, forensics · includes a Medium-difficulty solve

8 / 10 = 80%

npm-bench — first published score

81 packages (27 malicious / 27 CVE / 27 safe) · the only open-source AI npm-audit benchmark with public results

F1 0.444 · 50% accuracy

Run it yourself: pnpm bench --agentic · Full benchmark writeup · Source

Built for builders.

One model. One command. Every layer open and inspectable.

Real exploits, not pattern matching

Every finding is independently re-exploited by a blind verify agent that never sees the original reasoning. If it can't be proven, it doesn't ship.
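One way to keep a verifier blind is to build its input from an explicit allow-list of fields. That way the research agent's reasoning cannot leak through. The sketch assumes hypothetical `ResearchFinding` and `VerifyTask` shapes and a hypothetical `toBlindTask` helper. Which fields pwnkit actually hands its verify agent is not stated on this page.

```typescript
// Hypothetical output of the research agent.
interface ResearchFinding {
  target: string;
  vulnClass: string;
  claim: string;        // what the researcher says is exploitable
  reasoning: string;    // notes and chain of thought: must NOT reach the verifier
  transcript: string[]; // raw agent transcript: also withheld
}

// What the verifier is allowed to see: the target and the claim only.
interface VerifyTask {
  target: string;
  vulnClass: string;
  claim: string;
}

// Copy fields one by one (allow-list) instead of spreading the finding and
// deleting keys, so any field added to ResearchFinding later stays hidden.
function toBlindTask(f: ResearchFinding): VerifyTask {
  return { target: f.target, vulnClass: f.vulnClass, claim: f.claim };
}
```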

11-layer triage

Holding-it-wrong filter, per-class oracles, reachability gate, multi-modal cross-validation, adversarial debate. Every finding survives the gauntlet before you see it.
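Conceptually, a layered triage is a chain of checks. Each layer can pass a finding through, downgrade its severity, or reject it outright (the "reject & downgrade" step in the architecture diagram). The sketch shows that control flow with two toy layers. The layer names, severity scale, and `runTriage` helper are illustrative assumptions, not pwnkit's eleven actual layers.

```typescript
type Severity = "info" | "low" | "medium" | "high" | "critical";
const ORDER: Severity[] = ["info", "low", "medium", "high", "critical"];

interface TriageFinding {
  title: string;
  severity: Severity;
  reachable: boolean; // toy signal standing in for a reachability analysis
  notes: string[];
}

// A layer returns the (possibly downgraded) finding, or null to reject it.
type Layer = (f: TriageFinding) => TriageFinding | null;

// Lower severity by one step (never below "info") and record why.
function downgrade(f: TriageFinding, note: string): TriageFinding {
  const i = Math.max(0, ORDER.indexOf(f.severity) - 1);
  return { ...f, severity: ORDER[i], notes: [...f.notes, note] };
}

// Toy layer: reject findings whose vulnerable path is unreachable.
const reachabilityGate: Layer = (f) => (f.reachable ? f : null);

// Toy layer: downgrade findings that are only a missing security header.
const headerOnly: Layer = (f) =>
  /missing header/i.test(f.title) ? downgrade(f, "header-only issue") : f;

// Run every finding through the layers in order; stop at the first rejection.
function runTriage(findings: TriageFinding[], layers: Layer[]): TriageFinding[] {
  const survivors: TriageFinding[] = [];
  for (const original of findings) {
    let current: TriageFinding | null = original;
    for (const layer of layers) {
      if (current === null) break;
      current = layer(current);
    }
    if (current !== null) survivors.push(current);
  }
  return survivors;
}
```

Ordering matters in a chain like this. Cheap rejecting layers placed first keep expensive later layers (such as a debate step) from running on findings that would be dropped anyway.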

Apache 2.0

Read every line. Fork it. Vendor it. 188 tests, 25k+ lines of TypeScript, daily releases. No SaaS lock-in, no per-finding billing, no asterisks.

Target → Scan → Triage → Verify → Outputs

The same plan-discover-attack-verify-report loop a real pentester runs.

Target: web / LLM, npm / code
Scan: shell-first agent loop
Triage: reject & downgrade
Verify: blind re-exploit
Outputs: SARIF/MD/PDF, JSON/Issues

Architecture

Just give it a target.

pwnkit-cli express · Audit an npm package
pwnkit-cli ./my-repo · Review source code
pwnkit-cli https://api.com/chat · Scan an LLM API
pwnkit-cli https://example.com --mode web · Pentest a web app
pwnkit-cli dashboard · Local mission control
pwnkit-cli findings list --severity critical · Triage across scans

Auto-detects target type. No subcommands needed for most targets.
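Going only by the examples above, a plausible detection heuristic works like this:

- An explicit `--mode` wins.
- An `http(s)` URL gets an LLM/API scan.
- A path starting with `.` or `/` is a source repository.
- Anything else is an npm package name.

This is a hypothetical sketch (`detectTarget` and `TargetKind` are illustrative names). pwnkit may detect targets differently, for example by probing the URL.

```typescript
type TargetKind = "npm" | "code" | "llm" | "web";

// Hypothetical heuristic inferred from the CLI examples; not pwnkit's logic.
function detectTarget(target: string, mode?: TargetKind): TargetKind {
  if (mode) return mode;                             // explicit --mode always wins
  if (/^https?:\/\//i.test(target)) return "llm";    // assumption: bare URLs default to an LLM/API scan
  if (/^(\.{1,2}\/|\/)/.test(target)) return "code"; // ./my-repo, ../x, /abs/path
  return "npm";                                      // bare name such as "express" or "@scope/pkg"
}
```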

CLI runs the scans. pwnkit-cli dashboard opens a local web UI for triage, evidence review, and human sign-off. Drop the GitHub Action into CI to push verified findings into GitHub's Security tab as SARIF.
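SARIF is the JSON format that GitHub code scanning ingests. The sketch below maps verified findings onto a minimal SARIF 2.1.0 log. The `VerifiedFinding` shape, the severity-to-`level` mapping, and the `toResult`/`toSarif` helpers are assumptions for illustration. pwnkit's own SARIF output may carry more fields (rule metadata, fingerprints, code flows).

```typescript
// Hypothetical verified-finding shape; not pwnkit's schema.
interface VerifiedFinding {
  ruleId: string; // e.g. "sqli"
  severity: "low" | "medium" | "high" | "critical";
  message: string;
  file?: string;  // set for white-box findings that map to a source file
  line?: number;
}

// Subset of the SARIF 2.1.0 result object used here.
interface PhysicalLocation {
  artifactLocation: { uri: string };
  region?: { startLine: number };
}

interface SarifResult {
  ruleId: string;
  level: "note" | "warning" | "error"; // SARIF also allows "none"
  message: { text: string };
  locations?: { physicalLocation: PhysicalLocation }[];
}

// Assumed mapping from severity to SARIF result levels.
function toLevel(s: VerifiedFinding["severity"]): SarifResult["level"] {
  if (s === "low") return "note";
  if (s === "medium") return "warning";
  return "error";
}

function toResult(f: VerifiedFinding): SarifResult {
  const result: SarifResult = {
    ruleId: f.ruleId,
    level: toLevel(f.severity),
    message: { text: f.message },
  };
  // Only attach a location when the finding maps to a source file.
  if (f.file) {
    const physicalLocation: PhysicalLocation = { artifactLocation: { uri: f.file } };
    if (f.line !== undefined) physicalLocation.region = { startLine: f.line };
    result.locations = [{ physicalLocation }];
  }
  return result;
}

function toSarif(findings: VerifiedFinding[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [{ tool: { driver: { name: "pwnkit" } }, results: findings.map(toResult) }],
  };
}
```

`JSON.stringify(toSarif(findings), null, 2)` serializes the log to a `.sarif` file.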