pwnkit is an open-source agentic framework for autonomous security research. It uses AI agents in a research-then-verify pipeline to find and prove vulnerabilities in AI/LLM apps, npm packages, and source code.

How does pwnkit eliminate false positives?

pwnkit's Verify agent independently re-exploits every finding. If it can't reproduce the vulnerability, the finding is killed as a false positive. Only confirmed vulnerabilities with working proof-of-concept code make it into the final report. The local dashboard provides a triage workbench for operators to review evidence, manage finding families, and control the verification workflow.
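The verify step described above can be sketched as a filter over findings. This is a minimal illustration, not pwnkit's implementation: the `Finding` shape, the `Reproducer` callback, and `verifyFindings` are hypothetical names, and a real reproducer would re-run the exploit against the target (most likely asynchronously) rather than use a stub.

```typescript
// Hypothetical shapes for illustration; pwnkit's real types are not shown on this page.
interface Finding {
  id: string;
  title: string;
  poc: string; // proof-of-concept payload or script to re-run
}

// A reproducer re-runs the PoC and reports whether the exploit worked.
type Reproducer = (finding: Finding) => boolean;

interface VerifyResult {
  confirmed: Finding[];
  falsePositives: Finding[];
}

// Keep a finding only if an independent re-exploit succeeds.
// A reproducer that throws counts as "could not reproduce".
function verifyFindings(findings: Finding[], reproduce: Reproducer): VerifyResult {
  const result: VerifyResult = { confirmed: [], falsePositives: [] };
  for (const finding of findings) {
    let reproduced = false;
    try {
      reproduced = reproduce(finding);
    } catch {
      reproduced = false;
    }
    (reproduced ? result.confirmed : result.falsePositives).push(finding);
  }
  return result;
}

// Stub reproducer (no network calls): only PoCs marked "works" reproduce.
const sample: Finding[] = [
  { id: "F-1", title: "Direct prompt injection", poc: "works" },
  { id: "F-2", title: "Suspected SSRF", poc: "fails" },
];
const outcome = verifyFindings(sample, (f) => f.poc === "works");
console.log(outcome.confirmed.map((f) => f.id)); // IDs of confirmed findings
```

Treating an error during reproduction as a failed reproduction keeps the report conservative: a finding only survives when the re-exploit positively succeeds.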

How much does pwnkit cost?

pwnkit is free and open source (Apache 2.0 license). It's an agentic harness: bring your own API key, or use it with Claude Code CLI or Codex CLI through your existing subscription. pwnkit orchestrates the pipeline; your tools power the AI.

What can pwnkit scan?

pwnkit scans AI/LLM apps, traditional web apps, npm packages, and source code repositories. It includes resumable scans, finding triage with deduplication, deterministic replay, a local verification dashboard, diff-aware PR review, and autonomous orchestration workers.

Let autonomous AI agents hack you
so the real ones can't.

Fully autonomous agentic pentesting framework. Attacks AI/LLM apps, web apps, npm packages, and source code. Blind PoC verification to minimize false positives.

npx pwnkit-cli

Terminal

$ npx pwnkit-cli https://demo.app/api/chat

pwnkit-cli v0.3.4

Target: https://demo.app/api/chat

▸ Discover Found 3 endpoints, system prompt extracted

▸ Attack Ran 47 test cases across 6 categories

▸ Verify Confirmed 4 of 7 findings (3 false positives eliminated)

▸ Report Written to ./pwnkit-report/

4 verified vulnerabilities found.

ID	Finding	Risk	Conf
NF-001	Direct prompt injection	HIGH	99%
NF-002	System prompt extraction	MED	95%
NF-003	SSRF via tool callback	HIGH	97%
NF-004	PII leak in chat context	HIGH	92%

Shell-first. Minimal tools. Real exploits.

Web Apps

SQLi, IDOR, SSTI, XSS, auth bypass, SSRF — 35+ XBOW flags

AI/LLM Apps

Prompt injection, jailbreaks, PII leakage, MCP tool abuse

npm Packages

Supply chain attacks, malware, CVEs, typosquatting

Source Code

White-box mode reads code before attacking

35+ flags across 5 benchmarks.

35+ flags on XBOW (104 Docker CTF challenges). 10/10 on AI/LLM security. Validated across 5 benchmark suites. Playwright for XSS. White-box mode for source-aware scanning. Every finding independently re-exploited to kill false positives.

35+ flags

XBOW Web Pentesting

SQLi, IDOR, SSTI, XSS, auth bypass, RCE

10/10

AI/LLM Security

Prompt injection, jailbreaks, PII leakage

built

AutoPenBench

33 network/CVE tasks (Log4Shell, Heartbleed)

built

HarmBench

510 LLM safety behaviors

built

npm Audit

30 packages — first npm security benchmark

3/5

White-box Mode

Cracked impossible challenges with source access

10/10

Detection

10/10

Flag Extraction

False Positives

Run it yourself: pnpm bench --agentic

Just give it a target.

pwnkit-cli express

Audit an npm package

pwnkit-cli ./my-repo

Review source code

pwnkit-cli https://api.com/chat

Scan an LLM API

pwnkit-cli https://example.com --mode web

Pentest a web app

pwnkit-cli dashboard

Local mission control

pwnkit-cli findings list --severity critical

Triage across scans

Auto-detects target type. No subcommands needed for most targets.

Why pwnkit

Zero config

No YAML. No Python. Just npx pwnkit-cli and you're running.

Blind verification

Every finding is independently re-exploited. Can't reproduce it? Killed as a false positive.

Bring your own AI

Your API key, or use Claude Code CLI / Codex CLI with your subscription. Any model, any provider.

CLI runs. Dashboard triages.

pwnkit-cli

The execution surface. Run scans, audits, and reviews from your terminal or CI. Resume interrupted scans. Replay attack chains. Export SARIF, JSON, Markdown, or HTML.

$ pwnkit-cli scan --target https://api.com
$ pwnkit-cli audit express --depth deep
$ pwnkit-cli review ./my-repo --diff-base main
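To make the SARIF export concrete, here is a rough sketch of how a list of findings could map onto a minimal SARIF 2.1.0 log. The `ReportFinding` shape, the risk-to-level mapping, and `toSarif` are assumptions for illustration; the file pwnkit actually writes may carry more fields. Consumers such as GitHub code scanning generally also expect a location on each result, which is omitted here.

```typescript
type Risk = "HIGH" | "MED" | "LOW";

// Hypothetical finding shape; the fields pwnkit actually emits may differ.
interface ReportFinding {
  id: string;
  title: string;
  risk: Risk;
}

// SARIF result levels include "error", "warning", and "note".
// Mapping risk onto them this way is an illustrative choice.
function toSarifLevel(risk: Risk): "error" | "warning" | "note" {
  if (risk === "HIGH") return "error";
  if (risk === "MED") return "warning";
  return "note";
}

// Build a minimal SARIF 2.1.0 log: one run, one rule and one result per finding.
// Assumes finding IDs are unique, so each becomes its own rule.
function toSarif(findings: ReportFinding[], toolName: string) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: {
          driver: {
            name: toolName,
            rules: findings.map((f) => ({
              id: f.id,
              shortDescription: { text: f.title },
            })),
          },
        },
        results: findings.map((f) => ({
          ruleId: f.id,
          level: toSarifLevel(f.risk),
          message: { text: f.title },
        })),
      },
    ],
  };
}

// Two rows from the sample report table above.
const sarifLog = toSarif(
  [
    { id: "NF-001", title: "Direct prompt injection", risk: "HIGH" },
    { id: "NF-002", title: "System prompt extraction", risk: "MED" },
  ],
  "pwnkit",
);
console.log(JSON.stringify(sarifLog, null, 2));
```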

pwnkit-cli dashboard

The operator surface. Local web UI for finding triage, evidence review, scan provenance, and human sign-off. Launch scans, manage the verification queue, and track finding families across runs.

$ pwnkit-cli dashboard → http://127.0.0.1:48123
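One way to picture finding families across runs is grouping by a stable fingerprint. The sketch below is an assumption about how such grouping could work, not pwnkit's actual logic: `ScanFinding`, `fingerprint`, and `groupIntoFamilies` are hypothetical names, and the fingerprint (category plus target with trailing slashes dropped) is deliberately simplistic.

```typescript
// Hypothetical record of a finding from one scan run.
interface ScanFinding {
  scanId: string;
  category: string; // e.g. "prompt-injection"
  target: string; // URL or package name
}

// Fingerprint: lowercased category plus the target without trailing slashes.
function fingerprint(f: ScanFinding): string {
  const category = f.category.trim().toLowerCase();
  const target = f.target.trim().replace(/\/+$/, "");
  return `${category}::${target}`;
}

// Findings that share a fingerprint land in the same family,
// even when they come from different scans.
function groupIntoFamilies(findings: ScanFinding[]): Map<string, ScanFinding[]> {
  const families = new Map<string, ScanFinding[]>();
  for (const f of findings) {
    const key = fingerprint(f);
    const family = families.get(key);
    if (family) {
      family.push(f);
    } else {
      families.set(key, [f]);
    }
  }
  return families;
}

const families = groupIntoFamilies([
  { scanId: "scan-1", category: "prompt-injection", target: "https://api.com/chat" },
  { scanId: "scan-2", category: "Prompt-Injection", target: "https://api.com/chat/" },
  { scanId: "scan-2", category: "ssrf", target: "https://api.com/chat" },
]);
console.log(families.size); // number of distinct families
```

Here the two prompt-injection findings from different scans collapse into one family, so an operator would triage them once instead of per run.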

How it compares

Feature	pwnkit	promptfoo (acquired by OpenAI)	garak	nuclei	Semgrep
Autonomous multi-agent	Agentic pipeline	—	—	—	—
Verification (no false positives)	Re-exploits	—	—	—	—
AI/LLM app scanning	✓	✓	✓	—	—
npm package audit	✓	—	—	—	Rules
Source code review	AI-powered	—	—	—	Rules
AI attack coverage	30+ agentic	Partial	Partial	—	—
Zero config	npx	YAML	Python	Templates	Config
Independent	✓	Acquired	✓	✓	VC-backed
Open source	Apache-2.0	OpenAI-owned	OSS	MIT	LGPL

Findings in GitHub's Security tab.

.github/workflows/pwnkit.yml

name: Security Pentest
on: [push, pull_request]

jobs:
  pwnkit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pwnkit
        uses: peaktwilight/pwnkit/action@v1
        with:
          target: ${{ secrets.STAGING_API_URL }}
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: pwnkit-report/report.sarif

Dogfooding

pwnkit reviews its own source code

On every push via GitHub Actions

pwnkit runs pwnkit review . on its own repository. The same agentic pipeline that found 7 CVEs — pointed at itself. If it finds something, you'll see it here.

Set it up on your repo in 2 minutes:

1. Add to your GitHub Actions workflow:

- run: npx pwnkit-cli review . --format json > pwnkit-report.json

2. Add the badge to your README:

[![pwnkit](https://pwnkit.com/badge/ORG/REPO)](https://pwnkit.com)

Built from real security research

7 CVEs found in packages with 40M+ weekly downloads.

node-forge (32M/week) · mysql2 (5M/week) · Uptime Kuma (86K stars) · LiquidJS (CVE) · jsPDF (2 CVEs) · picomatch (CVE)

Stop guessing.
Start proving.

pwnkit-cli https://api.example.com/chat

pwnkit-cli express

pwnkit-cli ./my-repo

pwnkit-cli https://github.com/org/repo

