Let autonomous AI agents hack you
so the real ones can't.

Fully autonomous agentic pentesting framework. Attacks AI/LLM apps, web apps, npm packages, and source code. Blind PoC verification to minimize false positives.

npx pwnkit-cli
GitHub

Or read the documentation

Shell-first. Minimal tools. Real exploits.

Web Apps

SQLi, IDOR, SSTI, XSS, auth bypass, SSRF — 35+ XBOW flags

AI/LLM Apps

Prompt injection, jailbreaks, PII leakage, MCP tool abuse

npm Packages

Supply chain attacks, malware, CVEs, typosquatting

Source Code

White-box mode reads code before attacking

35+ flags across 5 benchmarks.

35+ flags on XBOW (104 Docker CTF challenges). 10/10 on AI/LLM security. Validated across 5 benchmark suites. Playwright for XSS. White-box mode for source-aware scanning. Every finding independently re-exploited to kill false positives.

35+ flags
XBOW Web Pentesting

SQLi, IDOR, SSTI, XSS, auth bypass, RCE

10/10
AI/LLM Security

Prompt injection, jailbreaks, PII leakage

built
AutoPenBench

33 network/CVE tasks (Log4Shell, Heartbleed)

built
HarmBench

510 LLM safety behaviors

built
npm Audit

30 packages — first npm security benchmark

3/5
White-box Mode

Cracked impossible challenges with source access

10/10
Detection
10/10
Flag Extraction
0
False Positives

Run it yourself: pnpm bench --agentic · View benchmark source

Just give it a target.

pwnkit-cli express

Audit an npm package

pwnkit-cli ./my-repo

Review source code

pwnkit-cli https://api.com/chat

Scan an LLM API

pwnkit-cli https://example.com --mode web

Pentest a web app

pwnkit-cli dashboard

Local mission control

pwnkit-cli findings list --severity critical

Triage across scans

Auto-detects target type. No subcommands needed for most targets.
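
As a rough illustration of what target auto-detection could look like, here is a minimal sketch. It is not pwnkit's actual detection logic, which this page does not describe. The function name `detect_target_type` and the four labels it prints are hypothetical, and the rules are inferred only from the example targets shown above.

```shell
# Hypothetical heuristic: classify a target string by its shape alone.
# Not pwnkit's real implementation; names and labels are assumptions.
detect_target_type() {
  case "$1" in
    https://github.com/*) echo "repo" ;;     # remote repository URL
    http://*|https://*)   echo "url" ;;      # LLM API or web app endpoint
    ./*|../*|/*)          echo "source" ;;   # local path: source code review
    *)                    echo "package" ;;  # bare name: treat as an npm package
  esac
}
```

Under these assumptions, `detect_target_type express` prints `package` and `detect_target_type ./my-repo` prints `source`. A real detector would also need to tell subcommands such as `dashboard` apart from package names, and to distinguish web apps from LLM APIs, which the page handles with `--mode web`.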

Why pwnkit

Zero config

No YAML. No Python. Just npx pwnkit-cli and you're running.

Blind verification

Every finding is independently re-exploited. Can't reproduce it? Killed as a false positive.

Bring your own AI

Your API key, or use Claude Code CLI / Codex CLI with your subscription. Any model, any provider.
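
The blind verification idea above can be sketched as a filter that re-runs each finding's proof of concept on its own and drops anything that does not reproduce. This is only an illustration under stated assumptions: the tab-separated input format (`id`, then a PoC command) and the function name `verify_findings` are hypothetical, and pwnkit's real verifier is an agentic pipeline rather than a shell loop.

```shell
# Hypothetical sketch: read "finding-id<TAB>poc-command" lines on stdin,
# re-run each PoC in a fresh shell, and report whether it reproduced.
verify_findings() {
  while IFS="$(printf '\t')" read -r id poc; do
    # Run the PoC in isolation; stdin is detached so it cannot
    # consume the remaining findings from the loop's input.
    if sh -c "$poc" </dev/null >/dev/null 2>&1; then
      echo "confirmed $id"
    else
      echo "killed $id"   # could not reproduce: treated as a false positive
    fi
  done
}
```

For example, `printf 'F1\ttrue\nF2\tfalse\n' | verify_findings` marks `F1` as confirmed and `F2` as killed, because only the first command exits successfully.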

CLI runs. Dashboard triages.

pwnkit-cli

The execution surface. Run scans, audits, and reviews from your terminal or CI. Resume interrupted scans. Replay attack chains. Export SARIF, JSON, Markdown, or HTML.

$ pwnkit-cli scan --target https://api.com
$ pwnkit-cli audit express --depth deep
$ pwnkit-cli review ./my-repo --diff-base main

pwnkit-cli dashboard

The operator surface. Local web UI for finding triage, evidence review, scan provenance, and human sign-off. Launch scans, manage the verification queue, and track finding families across runs.

$ pwnkit-cli dashboard → http://127.0.0.1:48123

How it compares


Feature pwnkit promptfoo (acquired by OpenAI) garak nuclei Semgrep
Autonomous multi-agent Agentic pipeline
Verification (no false positives) Re-exploits
AI/LLM app scanning
npm package audit Rules
Source code review AI-powered Rules
AI attack coverage 30+ agentic Partial Partial
Zero config npx YAML Python Templates Config
Independent Acquired VC-backed
Open source Apache-2.0 OpenAI-owned OSS MIT LGPL

Findings in GitHub's Security tab.

.github/workflows/pwnkit.yml
name: Security Pentest
on: [push, pull_request]

jobs:
  pwnkit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pwnkit
        uses: peaktwilight/pwnkit/action@v1
        with:
          target: ${{ secrets.STAGING_API_URL }}
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: pwnkit-report/report.sarif
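
If you also want the job to fail when the report contains serious findings, one option is a small gate over the SARIF file. The sketch below is an assumption-heavy illustration, not part of pwnkit: the function names are hypothetical, and it assumes results carry an explicit `"level": "error"` field, which SARIF allows but does not require. It uses a crude text match rather than a JSON parser.

```shell
# Hypothetical helper: count SARIF results marked with level "error".
# Crude text match; assumes the level field is written out explicitly.
count_sarif_errors() {
  # grep -o prints one match per line, wc -l counts them.
  # "|| true" keeps a zero-match grep from failing the pipeline.
  { grep -Eo '"level"[[:space:]]*:[[:space:]]*"error"' "$1" || true; } \
    | wc -l | tr -d '[:space:]'
}

# Hypothetical gate: return non-zero if any error-level result exists.
gate_on_sarif() {
  errors=$(count_sarif_errors "$1")
  if [ "$errors" -gt 0 ]; then
    echo "pwnkit: $errors error-level finding(s)" >&2
    return 1
  fi
}
```

In a workflow step this could be called as `gate_on_sarif pwnkit-report/report.sarif` after the upload step, so the findings still reach the Security tab before the job fails.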

Dogfooding

pwnkit reviews its own source code

On every push via GitHub Actions

pwnkit runs pwnkit review . on its own repository. The same agentic pipeline that found 7 CVEs — pointed at itself. If it finds something, you'll see it here.

Set it up on your repo in 2 minutes:

1. Add to your GitHub Actions workflow:

- run: npx pwnkit-cli review . --format json > pwnkit-report.json

2. Add the badge to your README:

[![pwnkit](https://pwnkit.com/badge/ORG/REPO)](https://pwnkit.com)

Built from real security research

7 CVEs found in packages with 40M+ weekly downloads.

node-forge (32M/week) · mysql2 (5M/week) · Uptime Kuma (86K stars) · LiquidJS (CVE) · jsPDF (2 CVEs) · picomatch (CVE)
Full CVE writeups

Stop guessing.
Start proving.

pwnkit-cli https://api.example.com/chat
pwnkit-cli express
pwnkit-cli ./my-repo
pwnkit-cli https://github.com/org/repo
Star on GitHub