Question 1

What is pwnkit?

Accepted Answer

pwnkit is the leading open-source adversarial testing framework for AI systems, web apps, packages, and source code.

Question 2

How does pwnkit eliminate false positives?

Accepted Answer

pwnkit's Verify agent independently re-exploits every finding. If it can't reproduce the vulnerability, the finding is killed as a false positive. Only confirmed vulnerabilities with working proof-of-concept code make it into the final report.
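
As an illustration only, the filtering step described above can be sketched in a few lines of Python. This is not pwnkit's actual code or data model: `Finding`, `confirm_findings`, and the `reproduce` callback are hypothetical names, and the callback stands in for whatever independent re-exploit attempt the Verify agent performs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; not pwnkit's actual data model."""
    title: str
    proof_of_concept: str


def confirm_findings(
    findings: Iterable[Finding],
    reproduce: Callable[[Finding], bool],
) -> List[Finding]:
    """Keep only findings that an independent re-exploit attempt reproduces."""
    confirmed = []
    for finding in findings:
        # A finding that cannot be reproduced is treated as a false positive
        # and never reaches the report.
        if reproduce(finding):
            confirmed.append(finding)
    return confirmed


if __name__ == "__main__":
    candidates = [
        Finding("SQL injection in /search", "GET /search?q=' OR 1=1--"),
        Finding("Reflected XSS in /profile", "GET /profile?name=<script>"),
    ]
    # Stand-in verifier: pretend only the first finding reproduces.
    report = confirm_findings(candidates, lambda f: "search" in f.title)
    print([f.title for f in report])
```

The point of the sketch is that the report is built only from the verifier's output, so an unreproduced finding has no path into it.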

Question 3

How much does pwnkit cost?

Accepted Answer

pwnkit is free and open source under the Apache 2.0 license. It is an agentic harness — bring your own API key, or run it through Claude Code CLI or Codex CLI on your existing subscription.

Question 4

What can pwnkit scan?

Accepted Answer

pwnkit scans AI/LLM apps, traditional web apps, npm packages, and source code repositories. It includes resumable scans, finding triage with deduplication, deterministic replay, a local verification dashboard, diff-aware PR review, and autonomous orchestration workers.

Question 5

How does pwnkit compare to XBOW?

Accepted Answer

pwnkit publishes its benchmark methodology and evidence ledger publicly. It scores 99 of 104 on the XBOW benchmark, with full traces, per-model breakdowns, and methodology documented on the benchmark page.

Question 6

How is pwnkit cloud different from the other AI pentest platforms?

Accepted Answer

The pwnkit engine is open source, so security teams can inspect the prompts, tool loop, and benchmark methodology before trusting the managed layer. The cloud product adds scoped orchestration, evidence handling, and operator review around that engine.

Question 7

Can pwnkit cloud be used for SOC 2, ISO 27001, or customer security questionnaires?

Accepted Answer

Use it as supporting security evidence, not as a SOC 2 or ISO 27001 certification claim. Findings are shaped to include the transcript, proof, and review status so engineering and security reviewers can inspect the work.

Question 8

Can pwnkit cloud run against production?

Accepted Answer

Only inside an agreed scope, with an action allowlist and stop procedure defined before testing. If production is not appropriate, the engagement runs against an authenticated mirror instead.
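
One way to picture an action allowlist with a stop procedure is a guard that every action must pass before it runs. The sketch below is an assumption for illustration, not how pwnkit cloud implements scoping: `EngagementScope`, `authorize`, `ScopeViolation`, and the example target and action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


class ScopeViolation(Exception):
    """Raised when a requested action falls outside the agreed scope."""


@dataclass(frozen=True)
class EngagementScope:
    """Hypothetical scope record; field names are illustrative only."""
    allowed_targets: FrozenSet[str]
    allowed_actions: FrozenSet[str]


def authorize(scope: EngagementScope, target: str, action: str,
              stop_requested: bool = False) -> None:
    """Raise ScopeViolation unless the action is explicitly permitted."""
    # The stop procedure overrides everything else.
    if stop_requested:
        raise ScopeViolation("stop procedure triggered; testing halted")
    if target not in scope.allowed_targets:
        raise ScopeViolation(f"target not in scope: {target}")
    if action not in scope.allowed_actions:
        raise ScopeViolation(f"action not on allowlist: {action}")


if __name__ == "__main__":
    scope = EngagementScope(
        allowed_targets=frozenset({"staging.example.com"}),
        allowed_actions=frozenset({"read-only-probe"}),
    )
    authorize(scope, "staging.example.com", "read-only-probe")  # permitted
    try:
        authorize(scope, "staging.example.com", "data-deletion")
    except ScopeViolation as err:
        print(err)
```

The guard denies by default: anything not named in the scope is refused, which matches the idea of agreeing the scope before testing starts.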

Question 9

Does pwnkit train models on customer data?

Accepted Answer

No shared training claim is part of the product. Customer data handling, retention, and deletion are defined in the engagement scope before work starts.

Status	Target	Type	Mode	Model	Findings	Cost	Duration	Finished
running	checkout-api	api	auth-bypass	gpt-5	1	$0.4123	4m 12s	running
complete	shop.wayne-enterprises.com	web	audit	claude-opus-4-7	4	$0.9374	11m 03s	12m ago
complete	[email protected]	npm	scan	gpt-5	1	$0.6121	7m 48s	31m ago
complete	wayne-enterprises/payments-svc	source	review	claude-opus-4-7	2	$0.5012	5m 22s	1h ago
pending	support-bot-staging	agent	tool-abuse	gpt-5	0	—	—	queued

Cybersecurity was built for human speed.
We're rebuilding it for the AI era.

Dashboard

Recent scans

PwnKit finds bugs in software billions depend on.

Open-source hacking agents, proven in public.

Agentic by design

Validated by exploit

Open by default

Leading the benchmarks

Point it at what you ship.

Pressure-test critical systems.

Agree what can be attacked.

Apply adversarial pressure.

Replay before reporting.

Keep the noise out.

Give engineers the trail.

Questions we answer before you ask.

Start locally. Scale when it matters.

