How AI Agents Found 7 CVEs in Popular npm Packages
A systematic workflow using Claude Opus to audit open-source packages led to 73 findings, 7 published CVEs, and the realization that this process could be automated for everyone.
In early March 2026, I started a project that I expected to last a weekend. I wanted to see if I could use an AI agent — specifically Claude Opus — to systematically audit popular npm packages for security vulnerabilities. Not just run a linter. Actually read source code, trace data flows, identify trust boundary violations, and produce working proof-of-concept exploits.
Three weeks later, I had 73 security findings across dozens of packages, 7 published CVEs, and a framework that had found vulnerabilities in packages with a combined download count exceeding 40 million per week.
This post is about how that workflow operates, what it found, and why it led me to build pwnkit.
The workflow
The process is not complicated. It is, however, extremely methodical — which is exactly where AI agents excel. Here is the pipeline I run for each target:
1. Pick a package based on download count, attack surface (does it parse untrusted input? handle crypto? process URLs?), and history of prior vulnerabilities. High downloads plus complex parsing logic is the sweet spot.
2. The agent reads the source code front to back. Not skimming — reading. It maps entry points, traces how user input flows through the system, identifies trust boundaries, and flags patterns that historically lead to vulnerabilities: unvalidated input, missing bounds checks, string concatenation in security-sensitive contexts.
3. Every finding gets a working proof of concept. If the agent can't write a PoC that demonstrates the vulnerability, the finding is discarded. No maybes. No theoretical risks. Working exploits or nothing.
4. Responsible disclosure through GitHub Security Advisories or direct maintainer contact. Full writeup, PoC code, suggested fix, 90-day timeline. Then wait.
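The core of the pipeline can be sketched in a few lines. This is an illustrative model only — the types, names, and scoring heuristic below are my own invention, not the actual framework's code. The two load-bearing ideas are the target-selection heuristic from step 1 and the "no PoC, no finding" filter from step 3.

```typescript
// Hypothetical sketch of the audit pipeline described above.
// All names and the scoring formula are illustrative assumptions.

type Finding = { id: string; description: string; poc: string | null };

// Step 1: prioritize targets -- high downloads plus untrusted-input
// parsing surface is the sweet spot.
function targetScore(weeklyDownloads: number, parsesUntrustedInput: boolean): number {
  return Math.log10(weeklyDownloads) * (parsesUntrustedInput ? 2 : 1);
}

// Step 3: a finding without a working proof of concept is discarded.
function verified(findings: Finding[]): Finding[] {
  return findings.filter((f) => f.poc !== null);
}

const raw: Finding[] = [
  { id: "F-1", description: "missing bounds check in packet parser", poc: "crash.js" },
  { id: "F-2", description: "theoretical timing side channel", poc: null },
];

console.log(verified(raw).map((f) => f.id)); // only F-1 survives
```

The filter is deliberately brutal: a finding with `poc: null` never reaches the disclosure step, which is what keeps the false-positive rate at zero.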
That is the entire system. No proprietary scanning engine. No signature database. Just an AI agent that reads code the way a security researcher reads code — except it does not get tired, does not skip the boring parts, and can process an entire codebase in minutes.
What it found
Here are some of the highlights. Each of these has a full writeup on doruk.ch with technical details, PoCs, and disclosure timelines.
node-forge — Certificate forgery
CVE-2026-33896 · 32 million weekly downloads
The core certificate chain verification logic had a conditional check that only validated basicConstraints when the extension was present. When absent — which is normal for end-entity certificates — any certificate could act as a CA. One conditional. A billion yearly downloads. Certificate forgery for any domain.
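The bug class is easy to see in miniature. The sketch below is a simplified illustration of the pattern — it is not node-forge's actual code, and the type and function names are invented for clarity. Per RFC 5280, a certificate without a basicConstraints extension asserting cA=true must not be treated as a CA; the buggy branch gets that default backwards.

```typescript
// Simplified illustration of the vulnerability class.
// NOT node-forge's actual code; names are hypothetical.

interface Cert {
  basicConstraints?: { cA: boolean };
}

function mayActAsCA_buggy(cert: Cert): boolean {
  if (cert.basicConstraints) {
    return cert.basicConstraints.cA; // validated only when present
  }
  return true; // BUG: absence of the extension should mean "not a CA"
}

function mayActAsCA_fixed(cert: Cert): boolean {
  // Absent extension => not a CA. Fail closed.
  return cert.basicConstraints?.cA === true;
}

const leafCert: Cert = {}; // typical end-entity cert: no basicConstraints
console.log(mayActAsCA_buggy(leafCert)); // true  -> any leaf can sign certs
console.log(mayActAsCA_fixed(leafCert)); // false
```

With the buggy default, any ordinary end-entity certificate signed by a trusted CA can itself sign a certificate for an arbitrary domain, and the chain still verifies.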
mysql2 — Connection override + 3 more
4 findings · 5 million weekly downloads
URL query parameters could override the host, disable TLS, and enable multi-statement queries. Plus prototype pollution, geometry parsing DoS, and an out-of-bounds read in packet framing. Four vulnerabilities that chain together: redirect the connection, then crash the client. The maintainer shipped all four fixes in 24 hours.
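The connection-override class looks roughly like this. This is a hedged sketch of the pattern, not mysql2's actual parsing code — the option names and merge logic are illustrative. The essence is a query-string merge with no allowlist, so attacker-influenced parameters clobber safe defaults.

```typescript
// Illustrative sketch of the vulnerability class, NOT mysql2's code.
// Option names ("host", "ssl") and the merge are assumptions.

function parseConnectionUrl(url: string): Record<string, string> {
  const u = new URL(url);
  const opts: Record<string, string> = { host: u.hostname, ssl: "true" };
  // BUG: every query parameter is merged in without an allowlist, so
  // ?host=evil.example&ssl=false redirects the connection and strips TLS.
  u.searchParams.forEach((value, key) => {
    opts[key] = value;
  });
  return opts;
}

const opts = parseConnectionUrl(
  "mysql://db.internal/app?host=evil.example&ssl=false"
);
console.log(opts.host, opts.ssl); // evil.example false
```

A fix is an explicit allowlist of overridable keys, which is the general remedy for any "merge user-supplied key/value pairs into config" pattern.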
Uptime Kuma / LiquidJS — SSTI bypass
CVE-2026-33130
A previously "patched" SSTI vulnerability was still exploitable. The entire security boundary — three separate mitigations — was bypassed by removing two quote characters from the payload. The root cause was in LiquidJS's require.resolve() fallback, which had no path containment checks. Four independent researchers found the same bug through different vectors.
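Missing path containment is another recurring class. The sketch below illustrates it in isolation — it is not LiquidJS's code, and the function names are hypothetical; it just shows why resolving a user-controlled name against a root directory without a containment check lets `../` sequences escape it.

```typescript
// Illustrative sketch of the root-cause pattern, NOT LiquidJS's code.
// Function names are hypothetical.

import * as path from "path";

function resolveTemplate_buggy(root: string, name: string): string {
  // BUG: no containment check -- "../../etc/passwd" escapes root.
  return path.resolve(root, name);
}

function resolveTemplate_fixed(root: string, name: string): string {
  const resolved = path.resolve(root, name);
  const rootAbs = path.resolve(root) + path.sep;
  // Reject anything that resolves outside the template root.
  if (!resolved.startsWith(rootAbs)) {
    throw new Error("template path escapes the template root");
  }
  return resolved;
}

console.log(resolveTemplate_buggy("/srv/templates", "../../etc/passwd"));
// -> /etc/passwd (on POSIX)
```

Note the containment check compares against the resolved root plus a trailing separator, which also blocks sibling-directory tricks like `/srv/templates-evil`.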
jsPDF — PDF injection + XSS
CVE-2026-31898 / CVE-2026-31938
Arbitrary PDF object injection via unsanitized annotation color parameters. Plus HTML injection through document.write() in output methods — CVSS 9.6 Critical. Another researcher reported first; I independently found the same issues and contributed defense-in-depth hardening to the fixes.
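The injection half of this is classic "string concatenation in a security-sensitive context", the same pattern flagged in the workflow above. This sketch is not jsPDF's actual code — the function names and dictionary syntax are a minimal illustration of how an unvalidated "color" string can close one PDF object and open another.

```typescript
// Minimal illustration of the injection class, NOT jsPDF's code.
// Function names are hypothetical.

function annotationColor_buggy(color: string): string {
  // BUG: the parameter is concatenated straight into the PDF
  // object dictionary, so a crafted string injects new objects.
  return `/C [${color}]`;
}

function annotationColor_fixed(color: string): string {
  // Hardened: require exactly three finite numeric components.
  const parts = color.split(" ").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isFinite(n))) {
    throw new Error("invalid color");
  }
  return `/C [${parts.join(" ")}]`;
}

const payload = "0 0 0] >> << /S /JavaScript /JS (app.alert(1))";
console.log(annotationColor_buggy(payload)); // injected PDF objects
console.log(annotationColor_fixed("1 0 0")); // /C [1 0 0]
```

Parsing the value into typed components and re-serializing it — rather than passing the raw string through — is the defense-in-depth shape the fix takes for this whole bug class.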
Why AI agents are good at this
The common thread across all of these findings is that they are not sophisticated. A missing conditional check. An unfiltered URL parameter. A fallback code path with no validation. A string concatenation where there should be DOM construction. These are not zero-days requiring months of reverse engineering. They are the kind of bugs that exist because nobody sat down and read the code carefully enough.
That is precisely what AI agents are good at. The tedious, methodical work of reading every function, tracing every input, checking every assumption. A human researcher gets fatigued after a few hours of source review. An AI agent processes the entire codebase with the same level of attention on the last file as the first.
The key insight: the agent does not need to be creative. It needs to be thorough. Creativity helps for novel attack classes, but the vast majority of real-world vulnerabilities are variants of known patterns — missing validation, improper access control, trust boundary violations. An agent that systematically checks for those patterns across an entire codebase will find things that humans miss through fatigue or oversight.
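To make "systematically checks for known patterns" concrete, here is a deliberately toy version of that idea — a line-level flagger for a few of the dangerous constructs named in this post. A real agent reasons about data flow rather than regexes, and these patterns are my own illustrative picks, but the point about thoroughness over creativity holds even at this level.

```typescript
// Toy illustration of pattern-based checking. A real agent traces
// data flow; this just shows "check every line, every time".
// The pattern list is an illustrative assumption.

const riskyPatterns: [RegExp, string][] = [
  [/document\.write\(/, "HTML injection sink"],
  [/require\.resolve\(/, "module resolution on dynamic input"],
  [/searchParams/, "query params merged into config"],
];

function flag(source: string): string[] {
  const hits: string[] = [];
  source.split("\n").forEach((line, i) => {
    for (const [re, label] of riskyPatterns) {
      if (re.test(line)) hits.push(`line ${i + 1}: ${label}`);
    }
  });
  return hits;
}

console.log(flag("out.document.write(html)\nconst p = require.resolve(name)"));
```

The agent's advantage is that, unlike a human reviewer, line 40,000 gets exactly the same scrutiny as line 1.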
73 findings, 7 CVEs — the numbers
After three weeks of running this workflow across popular npm packages, the security audit framework had accumulated:
- 73 total findings across dozens of packages
- 7 published CVEs in node-forge, mysql2, Uptime Kuma, LiquidJS, jsPDF, and picomatch
- 40M+ weekly downloads affected across the vulnerable packages
- Every finding verified with a working proof of concept
Not every finding became a CVE. Some were lower severity, some were in packages with smaller install bases, some were reported but not yet disclosed. But every single one was verified with a working exploit before it was reported. No theoretical risks. No "this might be a problem." Working code or it did not count.
From manual workflow to pwnkit
The security audit framework worked. But it was manual. I had to set up each audit, configure the agent, manage the output, track findings, write reports. The workflow was repeatable, but it required me at the controls.
The obvious next step: automate the workflow so anyone can run it.
That is what pwnkit is. The same agentic pipeline — discover, attack, verify, report — packaged as an open-source CLI tool. Point it at an npm package, an LLM endpoint, an MCP server, or a source code repository. It runs autonomous AI agents in sequence, each specialized for a phase of the security assessment. The verification agent independently re-exploits every finding. If it cannot reproduce, the finding is killed.
The 7 CVEs were the proof that this approach works. pwnkit is the tool that makes it accessible.
```shell
npx pwnkit-cli audit --package node-forge
```

If you are shipping software that depends on open-source packages — and you almost certainly are — the question is not whether these vulnerabilities exist in your dependency tree. They do. The question is whether you find them before someone else does.