The Finding-Fixing Gap: A Framework for What Comes After Mythos

Anthropic just showed us the future of offensive security. Their new model, Claude Mythos Preview, autonomously found and exploited zero-day vulnerabilities in every major OS and browser. A 27-year-old OpenBSD bug. A 17-year-old FreeBSD RCE. Thousands more they can't disclose because patches don't exist yet.

The number that stopped me: Mythos took a known, already-patched Linux kernel vulnerability and turned it into a working root exploit in under a day, for under $2,000. No human involved after the initial prompt. (Source)

The fix existed. Most organizations still wouldn't have deployed it in time. That's the part I can't stop thinking about.

The right move, but not enough

Anthropic's response, Project Glasswing, gives critical infrastructure partners early access to the model so they can find and fix vulnerabilities. Right move. But the same vulnerable software runs in every mid-size SaaS company and every startup on a legacy stack. There is no scaled infrastructure to help organizations without a 50-person security team go from "vulnerability disclosed" to "patched in production."

Finding just got mass-produced. Fixing is still manual. Over 32% of exploited CVEs in H1 2025 showed exploitation activity on or before disclosure day. Most organizations take weeks to months to patch.

For zero-days, you can't patch what you can't see

Two things from Anthropic's blog give me hope. First, even Mythos couldn't exploit many Linux kernel vulnerabilities remotely, because defense-in-depth held. Hard barriers like KASLR (kernel address space layout randomization) and W^X (write XOR execute, which keeps memory from being both writable and executable) work. Second, Mythos's exploits chain well-understood techniques (ROP chains, heap sprays, race conditions) that produce observable system behavior at runtime.

My read: you don't need to know the bug to detect something acting like an exploit. Runtime behavioral monitoring, paired with attack surface reduction (network segmentation, least privilege, zero trust), shifts your posture from "prevent every exploit" to "detect and contain before impact."
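Here is a toy illustration of that idea. It assumes a hypothetical, simplified process-event record. Real sensors such as eBPF or auditd pipelines emit much richer data, and the daemon names and allowlist below are assumptions. The code never looks at a CVE. It flags behavior that exploits tend to produce once they succeed:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified process event. Real sensors (eBPF, auditd, EDR agents)
# emit far richer records; these field names are illustrative only.
@dataclass(frozen=True)
class ProcEvent:
    parent: str      # name of the parent process
    name: str        # name of the process that was started or changed
    uid_before: int  # effective UID before the event
    uid_after: int   # effective UID after the event

# Assumptions to tune per environment: which daemons face the network,
# which binaries count as shells, and which tools may legitimately become root.
NETWORK_DAEMONS = {"nginx", "httpd", "postgres", "mysqld"}
SHELLS = {"sh", "bash", "dash", "zsh"}
SANCTIONED_ROOT = {"sudo", "su", "passwd"}


def detect(event: ProcEvent) -> List[str]:
    """Return behavioral alerts for one event. No CVE or signature is consulted."""
    alerts = []
    # A web server or database spawning a shell is a classic post-exploitation sign.
    if event.parent in NETWORK_DAEMONS and event.name in SHELLS:
        alerts.append("network daemon spawned a shell")
    # A non-root process ending up as root without going through a sanctioned tool.
    if event.uid_before != 0 and event.uid_after == 0 and event.name not in SANCTIONED_ROOT:
        alerts.append("unexpected escalation to root")
    return alerts


print(detect(ProcEvent("nginx", "bash", 33, 33)))   # ['network daemon spawned a shell']
print(detect(ProcEvent("bash", "a.out", 1000, 0)))  # ['unexpected escalation to root']
print(detect(ProcEvent("bash", "sudo", 1000, 0)))   # []
```

Matching on process names keeps the sketch short, but names are easy to spoof. Production rules typically match on executable paths, hashes, or kernel-level credential changes. The point is only that neither rule needs to know which bug was exploited.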

The SPAR Framework

No framework covers everything. But for most small teams, the current default is no structured approach at all. SPAR is a starting point. Current frontier models like Claude Opus 4.6 can be applied across all four layers: stacking with your tools, triaging findings, drafting patches, and assisting rewrites. They're not Mythos, but they're already a force multiplier.

S: Stack your defenses

Free tools and frontier models, together. Semgrep for static analysis, Trivy for containers, Grype for dependencies, OWASP ZAP for runtime issues, and MCP Armor for securing MCP tool integrations in AI agent workflows. Layer frontier models on top to scan your codebase with reasoning that rule-based tools can't match. Block builds only on high-confidence critical issues. This layer produces volume. The next layer makes sense of it.
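One way to implement "block builds only on high-confidence critical issues" is a small gate over Semgrep's JSON output, for example from `semgrep scan --json -o semgrep.json`. This is a sketch with some assumptions. It expects each finding to carry `extra.severity` and a rule-level `extra.metadata.confidence` field. Many registry rules include that field, but you should verify it against your Semgrep version. The script name `ci_gate.py` is hypothetical.

```python
import json
import sys
from typing import List

# Assumption: severity labels vary across Semgrep versions and rule sets;
# adjust this set to match what your reports actually contain.
BLOCKING_SEVERITIES = {"ERROR", "CRITICAL"}


def blocking_findings(report: dict) -> List[dict]:
    """Return only findings that are both high-severity and high-confidence."""
    blocked = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        severity = str(extra.get("severity", "")).upper()
        # Assumption: confidence lives in rule metadata, when the rule defines it.
        confidence = str(extra.get("metadata", {}).get("confidence", "")).upper()
        if severity in BLOCKING_SEVERITIES and confidence == "HIGH":
            blocked.append(result)
    return blocked


def gate(report_path: str) -> int:
    """Print blocking findings and return a CI exit code (1 = fail the build)."""
    with open(report_path) as fh:
        blocked = blocking_findings(json.load(fh))
    for result in blocked:
        print(f"BLOCK {result.get('check_id')} in {result.get('path')}")
    return 1 if blocked else 0


# Usage: python ci_gate.py semgrep.json
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(sys.argv[1]))
```

Everything else still shows up in the report for triage. Only the narrow high-severity, high-confidence slice fails the build.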

P: Prioritize the 0.5% that matters

Your scan tools and models will produce hundreds of findings. Most don't matter. Feed them into a frontier model along with your organizational context. That context covers your architecture (what's internet-facing, where the trust boundaries are, how services connect) and your history (which modules have had the most past issues, which categories keep recurring). Then apply three signals that cut through the noise.

CISA KEV (the Known Exploited Vulnerabilities catalog): if a CVE is on it and in your environment, fix it now. EPSS (the Exploit Prediction Scoring System): predicts the probability of exploitation within 30 days, covering 50% of actually-exploited vulnerabilities with 80% less work than CVSS alone. SSVC (Stakeholder-Specific Vulnerability Categorization): adds business context. Their data shows only 0.52% of vulnerabilities classify as "Act immediately." That's your real workload. Repeated patterns in the same component also tell you when to stop patching and start rewriting.
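Below is a minimal sketch of the KEV-then-EPSS ordering. It assumes you have already matched findings to CVEs and loaded CISA's KEV catalog, which CISA publishes as a downloadable feed, into a set of CVE IDs. It also assumes you have looked up EPSS scores from FIRST's daily data. The bucket names, the 0.1 EPSS threshold, and the `internet_facing` flag taken from your architecture map are illustrative assumptions. This is a crude stand-in for business context, not an implementation of the SSVC decision tree:

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Finding:
    cve: str
    component: str
    internet_facing: bool  # from your architecture map
    epss: float            # exploitation probability in [0, 1], e.g. from FIRST's EPSS data


# Hypothetical bucket names, most urgent first; tune them to your own capacity.
BUCKETS = ["fix now", "this sprint", "scheduled", "backlog"]


def bucket(f: Finding, kev: Set[str], epss_threshold: float = 0.1) -> str:
    if f.cve in kev:                   # known exploited in the wild: no debate
        return "fix now"
    likely = f.epss >= epss_threshold
    if likely and f.internet_facing:   # likely to be exploited AND reachable
        return "this sprint"
    if likely:
        return "scheduled"
    return "backlog"


def triage(findings: Iterable[Finding], kev: Set[str], epss_threshold: float = 0.1) -> List[Finding]:
    # Sort by bucket first, then by EPSS (highest first) within each bucket.
    return sorted(findings, key=lambda f: (BUCKETS.index(bucket(f, kev, epss_threshold)), -f.epss))


kev = {"CVE-0000-0003"}  # placeholder IDs, not real CVEs
findings = [
    Finding("CVE-0000-0001", "billing", False, 0.02),
    Finding("CVE-0000-0002", "api", True, 0.40),
    Finding("CVE-0000-0003", "worker", False, 0.01),
    Finding("CVE-0000-0004", "batch", False, 0.30),
]
for f in triage(findings, kev):
    print(bucket(f, kev), f.cve, f.component)
# fix now CVE-0000-0003 worker
# this sprint CVE-0000-0002 api
# scheduled CVE-0000-0004 batch
# backlog CVE-0000-0001 billing
```

Note that the KEV hit outranks a much higher EPSS score. A confirmed exploit beats a prediction.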

A: Auto-fix with AI

GitHub Copilot Autofix cuts median fix time from 1.5 hours to 28 minutes for the vulnerability classes it covers. But be honest about the boundary: auto-fix works for well-defined patterns. For the subtle bugs Mythos found, such as a signed integer overflow buried in 27-year-old code, no current tool produces the patch automatically. AI handles the volume so your team has time for the hard problems.
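Here is a concrete case of a "well-defined pattern" that auto-fix tools handle reliably: SQL built by string interpolation (CWE-89), rewritten as a parameterized query. This is an illustrative sketch using Python's built-in `sqlite3`, not output from Copilot Autofix:

```python
import sqlite3


def find_user_vulnerable(conn: sqlite3.Connection, name: str):
    # Pattern scanners flag this: untrusted input interpolated into SQL text.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}' ORDER BY id").fetchall()


def find_user_fixed(conn: sqlite3.Connection, name: str):
    # The mechanical fix: bind input as a parameter so it is never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_vulnerable(conn, payload))  # [(1,), (2,)]  every row leaks
print(find_user_fixed(conn, payload))       # []
```

The fix is local and mechanical, which is why tooling can make it. A signed overflow in decades-old C has no equivalent template.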

R: Rewrite when patching isn't enough

70% of Microsoft's CVEs since 2006 were memory safety bugs. When a component keeps getting patched for the same vulnerability class, the problem is systemic. Use the Strangler Fig pattern. Put a stable interface in front of the legacy code, then rewrite the worst modules in a memory-safe language one at a time. Route traffic to each replacement as it's ready, until the old code can be retired. Priority targets: authentication code, crypto implementations, network parsers, anything handling untrusted input. This is the hardest layer. Most teams won't get here immediately. But knowing when something has crossed from "patchable" to "needs replacement" changes how you spend your time.
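Here is a minimal sketch of both halves of that decision, under stated assumptions. `rewrite_candidates` flags components fixed repeatedly for the same CWE, and its threshold of 3 is arbitrary. `StranglerFacade` is a hypothetical single entry point that cuts operations over to a replacement one at a time. In practice the replacement would be something like a Rust library or a separate service. Plain Python callables stand in for both here:

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def rewrite_candidates(fix_history: Iterable[Tuple[str, str]], threshold: int = 3) -> List[str]:
    """Return components patched at least `threshold` times for the same CWE."""
    counts = Counter(fix_history)  # keys are (component, cwe) pairs
    return sorted({comp for (comp, _cwe), n in counts.items() if n >= threshold})


class StranglerFacade:
    """Routes each operation to its replacement if one exists, else to legacy code."""

    def __init__(self, legacy: Dict[str, Callable]):
        self._legacy = dict(legacy)
        self._replaced: Dict[str, Callable] = {}

    def replace(self, operation: str, implementation: Callable) -> None:
        # Cut over one operation at a time; only known operations can be replaced.
        if operation not in self._legacy:
            raise KeyError(f"unknown operation: {operation}")
        self._replaced[operation] = implementation

    def rollback(self, operation: str) -> None:
        # Send traffic back to the legacy path if the replacement misbehaves.
        self._replaced.pop(operation, None)

    def call(self, operation: str, *args):
        impl = self._replaced.get(operation) or self._legacy[operation]
        return impl(*args)


# Usage with hypothetical component names and fix history.
history = [("png_parser", "CWE-787")] * 3 + [("auth", "CWE-287"), ("png_parser", "CWE-416")]
print(rewrite_candidates(history))  # ['png_parser']

facade = StranglerFacade({"parse_png": lambda data: "legacy", "login": lambda user: "legacy"})
facade.replace("parse_png", lambda data: "memory-safe")
print(facade.call("parse_png", b""), facade.call("login", "alice"))  # memory-safe legacy
```

Because the facade keeps the legacy path alive, a bad cutover is a one-line rollback rather than an outage.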

When Mythos becomes available to everyone

SPAR doesn't expire when better models arrive. It compounds. Swap in a Mythos-class model and the "S" finds the subtle zero-days current models miss. The "A" becomes a tighter loop: the same model that finds the vulnerability drafts the fix with full context of what went wrong.

But the context you've built (your architecture map, vulnerability history, recurring patterns) is what makes any model perform better. The organizations that benefit most won't be the ones with the best model. They'll be the ones that built the best context to feed it.
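Here is one possible shape for that context. This sketch uses a hypothetical schema that keeps the architecture map and recurring issues as plain data in your repo, then renders them into a preamble any model can consume. Nothing here is tied to a specific model or API:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical schema: the point is that this lives in your repo as plain data,
# independent of whichever model consumes it.
@dataclass
class Service:
    name: str
    internet_facing: bool
    trust_zone: str                                  # e.g. "public", "internal", "admin"
    calls: List[str] = field(default_factory=list)   # downstream services


@dataclass
class SecurityContext:
    services: List[Service]
    recurring_issues: List[str]                      # e.g. "png_parser: CWE-787 (3 fixes)"

    def to_json(self) -> str:
        # Stable, diffable serialization so the context can be reviewed like code.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def prompt_preamble(self) -> str:
        # Plain-text rendering to prepend to any model's triage or fix prompt.
        exposed = sorted(s.name for s in self.services if s.internet_facing)
        lines = ["Organizational security context:"]
        lines.append("Internet-facing: " + (", ".join(exposed) or "none"))
        for s in self.services:
            if s.calls:
                lines.append(f"{s.name} ({s.trust_zone}) calls: {', '.join(s.calls)}")
        lines += [f"Recurring: {issue}" for issue in self.recurring_issues]
        return "\n".join(lines)


ctx = SecurityContext(
    services=[Service("api", True, "public", ["billing"]), Service("billing", False, "internal")],
    recurring_issues=["png_parser: CWE-787 (3 fixes)"],
)
print(ctx.prompt_preamble())
```

Keeping the context as reviewed data rather than ad hoc prompt text is what lets you swap models without rebuilding it.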

The gap that matters

I don't have all the answers. But the industry spent decades getting better at finding, and the fixing side hasn't kept pace. That gap is about to widen fast.

If you're a small team, start somewhere. Even an imperfect approach beats the default. The window between disclosure and exploitation is shrinking, and the organizations that survive the transition will be the ones that started building their fixing muscle before they needed it.