The vulnerability disclosure timeline just compressed. A capability that finds 27-year-old bugs in well-audited code can find newer bugs in less-audited code in hours. Defensive teams now have to assume that any latent vulnerability in their codebase will be discovered — by someone — within the operational lifetime of the next two model generations. Security-by-obscurity is dead, formally. The "small attack surface" defense has always been weaker than its proponents suggest, and it's now untenab

Claude Mythos is the highest-capability publicly-disclosed offensive-security AI as of May 2026. Several other systems exist in the same neighborhood; the comparison frames the field. System Provider Access model Reported autonomy Notable finds Claude Mythos Preview Anthropic Restricted (Glasswing partners only) End-to-end zero-day discovery + exploit 27-year-old OpenBSD bug; CVE-2026-4747 FreeBSD RCE; thousands of zero-days in major OSes/browsers Big Sleep (Google DeepMind) Google Internal use,

Claude Mythos Found a 17-Year-Old FreeBSD Zero-Day

Q: What's next

Three trajectories matter for the rest of 2026. First, Mythos-class capability will diffuse. Anthropic's competitive landscape doesn't change, even with Glasswing's restrictions. Other labs are training similar models. Within twelve to eighteen months, Mythos-equivalent capability will be available via at least one model with public API access. That's the window for defenders. Use it. Second, regulation is coming. The capability profile of Mythos — autonomous zero-day discovery — sits exactly wh

Anthropic‘s Claude Mythos Preview model autonomously identified and exploited a 17-year-old remote code execution vulnerability in FreeBSD’s NFS implementation, now triaged as CVE-2026-4747. The same model has surfaced thousands of zero-day vulnerabilities across every major operating system and every major web browser during internal testing, with the oldest find being a 27-year-old bug in OpenBSD — an OS whose security posture is the gold standard. In response, Anthropic launched Project Glasswing, restricting Mythos Preview access to a curated group of large-organization partners and open-source maintainers so defenders can patch first, before equivalent capability inevitably proliferates. This is the most consequential AI-security news of 2026 so far, and it deserves a careful read.

Want the complete, hands-on version of this guide?Browse the Library →

What’s actually new

The capability itself is the headline. Prior LLMs could assist a security researcher who already knew where to look — finishing exploits, suggesting fuzzing harnesses, explaining vulnerable code paths. Claude Mythos operates an order of magnitude beyond that: it autonomously plans the search, reads target code, formulates hypotheses about exploit primitives, validates them via execution in a sandbox, and produces working proof-of-concept exploits without human intervention. The CVE-2026-4747 discovery — full unauthenticated RCE in FreeBSD NFS, exploitable by anyone on the internet — was found end-to-end by Mythos with a single high-level instruction.

The 27-year-old OpenBSD bug demands separate emphasis. OpenBSD has been the security-conscious OS for three decades, audited continuously by professional and academic researchers, with formal verification on portions of its kernel. That a frontier LLM found an exploitable defect that survived 27 years of human scrutiny says something specific: the search space of code review is large enough that humans miss things consistently, and the right kind of automated reviewer can catch what humans don’t. This isn’t a “more eyes” argument; it’s a different category of eye altogether.

Project Glasswing is the second half of the story. Rather than release Mythos Preview broadly, Anthropic gated access to organizations with concrete defensive responsibilities: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and select open-source maintainers. The reasoning is laid out plainly in Anthropic’s announcement: the same capability that lets defenders find bugs lets attackers find them, so the responsible release pattern is “give defenders a head start.” The Glasswing partners are running Mythos against their own codebases, surfacing vulnerabilities, and patching them before the model — or any successor with similar capability — reaches broad availability.

The model itself is described as a general-purpose successor in the Claude family, with the security-research capability arising as an emergent property of broader reasoning improvements rather than as a domain-specific fine-tune. That detail matters: it means the next several frontier models from any major lab are likely to have similar offensive-security capabilities, whether or not those labs intend or advertise them.

Why it matters

The vulnerability disclosure timeline just compressed. A capability that finds 27-year-old bugs in well-audited code can find newer bugs in less-audited code in hours. Defensive teams now have to assume that any latent vulnerability in their codebase will be discovered — by someone — within the operational lifetime of the next two model generations.
Security-by-obscurity is dead, formally. The “small attack surface” defense has always been weaker than its proponents suggest, and it’s now untenable. Every line of shipping code is reachable by a model that can read, hypothesize, and validate.
The defender-attacker arms race got faster. Glasswing-class restricted access models give defenders a months-to-years head start. That’s enough to harden critical infrastructure, but not enough to fix the long tail of legacy code. Triage hard.
Open-source maintainers face new triage pressure. A solo maintainer with a popular library can suddenly receive dozens of high-quality CVE reports from automated systems. Without staffing, those reports pile up. Foundations and corporate sponsors need to fund maintenance commensurate with the new disclosure volume.
Compliance frameworks lag the new capability. SOC 2, ISO 27001, FedRAMP — none of these explicitly require continuous AI-assisted vulnerability scanning yet. Within 18 months they will. Get ahead of it.
The supply-chain integrity problem worsens. If an attacker can find zero-days in dependencies faster than maintainers can patch them, supply-chain attacks become the dominant threat vector. SBOM tooling, dependency-pinning discipline, and vendored-code reviews matter more than ever.

How to use it today

Claude Mythos itself is access-restricted via Project Glasswing — most readers won’t have direct API access, and that’s intentional. But the operational implications are immediate, and there are concrete steps every engineering and security team should take this quarter regardless of whether they have Mythos access.

Update your dependency posture for CVE-2026-4747. If you run FreeBSD-based infrastructure with NFS exposed to untrusted networks, patch immediately. The exploit is straightforward and weaponized proof-of-concept code is now circulating in security-research circles.
```
MASK12
```
Apply for Project Glasswing partnership if you qualify. Glasswing accepts critical-infrastructure operators and high-impact open-source maintainers. The application form is at anthropic.com/glasswing. Approval has been faster than typical for partner programs because Anthropic genuinely wants high-volume defensive use; expect 2-6 weeks if your org fits the criteria.
Run continuous AI-assisted vulnerability scanning on your codebase. Even without Mythos-level capability, current-generation Claude, GPT-5.5, and DeepSeek V4 can find meaningful classes of vulnerabilities at production scale. Set up a scheduled job that scans recent commits for common vulnerability patterns:
```
MASK13
```
Audit your incident response playbook for AI-discovered vulnerabilities. When a Glasswing-class disclosure lands in your inbox, you need to know: who triages, what’s the SLA, who has authority to declare an emergency patch cycle, and how customers get notified. If you don’t have answers today, write them down before you receive your first notice.
Increase your code-review depth on safety-critical paths. Authentication, authorization, deserialization, parsers, network-facing services — these were always the high-risk zones, and the new threat model makes them dramatically higher-stakes. If you don’t have a “tier 1 review” policy that requires two senior engineers to sign off on changes to these paths, implement one this quarter.
Sponsor vulnerability bounties for the open-source dependencies you actually rely on. A maintainer overwhelmed with AI-discovered CVE reports needs funding to triage them. Work out which OSS dependencies are critical to your stack, identify whether their maintainers are paid for security work, and contribute (via OpenSSF, Tidelift, or direct sponsorship) where they aren’t.

How it compares

Claude Mythos is the highest-capability publicly-disclosed offensive-security AI as of May 2026. Several other systems exist in the same neighborhood; the comparison frames the field.

System	Provider	Access model	Reported autonomy	Notable finds
Claude Mythos Preview	Anthropic	Restricted (Glasswing partners only)	End-to-end zero-day discovery + exploit	27-year-old OpenBSD bug; CVE-2026-4747 FreeBSD RCE; thousands of zero-days in major OSes/browsers
Big Sleep (Google DeepMind)	Google	Internal use, partial public disclosures	End-to-end with human-in-the-loop validation	SQLite stack-overflow CVEs; multiple Chrome/V8 exploits
GPT-5.5 + Codex agent	OpenAI	API access, broad availability	Strong assistance; less autonomous full-loop	Bug-finding in CI; exploit generation when guided
XBOW	XBOW Inc.	Commercial product	Autonomous web-app pentest	Top of HackerOne leaderboard mid-2025
DARPA AIxCC finalists	Multiple labs	Open-source release after 2025 finals	Autonomous repair + discovery	Linux kernel CVEs; OpenSSL findings

The comparison surfaces the key shift: Mythos’s reported capability is qualitatively beyond the next-best public system. Big Sleep’s most famous finds required human validation of generated exploits; Mythos validates autonomously. XBOW operates at web-application scope; Mythos operates across kernels, browsers, and protocol stacks. The DARPA AIxCC systems are excellent but operate against well-defined challenge problems; Mythos operates against arbitrary production code. That qualitative gap explains the access restriction — a model that finds 27-year-old kernel bugs across every OS isn’t something you ship via OpenAPI.

What’s next

Three trajectories matter for the rest of 2026.

First, Mythos-class capability will diffuse. Anthropic’s competitive landscape doesn’t change, even with Glasswing’s restrictions. Other labs are training similar models. Within twelve to eighteen months, Mythos-equivalent capability will be available via at least one model with public API access. That’s the window for defenders. Use it.

Second, regulation is coming. The capability profile of Mythos — autonomous zero-day discovery — sits exactly where AI policy in the US, EU, and UK has been signaling for the past two years. Expect formal guidance, possibly mandatory restrictions on offensive-security AI capabilities, possibly mandatory disclosure timelines for AI-discovered vulnerabilities. The exact shape will matter less than the timing; whatever lands will land within the next twelve months.

Third, defensive AI tooling will commercialize fast. The Glasswing partner roster reads like the buyer list for any successful enterprise security vendor — and several of those partners are themselves vendors. Expect commercial products built on Mythos-class capability for defensive use to launch in Q3-Q4 2026, with the security-tooling category seeing a wave of consolidation and re-platforming. If you sell software security products, your roadmap probably has to change. If you buy them, your vendor’s roadmap probably has to change too.

The deeper question Mythos surfaces is one the AI safety community has been arguing for years: when a frontier capability is dual-use, do you release broadly and let the market figure it out, or restrict and risk the capability leaking from elsewhere? Anthropic chose restriction. Reasonable people will disagree about whether that was the right call. What’s harder to disagree with: a frontier lab finally treated dual-use capability as a release-engineering problem rather than a marketing event. That’s a precedent worth setting.

Frequently Asked Questions

Can I get access to Claude Mythos through normal Anthropic API access?

No. Mythos Preview is restricted to Project Glasswing partners only. Standard Anthropic API users get Claude Opus 4.6/4.7 and Claude Sonnet 4.6, which have meaningful coding-security capability but not Mythos’s autonomous zero-day-discovery profile. The application route is anthropic.com/glasswing for organizations that fit the partner criteria.

Should we be patching FreeBSD systems immediately for CVE-2026-4747?

If you run FreeBSD with NFS server exposed to any untrusted network: yes, today. The patch is in the FreeBSD security advisory; deploy via freebsd-update or your configuration management. If NFS is internal-only with strict firewall enforcement, the urgency is lower but patching is still recommended within your normal cadence.

Will Mythos-class models be used by attackers, not just defenders?

Eventually, yes. Project Glasswing’s premise is that defenders get a head start. Once Mythos-equivalent capability is broadly available — via leaked weights, independent training runs at other labs, or planned wider release — attackers will use it. The defensive window is months to maybe two years; plan accordingly.

Does this change how we should think about open-source security?

Yes. Open-source projects with sparse maintainer support are now disproportionately exposed — capable AI security tools surface vulnerabilities faster than under-resourced maintainers can patch them. Sponsor maintenance, fund bug bounties, and contribute review capacity for the OSS dependencies your stack actually relies on.

Are LLMs now a replacement for traditional security tools?

No, complementary. Static analyzers, fuzzers, and SAST/DAST tools find different categories of bugs than LLM-based reviewers. The combination is more effective than either alone. The right architecture is multi-layer: traditional tooling for the easy / fast / deterministic finds, LLM-assisted review for the deeper / contextual / cross-cutting issues.

What does this mean for the AI-doomerism debate?

It’s evidence — both sides will use it differently. AI-safety advocates will point to the autonomous-zero-day capability as proof that frontier models pose security risks needing mitigation. AI-progress advocates will point to Glasswing as proof that responsible development patterns can manage those risks. Both readings are partly right. The empirical question — does Glasswing-style restriction actually buy meaningful defensive lead time? — will be answered by what happens over the next eighteen months.

Go deeper than this article

This article covers the essentials. Our premium eguide library gives you the full step-by-step playbooks — prompts, workflows, and copy-paste recipes you can put to work today.

Browse Premium Eguides →