Four Chinese AI labs shipped frontier-grade open-weights coding models inside a twelve-day window in late April 2026. Z.ai’s GLM-5.1, MiniMax M2.7, Moonshot’s Kimi K2.6, and DeepSeek V4 all landed at roughly the same capability ceiling on agentic engineering benchmarks — and none of them costs more than a third of Claude Opus 4.7 to run. The release cluster forces a question Western developers can no longer wave away: are open-weights coding models from China now a serious alternative to closed frontier models for production work? The short answer, after a week of community benchmarking, is yes for most use cases. This guide unpacks what’s actually new, why it matters, and how to evaluate the four contenders for your own pipeline.
What’s actually new
The headline is the simultaneity. Open-weights coding models have been creeping closer to closed-frontier capability for a year, but the late-April release cluster compressed that progress into a twelve-day burst. Each lab released independently — there was no coordination — and yet each landed within a few benchmark points of the others. The implication: the recipe for a frontier-grade open-weights coding model is now well-understood inside multiple labs, and the bottleneck is no longer research, it’s compute and training data.
The four releases also share a common shape. All four ship under permissive licenses (Apache 2.0 or near-equivalents). All four target the agentic-coding use case specifically — long-horizon tool use, multi-file edits, error recovery — rather than the older “complete this function” benchmark suite. All four publish detailed eval reports. And all four are cheaper to run than equivalent closed models by a factor of three to ten, depending on your inference setup.
Specifics matter. Z.ai’s GLM-5.1 is a 235B-parameter mixture-of-experts model with 32 experts and 8B active parameters per token, putting it in the same architectural neighborhood as DeepSeek V3. It scores 76.4% on SWE-Bench Verified and 71.2% on Aider’s polyglot benchmark — within a point of GPT-5.5 on both. MiniMax M2.7 is dense at 65B parameters with full long-context support out to 1M tokens, optimized for codebases that exceed typical context windows. Kimi K2.6 is the smallest of the four at 32B parameters, leans heavily into reasoning before code generation, and ships with a custom inference runtime that gets 3x the throughput of standard llama.cpp on consumer hardware. DeepSeek V4 is the most ambitious — 671B total parameters, 37B active, with strong claims on math-heavy code (numerical computing, scientific simulation) where the others trail.
For deployment, the practical news is that all four publish quantized versions on Hugging Face the same day weights drop. GLM-5.1 in 4-bit fits on two H100s. MiniMax M2.7 fits on a single H100 with INT8 quantization. Kimi K2.6 runs comfortably on a 24GB consumer GPU. DeepSeek V4 needs a serious cluster, but the API price (run by DeepSeek directly) is $0.30 per million input tokens and $1.10 per million output — roughly one-tenth of Claude Opus 4.7.
Why it matters
- The “open-weights tax” is gone. A year ago, choosing open-weights for coding meant accepting a 10-15% capability gap. Today, the gap is small enough that for most agentic-coding tasks you can’t tell the difference in production. The cost savings (often 70-90%) are real and immediate.
- Vendor lock-in becomes a deliberate choice, not an architectural inevitability. Teams that built their stack around one closed-frontier API now have a credible second source. The standard procurement playbook applies: dual-source, run continuous evals, switch when the cost-quality curve crosses.
- The competitive frontier moves to inference economics. When four labs ship the same capability the same week, capability stops being the differentiator. The new game is throughput per dollar, time-to-first-token, and how cleanly the model integrates with agent frameworks. This favors infrastructure operators (DeepSeek, Together, Fireworks, Groq) over base-model labs.
- Self-hosting goes from niche to normal. A 32B-parameter Kimi K2.6 fits on a $2,000 GPU. For a team handling sensitive code that can’t go to a US-based API, self-hosting a coding model just got cheap and viable.
- Compliance and data-residency questions get easier. EU teams, healthcare teams, and government teams that previously had to fight for closed-frontier model access can now point to open-weights coding models with comparable capability and run them on infrastructure they fully control.
- The Western frontier labs face real pricing pressure. When DeepSeek V4 API costs are 10x cheaper for similar capability, Anthropic and OpenAI either drop prices, differentiate harder on agentic / multimodal capability, or watch volume migrate. Expect price cuts within the next two quarters.
How to use it today
If you’ve been running an agentic-coding workflow on a closed API, here’s the fastest way to swap in an open-weights coding model without rewriting your stack.
- Pick the right model for your hardware budget. Use the comparison table below as a quick filter. For a single H100, MiniMax M2.7 is the easiest fit. For consumer hardware, Kimi K2.6. For a managed API at the lowest cost, DeepSeek V4. For maximum capability on agentic benchmarks, GLM-5.1.
- Run the model behind an OpenAI-compatible API shim. All four ship with vLLM and SGLang support, both of which expose an OpenAI-compatible endpoint. Your existing client code that calls
openai.chat.completions.createworks unchanged — you just point it at a different base URL.