SubQ Ships First Commercial Subquadratic LLM With 12M Context

Subquadratic, a Miami-based startup, launched SubQ on May 5, 2026. It is the first commercially available LLM built on a fully subquadratic sparse attention architecture, and it ships with a 1 million token context window in the production API and a 12 million token research context window. For SubQ 1M-Preview, the company claims 7.2x faster attention at 128K tokens, 52.2x faster at 1M, and a ~1,000x reduction in attention compute at 12M, all relative to standard transformer architectures. The release reframes the long-context AI conversation: where Claude, GPT, and Gemini top out at 1-2M tokens with degraded quality, Subquadratic argues that subquadratic architecture is the path to genuinely long-context AI without the quadratic cost explosion.

What’s actually new

Standard transformer LLMs use quadratic-cost attention: doubling context length quadruples attention compute. This is the mathematical bottleneck that limits practical context windows. SubQ’s architectural bet: a sparse attention mechanism where compute grows linearly with context length. The math is the news; the product is the proof.
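
To make the scaling argument concrete, here is a minimal sketch that counts query-key comparisons under two idealized cost models: full attention, where every token attends to every token, and a linear-cost mechanism, where each token attends to a fixed budget of keys. The budget of 1,024 is an arbitrary placeholder chosen for illustration, not a published SubQ parameter, and neither function describes SubQ's actual implementation.

```python
def quadratic_cost(n_tokens: int) -> int:
    """Query-key comparisons for full attention: every token attends to every token."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int, budget: int = 1024) -> int:
    """Comparisons when each token attends to at most `budget` keys.

    `budget` is a hypothetical placeholder, not a published SubQ value.
    """
    return n_tokens * min(budget, n_tokens)


# Doubling the context quadruples the first column and doubles the second.
for n in (128_000, 256_000, 512_000):
    print(f"{n:>9,} tokens  full: {quadratic_cost(n):>18,}  linear: {linear_cost(n):>14,}")
```

The gap between the two columns widens in direct proportion to context length, which is why the claimed advantage is largest at the longest windows.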

The claimed performance numbers are substantial. At 128K tokens, SubQ’s attention runs ~7x faster than a standard transformer baseline. At 1M tokens, it runs ~52x faster. At 12M tokens (research-only context), the claimed reduction in attention compute approaches 1,000x. If true at production quality, this is a meaningful efficiency leap.
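
A rough consistency check on these figures: if the baseline is quadratic and SubQ is linear, the speedup should grow in proportion to context length. The sketch below extrapolates from the 128K claim under that idealized assumption. It reads "128K" as 128,000 tokens and ignores every cost outside attention, so treat it as back-of-envelope arithmetic, not a reproduction of Subquadratic's benchmark.

```python
def extrapolate_speedup(base_speedup: float, base_tokens: int, target_tokens: int) -> float:
    """Speedup at `target_tokens` if speedup scales linearly with context length."""
    return base_speedup * target_tokens / base_tokens


BASE_TOKENS = 128_000  # assumption: "128K" read as 128,000 tokens
BASE_SPEEDUP = 7.2     # figure claimed by Subquadratic at 128K

# Claimed figures from the launch: 52.2x at 1M, ~1,000x at 12M.
claims = {1_000_000: 52.2, 12_000_000: 1000.0}

for tokens, claimed in claims.items():
    predicted = extrapolate_speedup(BASE_SPEEDUP, BASE_TOKENS, tokens)
    print(f"{tokens:>10,} tokens: predicted {predicted:.1f}x, claimed {claimed:.1f}x")
```

Under these assumptions the 1M prediction (about 56x) lands close to the 52.2x claim, while the 12M prediction (about 675x) falls short of ~1,000x. The 12M figure is stated as a reduction in attention compute, not as a measured speedup, so the two numbers may not be measuring the same thing.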

Two products at launch. The SubQ API offers the 1M-token production window and also exposes the full 12M-token window for developers willing to test research-grade capabilities. SubQ Code is a CLI agent built on the same model. It competes with Claude Code, Codex CLI, and similar tools, with long-context capability as its differentiator.

Subquadratic is small: it has raised a $29M seed round from notable investors, including early backers of Anthropic, OpenAI, and Stripe. The team is technical; the architecture is novel; the early performance numbers are striking. Researchers have called for independent verification of the most extreme claims, particularly the 1,000x compute reduction. That verification is in progress; early independent tests confirm some of the speedups while emphasizing the usual caveats about benchmark versus real-world performance.

The pricing positioning is aggressive. SubQ targets ~1/5 the cost of frontier models for equivalent work. For long-context use cases — codebase analysis, document review, research synthesis — the math is compelling if the quality holds.
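
The cost argument is simple arithmetic, sketched below. The frontier price of $10 per million tokens and the workload of 500 runs per month are hypothetical placeholders chosen for round numbers; they are not real price lists for any provider. Only the 1/5 ratio comes from the launch positioning.

```python
def job_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for processing `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


FRONTIER_PRICE = 10.0            # hypothetical USD per 1M tokens, NOT a real price
SUBQ_PRICE = FRONTIER_PRICE / 5  # the "~1/5 the cost" target stated at launch

TOKENS_PER_RUN = 1_000_000       # one full 1M-token prompt
RUNS_PER_MONTH = 500             # hypothetical workload

frontier = job_cost(TOKENS_PER_RUN, FRONTIER_PRICE) * RUNS_PER_MONTH
subq = job_cost(TOKENS_PER_RUN, SUBQ_PRICE) * RUNS_PER_MONTH
print(f"frontier: ${frontier:,.2f}  subq: ${subq:,.2f}  saved: ${frontier - subq:,.2f}")
```

Swap in your provider's actual rates and your own token volumes before drawing conclusions; the saving only matters if output quality is comparable.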

The architecture isn’t entirely new in the research literature. Subquadratic attention has been explored for years: sparse variants such as Longformer and BigBird, kernel-based approximations such as Performer, and many others. Subquadratic’s bet is that it has assembled the right combination of techniques for production-grade quality at extreme context length, a claim that requires field testing to validate.

Why it matters

  • Long-context AI may finally be viable. Claude’s 1M context window and GPT-5.5 Pro’s similar window both lose quality as they approach their limits. SubQ claims to maintain quality at much larger context. If true, new use cases open up.
  • Pricing pressure on frontier labs. 1/5 the cost is significant. Anthropic, OpenAI, and Google have flexible pricing models; downward pressure from a credible competitor changes the market.
  • Architectural diversity is healthy for the field. Frontier AI was dominated by quadratic-attention transformers from 2017 to 2025. Subquadratic represents a meaningful architectural alternative gaining commercial traction.
  • Specific use cases benefit dramatically. Codebase Q&A over millions of lines; deep document review (legal, scientific, regulatory); long-form creative work; multi-hour conversation analysis.
  • Independent verification matters. The biggest claims need third-party validation. Watch for benchmarks from Artificial Analysis, Vellum, LMSYS, and academic groups.
  • Small startup, frontier-level ambition. With $29M, Subquadratic is competing with labs spending billions. The asymmetric ambition has historical precedent (DeepSeek, Mistral), but execution risk is real.

How to use it today

SubQ launched in beta on May 5, 2026. Access is available but throttled. Practical steps:

  1. Sign up for the SubQ beta. The 1M-Preview API and SubQ Code CLI are both in beta access programs.
    # Visit:
    https://subq.ai/
    
    # Apply for beta access:
    # - API access for the 12M context window
    # - SubQ Code CLI for terminal-based agent work
    # - May have waitlist depending on demand
    
    # Once approved, generate API key in their dashboard.
  2. Try a small test to verify behavior. Don’t immediately throw your hardest workload at a new model.
    # Initial test prompt (small, easy):
    curl https://api.subq.ai/v1/messages \
      --header "Authorization: Bearer YOUR_SUBQ_KEY" \
      --header "Content-Type: application/json" \
      --data '{
        "model": "subq-1m-preview",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello. Tell me about your context window capability in one sentence."}]
      }'
    
    # Verify response quality, latency, and that the basic flow works.
  3. Test the long-context capability with a representative use case. The 12M context is the marketing claim; the value depends on what you can actually do with that much context.
    # Practical long-context test cases:
    
    # 1. Whole codebase Q&A
    # Concatenate your project's source code (perhaps 1-5M tokens)
    # Ask SubQ questions that require understanding the whole codebase
    # Compare to splitting the work across smaller-context model calls
    
    # 2. Multi-document analysis
    # Combine 50-100 long documents (research papers, contracts, etc.)
    # Ask SubQ for cross-document analysis
    # Compare quality to manual extraction + traditional model
    
    # 3. Long conversation continuation
    # Build up a multi-hour conversation history
    # Test whether SubQ maintains coherence vs. losing earlier context
  4. Try SubQ Code for terminal-based agent work. Particularly interesting for repository-scale tasks.
    # Install SubQ Code (verify current installation command from subq.ai docs)
    # Typical CLI agent flow:
    
    cd ~/my-project
    subq-code
    
    # Then prompt the agent:
    "Read all files in src/ and tell me what each major module does.
    Identify any cross-module patterns or potential refactoring
    opportunities. The codebase has 12 modules totaling about
    30,000 lines."
    
    # The long context means SubQ can hold the entire codebase
    # in attention without chunking.
  5. Compare SubQ to your current frontier model on your specific tasks. Don’t take vendor claims at face value.
    # Benchmarking pattern:
    # For each task you care about:
    # 1. Run with current model (Claude/GPT/etc.)
    # 2. Run with SubQ
    # 3. Compare quality, speed, cost
    # 4. Decide if SubQ wins, loses, or ties for that task
    
    # Tasks where SubQ likely wins:
    # - Long-context tasks where quality matters at scale
    # - Cost-sensitive batch processing
    # - Tasks where the 1M-12M context is genuinely needed
    
    # Tasks where frontier may still win:
    # - Tasks within smaller context (where speed matters less)
    # - Tasks requiring frontier-quality reasoning
    # - Tasks needing function calling integrations not yet built in SubQ
  6. Watch for independent benchmarks. Third-party validation will refine the picture significantly over the next 4-8 weeks.
    # Sources to watch:
    # - Artificial Analysis (artificialanalysis.ai)
    # - Vellum AI benchmark dashboards
    # - LMSYS Chatbot Arena
    # - Academic papers (arXiv)
    # - Hacker News discussions
    # - Twitter/X discussions among AI researchers
    
    # Look for:
    # - Quality at extreme context lengths
    # - Performance on specific benchmark suites
    # - Real-world use case reports
  7. Plan production adoption cautiously. Beta models, particularly from small startups, have stability and continuity risks.
    # For production:
    # - Don't put SubQ in critical path until it has track record
    # - Maintain fallback to frontier providers
    # - Test thoroughly before customer-facing deployment
    # - Monitor closely after initial deployment
    
    # For exploration:
    # - Use SubQ for batch jobs, research tasks, internal tools
    # - Iterate on use cases where long context helps
  8. Provide feedback to Subquadratic. Beta users shape the product. Specific reports of what works and what doesn’t accelerate refinement.
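
Before running the long-context tests in step 3, it helps to know roughly how many tokens a codebase holds and whether it fits the window. The sketch below estimates that with a crude four-characters-per-token heuristic and builds a JSON body matching the curl example in step 2. The heuristic, the `fits_in_window` helper, and the 10,000-token reserve are assumptions for illustration; real counts depend on SubQ's tokenizer, which is not documented here.

```python
import json
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by language and content


def estimate_tokens(root: str, suffixes: tuple = (".py", ".md")) -> int:
    """Approximate the token count of all files under `root` with a matching suffix."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


def fits_in_window(token_estimate: int, window: int = 1_000_000, reserve: int = 10_000) -> bool:
    """True if the estimate plus a reserve for the question and answer fits the window.

    `reserve` is a hypothetical safety margin, not a documented SubQ requirement.
    """
    return token_estimate + reserve <= window


def build_payload(prompt: str, max_tokens: int = 100) -> str:
    """JSON request body with the same fields as the curl example in step 2."""
    return json.dumps({
        "model": "subq-1m-preview",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })
```

A typical use would be `fits_in_window(estimate_tokens("src"))` to decide whether a whole-repository prompt is feasible, then passing the concatenated source to `build_payload`. Sending the request is left out on purpose so the sketch runs without an API key.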

How it compares

SubQ enters a long-context AI landscape with established players. Comparison:

| Model | Stated context | Architecture | Notable |
|---|---|---|---|
| SubQ 1M-Preview | 1M API / 12M research | Subquadratic sparse attention | 1/5 cost of frontier; 52x faster at 1M (claimed) |
| Claude Opus 4.7 | 1M tokens | Standard transformer (extended context) | Frontier quality; quality degrades at extreme context |
| GPT-5.5 Pro | ~1M tokens | Standard transformer | Strong frontier model; similar context profile |
| Gemini 3.1 Pro | 2M tokens | Transformer variants | Largest standard-architecture context |
| Mistral Medium 3.5 | ~256K tokens | Standard dense transformer | Open-weight focus |
| Long-context specialist research models | Various (1M+) | Various sparse attention | Mostly research; less production-ready |

What distinguishes SubQ in the comparison: the architectural choice. If subquadratic attention delivers production-grade quality at extreme context length, the cost and speed advantages translate to real workload improvements. The risks: novel architecture means less battle-tested behavior; small company means continuity questions; the 1,000x compute reduction claim needs independent verification.
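
One way to hedge the continuity and stability risks noted above is to route requests through a wrapper that falls back to an established provider when the newer one fails. The sketch below shows that pattern with stub functions standing in for real API clients. The names `flaky_subq` and `stable_frontier` are hypothetical, and the broad `except` is a simplification that production code would narrow to specific error types.

```python
from typing import Callable, Tuple

Provider = Callable[[str], str]


def call_with_fallback(prompt: str, primary: Provider, fallback: Provider) -> Tuple[str, str]:
    """Try the primary provider; on any exception, use the fallback.

    Returns a (provider_label, response) pair so callers can log which path ran.
    """
    try:
        return "primary", primary(prompt)
    except Exception:  # broad by design for a sketch; narrow this in real code
        return "fallback", fallback(prompt)


def flaky_subq(prompt: str) -> str:
    """Hypothetical stub that simulates an outage on the beta service."""
    raise TimeoutError("simulated beta outage")


def stable_frontier(prompt: str) -> str:
    """Hypothetical stub standing in for an established provider."""
    return f"frontier answer to: {prompt}"


print(call_with_fallback("Summarize the repo.", flaky_subq, stable_frontier))
```

Logging the returned label over time gives a simple measure of how often the fallback path is taken, which is useful evidence when deciding whether to move a workload onto the critical path.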

What’s next

Signals to watch over the next 4-8 weeks:

  • Independent benchmarks. Artificial Analysis, Vellum, LMSYS, and academic groups will publish evaluations. The picture will clarify substantially.
  • Beta user reports. Real workload reports from developers using SubQ for production-shape work. Hacker News and Twitter will be key signals.
  • Pricing finalization. Beta pricing may evolve; watch for GA pricing announcements.
  • Feature additions. Function calling, vision, voice: all expected for a competitive model. Track what ships.

The longer-term picture. If SubQ delivers on its architectural promise, subquadratic attention may become a major direction in frontier AI. Other labs (Anthropic, OpenAI, Google) likely have their own subquadratic research; commercial offerings from larger labs may follow within 6-12 months. The architectural diversification is healthy for the field.

For developers and AI users, the practical implication: long-context AI is becoming genuinely usable. Use cases that were previously theoretical (whole-codebase analysis, deep document review, persistent agent memory) become more tractable. Plan around the capability; don’t wait for it to be perfect before exploring.

Frequently Asked Questions

How does SubQ’s 12M context compare to Claude’s 1M?

SubQ’s 12M is research-grade; the production API exposes 1M. Compared to Claude’s 1M, SubQ claims faster processing at the same context and dramatically faster at extreme lengths. Quality at extreme context is the open question that independent benchmarks will answer.

Should I switch from Claude/GPT to SubQ?

Not for everything. For tasks where SubQ’s specific strengths (long context, lower cost) matter, it’s worth testing. For tasks where frontier quality is the bottleneck, established models likely still win. Test on your specific workload before committing.
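
Testing on your own workload can be as simple as timing each model on the same prompt and scoring the output with a check you trust. The harness below is a minimal sketch of that idea. The model callables and the scoring function are supplied by you, and the 0.05 tie margin is an arbitrary assumption, not a recommended threshold.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    model: str
    seconds: float
    score: float


def run_task(model: str, call: Callable[[str], str], prompt: str,
             score: Callable[[str], float]) -> RunResult:
    """Time one model call and score its output with a caller-supplied function."""
    start = time.perf_counter()
    output = call(prompt)
    elapsed = time.perf_counter() - start
    return RunResult(model, elapsed, score(output))


def pick_winner(a: RunResult, b: RunResult, margin: float = 0.05) -> str:
    """Higher score wins; scores within `margin` of each other count as a tie."""
    if abs(a.score - b.score) <= margin:
        return "tie"
    return a.model if a.score > b.score else b.model
```

In practice, `call` would wrap each provider's API client and `score` would encode a task-specific check, such as whether the answer names the correct module. Run several prompts per task, since a single sample says little about either quality or latency.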

Is the 1,000x compute reduction claim believable?

The 1,000x figure applies at 12M tokens, which is in the company’s research-only context range. Subquadratic architectures genuinely do reduce compute substantially at long context. Whether the full 1,000x holds for production-quality output across diverse tasks is what independent verification will determine.
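
One way to read the figure: if full attention compares each query with all n keys and a sparse mechanism compares each query with only k keys, the attention compute ratio is n/k. The sketch below inverts that idealized relation to ask what k a given reduction implies. It ignores the cost of choosing which keys to attend to, and it is a simplified model for intuition, not a description of SubQ's actual mechanism.

```python
def implied_keys_per_query(n_tokens: int, reduction: float) -> float:
    """Keys each query would attend to if attention compute drops by `reduction`.

    Assumes full attention costs n * n comparisons and sparse attention costs
    n * k, so reduction = n / k. This is an idealized model.
    """
    if reduction <= 0:
        raise ValueError("reduction must be positive")
    return n_tokens / reduction


print(implied_keys_per_query(12_000_000, 1000.0))  # 12000.0
```

Under this model, a 1,000x reduction at 12M tokens corresponds to each query attending to roughly 12,000 keys. That is a plausible order of magnitude for a sparse scheme, which is why the headline number is less exotic than it first sounds; whether quality survives that sparsity is the open question.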

Can I use SubQ Code instead of Claude Code or Codex CLI?

For specific use cases (working with very large codebases that exceed Claude or GPT context), SubQ Code may have advantages. For typical day-to-day coding within smaller contexts, established CLI agents have longer track records and more mature integrations.

Is Subquadratic a credible long-term player?

With $29M, they’re well-funded for a seed-stage startup but small relative to frontier labs. The team and architecture are credible; commercial viability depends on customer adoption, continued funding, and execution. Watch the next 12-18 months for clearer signals about long-term trajectory.

What about quality on standard benchmarks?

SubQ’s specific strength is long-context efficiency. Standard benchmark performance (MMLU, HumanEval, etc.) at shorter contexts will be where it competes most directly with frontier models. Early reports suggest competitive but not best-in-class performance on typical benchmarks. The differentiation is at scale, not necessarily at smaller-context quality.

Can I get free access for research?

The beta has been reasonably accessible for serious testing. Academic researchers and indie developers building meaningful applications can usually get access. Production-scale use requires paid access. Check current beta terms at subq.ai.
