Grok 4.3 Cuts API Prices 40% and Adds Native Video Inputs

Grok 4.3, xAI’s newest frontier release, hit broad API availability this week with three changes that meaningfully shift the model-pricing landscape: a 40% cut on input pricing (down to $1.25 per million tokens), expansion to a 1-million-token context window, and the first native video-input support in the Grok line. Add in standalone Speech APIs (low-latency STT and TTS with diarization, timestamps, and multilingual support), category-leading scores on specialized legal and corporate-finance benchmarks, and downloadable artifacts (PDFs, spreadsheets, PowerPoint) generated directly from chat — and the practical case for evaluating Grok 4.3 for production workloads is the strongest it’s been since the model line launched. This is the developer breakdown: what’s new, why it matters, and how to evaluate Grok 4.3 against the rest of the field today.

What’s actually new

Three layered changes define the Grok 4.3 release. First, aggressive pricing. Input tokens drop from $2.10/M to $1.25/M (about 40% cut); output tokens drop from $4.00/M to $2.50/M (about 38% cut). For the same workload that cost $100/day on Grok 4.2, Grok 4.3 lands around $60. The cuts close the cost gap with the Chinese open-weights coding models that have been pressuring frontier-model pricing throughout 2026. xAI’s positioning is explicit: “frontier capability at challenger pricing.”

Second, 1-million-token context window. Up from the 256K context of Grok 4.2. The 1M window matches Gemini 3.1 Pro and exceeds GPT-5.5’s 200K window. For workloads that benefit from large context — entire codebases, multi-document research, long-running agentic conversations — Grok 4.3 now competes on equal footing with Gemini and ahead of OpenAI’s flagship.

Third, native video input. Where Grok 4.2 handled images natively but required transcoding video to frame-sequences for analysis, Grok 4.3 ingests video directly with audio track included. The use cases this opens up — meeting analysis, content moderation, video-based instruction following, multimodal RAG over recorded content — were technically possible but operationally cumbersome before. Native handling makes them practical.

Beyond the headline three, the release adds several capabilities worth knowing. Downloadable artifact generation: Grok 4.3 can produce ready-to-share PDFs, fully-formatted spreadsheets (xlsx with formulas, conditional formatting, charts), and PowerPoint decks directly from conversation. Independent evaluators report the output quality is genuinely usable rather than the boilerplate output earlier “AI generates documents” features produced.

Specialized benchmark wins: Vals AI’s independent evaluation puts Grok 4.3 at #1 on CaseLaw v2 (79.3% accuracy) — a benchmark covering legal reasoning across US case law — and #1 on CorpFin, covering corporate finance reasoning. The frontier-model rankings on general-purpose benchmarks (MMLU, GPQA, MATH) are roughly comparable across GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, and Grok 4.3; the differentiation is now in specialized domains.

Speech APIs: separate from the chat completions API, xAI now ships standalone Grok Speech-to-Text and Text-to-Speech endpoints. Speech-to-Text handles 90+ languages with diarization (speaker tagging), timestamps, and real-time streaming. Text-to-Speech offers expressive voices with rapid voice cloning from short audio samples. Pricing is competitive with ElevenLabs and OpenAI‘s voice products, with category-leading latency.

Improved agentic performance: tool-call accuracy, JSON mode reliability, and long-horizon task completion rates all show double-digit improvements over Grok 4.2 in xAI’s reported benchmarks. The agentic gains matter most for production deployments running multi-step workflows; quality on these dimensions translates directly into reliability.

Why it matters

  • The frontier-model price floor moves down. Grok 4.3 at $1.25/$2.50 is meaningfully cheaper than GPT-5.5 ($10/$30) or Claude Opus 4.7 ($15/$75) for input and output. For workloads where Grok 4.3’s capability is sufficient, the cost savings are 80%+. Expect OpenAI and Anthropic to respond with their own price cuts within the next quarter.
  • Native video input changes the multimodal application landscape. Frontier models that handle video natively were limited to Gemini and now Grok 4.3. For applications built around video understanding — content moderation, education, sports analytics, security review — the practical options just expanded.
  • The 1M context window competes directly with Gemini 3.1 Pro. Long-context workloads no longer have just one frontier option; they now have two. Comparison shopping based on quality, latency, and cost just got more interesting.
  • Specialized benchmark leadership signals domain depth. CaseLaw v2 and CorpFin aren’t general benchmarks — they reflect deep work on specific verticals. Legal and financial-services teams that previously defaulted to GPT-5.5 or Claude Opus now have a credible alternative tuned to their domains.
  • Speech APIs at competitive pricing pressure ElevenLabs and OpenAI Voice. The standalone Speech APIs aren’t a side product — they’re a serious entry into the voice category. Production voice deployments now have a third credible vendor option.
  • Document artifact generation closes a UX gap. The “AI helped me write something but I still need to format it” friction point disappears for workflows that produce reports, analyses, or presentations. Time savings compound across knowledge-work use cases.

How to use it today

Grok 4.3 is available through xAI’s API directly and through the Grok consumer apps (web, iOS, Android, X integration). Production deployments use the API; consumer evaluation works through the apps.

  1. API quick start. Get an API key from console.x.ai. The API is OpenAI-compatible — existing code that calls OpenAI’s chat completions API works unchanged after a base URL and key swap.
    MASK12
  2. Long-context workloads. The 1M context window unlocks “throw the entire codebase into context” patterns. Pass full repos, document corpora, or extended conversation histories. Cost scales linearly with context size, so be deliberate — at $1.25/M input, a 1M-token prompt costs $1.25 per call. Cache aggressively where patterns repeat.
    MASK13
  3. Native video input. Pass video files (mp4, mov, webm) directly via the multimodal API. Audio tracks are automatically processed alongside visuals.
    MASK14
  4. Document artifact generation. Ask for downloadable outputs in the prompt. The API returns artifact URLs that can be fetched and saved.
    MASK15
  5. Speech APIs. The standalone speech endpoints are separate from the chat completions API.
    MASK16
  6. Migration from other providers. Code that uses OpenAI’s API works after the base-URL swap shown above. Code using LangChain or LlamaIndex needs the provider configuration changed but no other code changes. Plan for a few hours of migration work plus a week of side-by-side evaluation against your benchmarks.
  7. Cost monitoring. The xAI dashboard tracks usage and spend per API key. Set budget alerts in line with your finance team’s preferences. Average token-per-task cost should drop noticeably versus your prior frontier-model spend; verify in your specific workload before declaring victory.

How it compares

Model Context Input price ($/M) Output price ($/M) Native video Notable strength
Grok 4.3 1M $1.25 $2.50 Yes Pricing + multimodal + legal/finance specialization
Gemini 3.1 Pro 1M $3.00 $15.00 Yes ARC-AGI-2 reasoning leader
GPT-5.5 200K $10.00 $30.00 Limited Agentic coding + tool use
Claude Opus 4.7 500K $15.00 $75.00 No Tone, safety, long-form coding
DeepSeek V4 128K $0.30 $1.10 No Open-weights, lowest cost
Grok 4.2 (predecessor) 256K $2.10 $4.00 No Replaced by 4.3

Grok 4.3 sits in a competitive sweet spot: cheaper than the major closed-frontier models by 5-10x, more capable than the cheapest open-weights options on agentic benchmarks, with the longest context window in the closed-frontier tier and the most modalities among the top three. The trade-off versus the absolute leaders: Claude Opus 4.7 still wins on extended-coding workflows, GPT-5.5 still wins on certain agentic-tool-use patterns, and Gemini 3.1 Pro still leads on raw ARC-AGI-2 reasoning. For workloads where these specific edge-case strengths don’t dominate, Grok 4.3’s pricing and capability mix is hard to beat.

The cost-per-correct-answer math also matters. A model with 80% accuracy at $1.25/M input often beats a model with 85% accuracy at $10/M input on total cost — you’re paying less for retries, validations, and corrections. Run your own evals against representative tasks before committing.

What’s next

Three trajectories worth watching for the rest of 2026.

Pricing pressure on closed-frontier models. Grok 4.3’s pricing forces a response from OpenAI and Anthropic. Either they cut prices to compete, differentiate harder on capability that justifies the premium, or watch volume migrate. Expect price adjustments by Q3 2026 — typically tiered (premium tier maintains pricing; new “value” tier appears).

The voice-AI category consolidates. ElevenLabs has been the dominant standalone voice provider. xAI’s entry with Grok Speech APIs at competitive pricing — combined with OpenAI’s voice products and Microsoft’s neural voice offerings — turns voice AI into a four-way competitive market. Expect rapid feature parity, then differentiation on specific use cases (real-time vs offline, language coverage, voice cloning quality).

Specialized benchmark leadership becomes meaningful. General-purpose benchmark scores are now bunched at the top — small differences between frontier models. Specialized benchmarks (legal, finance, medical, code) increasingly differentiate vendors. Expect more specialized evals to emerge and more vendors to claim leadership in specific verticals. The buyer takeaway: evaluate vendors on the specific tasks your workloads actually do, not on general benchmark rankings.

The deeper observation: frontier-model competition in 2026 is no longer about “who has the smartest model in absolute terms.” All four major closed providers (OpenAI, Anthropic, Google, xAI) ship models within a few capability points of each other on most tasks. Competition has shifted to pricing, modality, context, specialization, and developer experience. Grok 4.3 wins more of those dimensions than any competitor today; the picture will likely shift again within months as competitors respond.

Frequently Asked Questions

Is Grok 4.3 actually better than GPT-5.5 or Claude Opus 4.7?

Depends on the workload. On general-purpose benchmarks the three are close. On specialized benchmarks (legal via CaseLaw v2, corporate finance via CorpFin) Grok 4.3 leads. On extended-coding workflows Claude Opus 4.7 still leads. On agentic tool-use patterns GPT-5.5 still leads. Run your own evals against your workloads.

Is the 40% price cut sustainable, or will Grok 4.3 prices rise later?

Public posture is that the new pricing is durable. Realistically, AI pricing across the industry has been trending down as compute economics improve. The risk of a price reversal is lower than it would be for, say, a software license. Lock in your contract terms if your spend justifies it; for variable-spend customers, monitor announcements and benchmark periodically.

Can Grok 4.3 fully replace GPT-5.5 or Claude in our stack?

Probably for many workloads but not all. Migrate one workload at a time, validate quality and reliability, then expand. Multi-model deployments that route different workloads to different providers are increasingly common; consider this pattern instead of a wholesale switch.

What about safety and content moderation?

Grok 4.3 has reportedly less restrictive content policies than GPT-5.5 or Claude Opus 4.7, more aligned with xAI’s “less filtered” positioning. For most enterprise use cases this is a non-issue. For consumer-facing products, evaluate the safety profile carefully — what’s acceptable depends heavily on your users and brand posture.

Does Grok 4.3 work with our existing OpenAI-API-based code?

Yes, with a base-URL change. xAI’s API is OpenAI-compatible at the request and response level. SDK swaps are minimal. LangChain, LlamaIndex, and most other frameworks support xAI as a provider with one config change.

How does the 1M context window perform in practice?

Quality remains strong through the upper context range, with some softening near the maximum (typical for any large-context model). For workloads at 800K+ tokens, expect modest quality degradation versus shorter prompts. Test specific use cases at expected context sizes before committing to long-context patterns.

Scroll to Top