Grok Build Beta: xAI Launches 8-Parallel-Agent Coding CLI

Grok Build is xAI’s first agentic coding CLI, launched in early beta on May 14, 2026. The headline feature: up to eight parallel AI agents working concurrently on separate branches of a codebase, each running through a three-stage plan-search-build workflow. The underlying model — grok-code-fast-1 — is purpose-built for coding tasks, scores 70.8% on SWE-Bench Verified, and is priced at an aggressive $0.20 per million input tokens. Grok Build runs locally with no source code transmitted to xAI servers by default, putting it in direct competition with Anthropic’s Claude Code and OpenAI’s Codex CLI but with distinctly different architecture choices.

What’s actually new about Grok Build

Three concrete shifts from the existing coding-CLI landscape. First, the parallel-agent execution model. Grok Build spawns up to eight concurrent sub-agents, each working on a separate code branch with its own planning context. The pattern is similar in spirit to Anthropic’s multi-agent Code Review, but applied to authoring rather than reviewing — the agents propose changes in parallel, and the developer (optionally aided by the upcoming Arena Mode) picks the winner. For tasks where multiple plausible solutions exist (refactoring choices, library selection, naming conventions), parallel exploration produces qualitatively different results from single-thread execution.

Second, the explicit plan-search-build workflow. Before touching any file, Grok Build produces a plain-English plan listing step-by-step the files it will modify, the commands it will run, and the intermediate checks. The plan is human-reviewable; you can approve it as-is, edit it, or reject and ask for a different approach. Only after explicit approval does the agent move to execution. The pattern mirrors how senior engineers actually work — research and plan before code — and gives developers a yield point that competing tools either omit or implement less rigorously.

Third, the local-first architecture. By default, Grok Build runs inference on local file content without transmitting source to xAI’s servers. For teams in regulated industries (finance, healthcare, defense) or with proprietary codebases that can’t leave the perimeter, this is a meaningful design choice. The trade-off is that some features (Arena Mode, cloud-based scaling) require opt-in to cloud execution; the privacy default is local.

Why Grok Build matters for the coding-agent landscape

  • The coding-agent race now has four serious entrants. Anthropic Claude Code, OpenAI Codex CLI, Google Gemini CLI / Jules, and now xAI Grok Build. Competition is producing rapid feature iteration and downward price pressure.
  • Parallel-agent execution is a meaningful capability advance. Eight agents exploring eight approaches in parallel is qualitatively different from one agent doing one thing. Expect Claude Code and Codex to follow with similar features within months.
  • $0.20 per million input tokens is aggressive pricing. Claude Opus 4.7 sits at $5/M input; GPT-5.5 at $2.50/M. Grok’s coding-specific model at one-tenth the price changes the cost equation for high-volume code generation.
  • Local-first execution unlocks regulated-industry adoption. Some of the largest enterprise pipelines have been blocked by data-sovereignty requirements; a local-first CLI removes that blocker.
  • Plan-before-execute reduces blast radius. The explicit approval step before file changes means fewer “the agent did something I didn’t expect” incidents. Most teams that adopt Grok Build will report this is the feature they appreciate most.
  • Arena Mode (when it ships) will be the differentiator. Automated scoring of parallel agent outputs before developer review is a workflow shift; the early beta doesn’t have it live yet, but the code is in the binary and demos have been shown internally.

How to use Grok Build today

Grok Build is in early beta as of May 14, 2026. Access is opening to xAI Premium+ subscribers, SpaceXAI internal users, and a rolling invite-only list for early enterprise partners. Here’s the setup path.

  1. Confirm access. You need either an xAI Premium+ subscription ($200/month, includes Grok 4.3 plus Grok Build) or an enterprise pilot agreement. Visit x.ai/grok-build from a logged-in xAI account to request access.
  2. Install the CLI. The binary is available for macOS (Apple Silicon and Intel), Linux (x86_64 and arm64), and Windows. The install script handles platform detection and PATH setup.
# Install Grok Build (macOS / Linux)
curl -fsSL https://grok.build/install.sh | sh

# Or with Homebrew
brew install xai/tap/grok-build

# Windows (PowerShell)
iwr https://grok.build/install.ps1 -useb | iex

# Verify
grok --version
grok auth login
  1. Initialize in a project. Run grok init from the project root to create a .grok/ directory with configuration. The init pass reads your repo to understand the codebase before any actual work.
# In your project directory
cd ~/dev/my-project
grok init

# This creates .grok/config.yaml with sensible defaults
# Includes which file patterns to consider, which to ignore
# Reads .gitignore + adds Grok-specific exclusions

# Inspect the generated config
cat .grok/config.yaml
  1. Run your first task. Grok Build accepts natural-language instructions. The agent produces a plan, you review, then it executes.
# Run a coding task
grok run "Refactor the auth module to use JWT instead of session cookies. Keep the existing route handlers but update the middleware."

# Output:
# Analyzing codebase...
# Found relevant files:
#   src/middleware/auth.ts
#   src/routes/login.ts
#   src/routes/logout.ts
#   tests/auth.test.ts
#
# Plan:
# 1. Add jsonwebtoken dependency
# 2. Update src/middleware/auth.ts to verify JWT instead of session
# 3. Update src/routes/login.ts to issue JWT
# 4. Update src/routes/logout.ts to invalidate JWT
# 5. Update tests/auth.test.ts for new flow
# 6. Run tests to verify
#
# Approve plan? [y/n/edit]:
  1. Enable parallel agents for exploration. The --parallel flag spawns multiple agents working on different approaches; you compare results before merging.
# Spawn 4 parallel agents exploring different approaches
grok run "Improve the database query performance in the user search endpoint" \
    --parallel=4 \
    --branches=approach-a,approach-b,approach-c,approach-d

# Each agent creates a separate git branch
# Each produces a different implementation
# Compare with:
grok review approach-a approach-b approach-c approach-d

# Pick the best and merge
grok accept approach-b
  1. Configure local-first vs cloud mode. By default, Grok Build runs inference locally if you have appropriate hardware (16GB+ RAM, modern GPU helps but isn’t required); falls back to cloud only if the local hardware can’t handle the model. For mandatory local mode (regulated environments), set execution: local-only in config.
# .grok/config.yaml
model: grok-code-fast-1
execution: local-only   # never call xAI cloud
local_inference:
  device: auto          # cpu, cuda, metal, or auto
  context_window: 32768
  quantization: fp8     # fp16, fp8, int8, or int4

# Or for cloud mode (faster, more capable)
execution: cloud
api_key: ${GROK_API_KEY}
  1. Use the grok chat interactive mode for exploratory work. Like Claude Code’s chat mode but with the parallel-agent option always available.
# Interactive session
grok chat
# Within the session:
# > /spawn 4    # spawn 4 parallel agents
# > /plan       # produce a plan for the current question
# > /run        # execute the approved plan
# > /diff       # show pending changes
# > /accept     # commit approved changes
# > /reject     # discard changes
# > /exit       # leave the session

How Grok Build compares to Claude Code, Codex CLI, and Gemini CLI

Feature Grok Build (xAI) Claude Code (Anthropic) Codex CLI (OpenAI) Gemini CLI / Jules (Google)
Underlying model grok-code-fast-1 (purpose-built) Claude Opus 4.7 / Sonnet 4.6 GPT-5.5 with reasoning Gemini 3.5
SWE-Bench Verified 70.8% ~74-78% (Opus 4.7) ~70-75% (GPT-5.5) ~65-72%
Parallel agents Up to 8 native Multi-agent Code Review (shipping) Limited / experimental Not yet
Plan-before-execute Yes (mandatory by default) Yes (configurable) Optional Variable
Local-first inference Yes (default for code privacy) No (cloud only) No (cloud only) No (cloud only)
Pricing (input tokens) $0.20/M $5/M Opus, $3/M Sonnet $2.50/M $0.30/M Flash
Subscription tier $200/mo Premium+ $20-$200/mo (Pro through Max) $20-$200/mo (Plus through Pro) $20-$200/mo (Pro through Ultra)
Multi-step orchestration Built-in via parallel agents Via Agent SDK + Advisor/Executor Via Codex orchestration Via Vertex Agent Builder
Repo size handled Up to ~50K files local; cloud for larger Cloud, large repos OK Cloud, large repos OK Cloud, large repos OK

Grok Build’s clearest advantages are the aggressive pricing (10-25x cheaper per token than Claude/GPT) and the local-first option. Its clearest gap is the SWE-Bench score — 70.8% is competitive but trails Claude Opus 4.7. For routine and high-volume coding tasks where cost matters more than peak quality, Grok Build’s economics are compelling. For complex multi-file refactors or hard reasoning tasks, Claude Code’s higher quality may earn the premium.

The parallel-agents feature is the wild card. If Grok Build’s eight-agent exploration consistently produces better outputs than single-agent runs from competitors, the quality gap closes regardless of base model. Early users report mixed results — for routine tasks, single-agent execution is faster and the parallel exploration adds overhead; for ambiguous or design-heavy tasks, parallel agents surface options a single agent would miss.

What’s next for Grok Build and the agent CLI race

Three threads to watch over the next 90 days. First, Arena Mode going live. The automated scoring layer that ranks parallel agent outputs before developer review is in the binary but not yet enabled. xAI has signaled “summer 2026” for the public rollout; if it ships well, it’s a workflow advantage that competitors will scramble to match.

Second, enterprise distribution. xAI is reportedly negotiating distribution agreements with major IT vendors and managed-service providers. Expect specific deals analogous to OpenAI-Dell for Codex (covered separately) — xAI partnerships with HPE, Lenovo, or another major hardware vendor would put Grok Build on the same competitive footing as Codex on-prem.

Third, the open-weight question. xAI has publicly released earlier Grok weights; whether grok-code-fast-1 will be open-weighted is undecided. An open-weight release would dramatically expand Grok Build’s footprint — every team that builds custom AI tooling could use grok-code-fast-1 as a foundation — but would also reduce xAI’s leverage to differentiate via the model. The decision is reportedly being made over the next 60-90 days.

The broader competitive dynamics. The four-way race (Anthropic, OpenAI, Google, xAI) is producing both feature acceleration and price compression. Expect the major coding agents to ship parallel-agent execution, local-first options, and plan-before-execute workflows within the next six months. Pricing for coding-specific model tiers will likely drop another 30-50% over the same period as competition intensifies.

Frequently Asked Questions

Is Grok Build free?

No. Access requires xAI Premium+ at $200/month, which includes Grok 4.3 plus Grok Build. Limited enterprise pilots are available without the subscription on a case-by-case basis. The aggressive per-token pricing ($0.20/M input, $0.40/M output for cloud mode) applies to usage beyond included quotas. The free Grok tier does not include Grok Build access as of May 2026, though xAI has indicated this may change once the beta exits.

Does Grok Build send my code to xAI’s servers?

By default, no — local mode runs inference on your machine without transmitting source code. If you opt into cloud mode (faster, larger context, additional features like Arena Mode), code is sent to xAI’s servers. The mode is configured per-project in .grok/config.yaml; you can lock projects to local-only execution for regulated workloads. xAI’s data handling policies state that code in cloud mode is not used for model training and is retained per the standard 30-day operational window.

How does the eight-parallel-agent feature actually work?

When you run grok run --parallel=N, the CLI spawns N concurrent sub-agents. Each agent gets the same task description and the same view of the codebase, but independent random seeds and slightly different prompting that encourages exploration of different approaches. Each agent works on its own git branch. After all agents finish, you can review the diffs side-by-side and pick the best (or merge elements from several). The default cap of 8 is configurable; for very simple tasks, 2-4 agents are usually enough.

Will Grok Build work in my IDE or just the CLI?

CLI only at launch. xAI has announced VS Code, JetBrains, and Cursor integrations on the roadmap, with VS Code targeted for Q3 2026. For now, the CLI works alongside any editor — you run Grok Build commands in a terminal and the editor sees the file changes through normal file watching.

Is grok-code-fast-1 available outside of Grok Build?

Yes, through the xAI API. Developers can call grok-code-fast-1 directly for custom integrations at the same $0.20/M input pricing. The CLI is a wrapper around the model plus the multi-agent orchestration; the underlying capability is available for any team building their own coding tools.

How does the plan-before-execute mode handle long-running tasks?

For multi-step tasks (refactors that touch dozens of files, multi-stage feature implementations), the plan is hierarchical — a top-level plan with sub-plans for each major step. You can approve the top-level plan and let the agent autonomously execute, or require approval at each sub-step. The configuration is per-task: --auto-approve trusts the agent through execution; --checkpoint-on-step pauses at each sub-step. For high-stakes work, checkpoint mode is the safer default.

What happens if a parallel agent produces broken code?

Each agent runs the project’s tests (if configured) after producing changes. Agents whose changes break the tests are flagged in the comparison view. Arena Mode (when it ships) will go further — it will rank outputs by automated criteria including test pass rate, code style adherence, and (with cloud mode) LLM-as-judge quality assessment. For now, developers manually compare the diffs and pick the winner. The CLI provides a grok diff approach-a vs approach-b command for side-by-side comparison.

Scroll to Top