Google’s next flagship model, Gemini 3.5 Pro, did not arrive in June as the company had signaled. It is now expected in July 2026, with Google pointing to feedback from enterprise testers about excessive token consumption during extended agentic tasks. In plain terms, the model was using too many tokens, and therefore costing too much, when asked to run long, multi-step jobs.
A slipped release date is not usually big news. But the reason behind this one is worth a closer look, because it points to a shift in what actually matters as AI models get more capable.
What Went Wrong
Agentic tasks are jobs where a model works through many steps on its own: planning, calling tools, checking its work, and iterating toward a goal. These workflows are where a lot of the excitement in AI is right now, because they move beyond one-shot answers toward systems that can actually get things done.
The catch is that every step consumes tokens, and tokens cost money. If a model is inefficient, a long agentic task can rack up a surprisingly large bill and run slower than users expect. Enterprise testers apparently ran into exactly that, and Google chose to hold the release rather than ship a model that would be expensive to run at scale.
Why Efficiency Is the New Battleground
For a while, the race between AI labs was mostly about raw capability: which model scored highest on benchmarks. That still matters, but cost efficiency has quietly become just as important. A model that is marginally smarter but twice as expensive to run is a hard sell for a business trying to deploy AI across thousands of tasks a day.
Google’s decision reflects that reality. Getting the economics right, especially for the long agentic workflows that enterprises increasingly want, can matter more than winning a benchmark by a few points. The company evidently decided a polished, efficient launch was worth a few extra weeks.
What It Means for You
If you are choosing AI tools, the lesson is to look past the headline capabilities and pay attention to cost per task on your actual workload. A model that looks impressive in a demo can become uneconomical once you run it at volume. The practical move is to test candidates on realistic jobs and measure both quality and total cost before committing.
For the broader market, the delay is a healthy sign. Labs holding back releases to fix efficiency problems means the models that do ship are more likely to be genuinely usable in production, not just impressive on paper. When Gemini 3.5 Pro does land, the token-efficiency work that delayed it may end up being one of its most valuable features.
Go deeper than this article
This article covers the essentials. Our premium eguide library gives you the full step-by-step playbooks — prompts, workflows, and copy-paste recipes you can put to work today.