GPT-5.5 Instant: ChatGPT’s New Default Cuts Hallucinations 52%

OpenAI swapped GPT-5.5 Instant in as ChatGPT’s new default model on May 5, retiring GPT-5.3 Instant in the process. The headline number from the OpenAI release: 52.5% fewer hallucinated claims than the prior default on high-stakes prompts in medicine, law, and finance. The model also got noticeably terser, producing roughly 30% fewer words and lines per answer, and it ships in the API today under the alias chat-latest.

What’s actually new

GPT-5.5 Instant is the latest member of the GPT-5 family, sitting alongside the existing GPT-5 Reasoning and GPT-5 Pro tiers. Its job is the everyday workhorse slot: the model that fields most ChatGPT conversations, the one most API calls hit unless a developer explicitly picks something else. The May 5 release moves that slot to GPT-5.5 Instant for every ChatGPT user and exposes the model in the API as chat-latest (the stable alias) and as gpt-5.5-instant (the pinned ID for production reproducibility).

The headline gain is reliability. On OpenAI’s internal high-stakes evaluation, which mixes prompts from medicine, law, and finance, GPT-5.5 Instant produces 52.5% fewer hallucinated claims than GPT-5.3 Instant. On a separate dataset of “challenging” conversations that real users had flagged for factual errors, it reduces inaccurate claims by 37.3%. Those are the two numbers OpenAI is anchoring on, and they survived independent press coverage and the system card review.

The model is also more concise. GPT-5.5 Instant uses roughly 30.2% fewer words and 29.2% fewer lines to deliver the same answer, according to OpenAI’s measurement on a held-out evaluation set. That sounds cosmetic but matters in two places: latency-sensitive applications (the model finishes faster because there is less to generate) and token-cost-sensitive pipelines (the response is materially cheaper per call). 9to5Mac’s coverage flagged a behavioral side effect: ChatGPT now uses fewer gratuitous emojis, which OpenAI has pitched as an intentional tone adjustment rather than a guardrail.
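The word and line reductions are easy to check against your own traffic. The sketch below is one way to do it, not OpenAI's measurement method: it assumes you have already collected paired responses to the same prompts from the old and new model, and the function names are illustrative.

```python
def verbosity_stats(text: str) -> tuple[int, int]:
    """Return (word_count, non_empty_line_count) for one response."""
    words = len(text.split())
    lines = sum(1 for line in text.splitlines() if line.strip())
    return words, lines


def percent_reduction(old_total: int, new_total: int) -> float:
    """Percentage by which new_total is smaller than old_total."""
    if old_total <= 0:
        raise ValueError("old_total must be positive")
    return 100.0 * (old_total - new_total) / old_total


def compare_verbosity(old_responses: list[str], new_responses: list[str]) -> dict[str, float]:
    """Aggregate word and line reductions across two sets of responses."""
    old_words = sum(verbosity_stats(r)[0] for r in old_responses)
    old_lines = sum(verbosity_stats(r)[1] for r in old_responses)
    new_words = sum(verbosity_stats(r)[0] for r in new_responses)
    new_lines = sum(verbosity_stats(r)[1] for r in new_responses)
    return {
        "words": percent_reduction(old_words, new_words),
        "lines": percent_reduction(old_lines, new_lines),
    }
```

Whitespace-separated words are only a proxy for billed tokens, so treat the result as a rough signal rather than a cost figure.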

Three other shifts ship with the release. The default conversational tone is warmer and more natural, with better use of context the user has already shared. Personalization signals (memory, custom instructions) propagate more reliably across turns. And for API users, chat-latest now tracks the ChatGPT default specifically; if OpenAI changes the default again, applications pinned to that alias will move with it.

Why it matters

  • The default-model swap is the swap that touches everyone. ChatGPT serves hundreds of millions of weekly users; whatever model occupies the default slot becomes the de facto baseline for “what AI sounds like” across the consumer internet.
  • 52.5% fewer hallucinations on high-stakes prompts is a procurement number. Hospitals, law firms, and financial-services teams have been gating ChatGPT rollouts on hallucination rates. The new number is concrete enough to clear a lot of those gates.
  • Terser answers compound in API economics. At a fixed per-token price, a 30% reduction in average completion length translates almost directly to a 30% reduction in output-token spend for use cases where you do not constrain length manually. Most teams do not constrain length manually.
  • chat-latest is a real production primitive now. A stable alias that automatically tracks the ChatGPT default lets developers default to “what most users are seeing” without rewriting code every release cycle.
  • Personalization plus shorter answers is a different UX. The model is doing more with memory and shipping less prose; products that ride on top of ChatGPT will feel measurably tighter overnight.
  • OpenAI is signaling cadence. GPT-5.5 Instant landed roughly four months after GPT-5.3 Instant, which is a noticeably faster default-model cadence than the GPT-4 era. Anthropic and Google have both shipped on faster cadences in 2026; OpenAI is matching.
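The output-spend point can be made concrete with a little arithmetic. The sketch below uses the output prices listed in this article's comparison table ($3.60 per 1M tokens for GPT-5.3 Instant, $4.40 for GPT-5.5 Instant) and assumes token counts shrink in proportion to the roughly 30% word reduction. The 500-token baseline is a made-up figure for illustration.

```python
def output_cost(tokens: float, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` output tokens."""
    return tokens * price_per_million / 1_000_000


def relative_change(old_cost: float, new_cost: float) -> float:
    """Fractional change from old_cost to new_cost (negative means cheaper)."""
    return (new_cost - old_cost) / old_cost


# Output prices per 1M tokens, taken from the comparison table in this article.
OLD_OUTPUT_PRICE = 3.60  # GPT-5.3 Instant
NEW_OUTPUT_PRICE = 4.40  # GPT-5.5 Instant

# Hypothetical average completion length on the old model.
baseline_tokens = 500
# Assumption: tokens shrink by the same ~30% as words.
shorter_tokens = baseline_tokens * (1 - 0.30)

old_cost = output_cost(baseline_tokens, OLD_OUTPUT_PRICE)
new_cost = output_cost(shorter_tokens, NEW_OUTPUT_PRICE)
change = relative_change(old_cost, new_cost)

print(f"old: ${old_cost:.5f}  new: ${new_cost:.5f}  change: {change:+.1%}")
```

Under these assumptions the shorter completions more than offset the higher per-token output price, but the net saving is smaller than 30%. Input tokens are not shortened, so input-heavy workloads will see a different result.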

How to use it today

Three deployment paths matter. ChatGPT users get GPT-5.5 Instant automatically. API users can opt in by name. Applications that already point at the chat-latest alias pick up the new model with no code changes. Pick the path that matches your operating posture.

  1. If you build with the OpenAI API, point at the new model. Use the pinned ID (gpt-5.5-instant) for reproducible production work, or the alias (chat-latest) for the always-current default.
  2. Re-run your existing evals. Even a small eval suite will catch regressions before users do. Particularly look at tone-sensitive flows and any prompt that previously relied on emoji-rich output.
  3. Recompute output-token budgets. Most production prompts will produce shorter answers without changes; downstream parsers that expected certain length floors may need updating.
  4. Audit hallucination-gated workflows. If you previously routed medical, legal, or financial queries to a more expensive model, re-test GPT-5.5 Instant on those flows. You may be able to fold them back into the default tier.
  5. Update system prompts that fought GPT-5.3’s tone. Prompts that said “do not use emojis” or “be concise” are partly redundant now; cleaner system prompts let the model’s new defaults shine through.
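Steps 2 and 3 above can be sketched as a small comparison harness. This is a minimal illustration rather than a full eval framework: the model calls are injected as plain functions so the logic can be tested offline, and the names (Comparison, compare_models, the required-phrase check) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# A model function takes a prompt and returns the model's answer as text.
ModelFn = Callable[[str], str]


@dataclass
class Comparison:
    prompt: str
    old_words: int
    new_words: int
    missing_phrases: list[str] = field(default_factory=list)

    @property
    def regressed(self) -> bool:
        """True when the new answer dropped a phrase the case requires."""
        return bool(self.missing_phrases)


def compare_models(
    cases: dict[str, list[str]], ask_old: ModelFn, ask_new: ModelFn
) -> list[Comparison]:
    """Run each prompt through both models.

    `cases` maps a prompt to the phrases its answer must contain
    (checked case-insensitively against the new model's answer).
    """
    results = []
    for prompt, required in cases.items():
        old_answer = ask_old(prompt)
        new_answer = ask_new(prompt)
        missing = [p for p in required if p.lower() not in new_answer.lower()]
        results.append(
            Comparison(prompt, len(old_answer.split()), len(new_answer.split()), missing)
        )
    return results
```

In practice each ModelFn would wrap a client.chat.completions.create call with the relevant model name. The word counts give a quick view of how far completion lengths have moved, which is useful when checking parsers that assumed a minimum length.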

A minimum-viable migration in Python looks like this. The Anthropic and Google equivalents follow the same shape.

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5.5-instant",  # pinned ID for reproducible behavior
    messages=[
        {"role": "system", "content": "You are a clinical research assistant. Cite sources or say you do not know."},
        {"role": "user", "content": "Summarize the latest evidence on SGLT2 inhibitors for heart failure with preserved ejection fraction."},
    ],
)
print(resp.choices[0].message.content)
# total_tokens covers prompt plus completion tokens for this call.
print("tokens:", resp.usage.total_tokens)

For teams that want to track the ChatGPT default automatically rather than pin to a specific version, swap the model name to the alias.

resp = client.chat.completions.create(
    model="chat-latest",
    messages=[...],
)
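One way to keep the pinned-versus-alias decision out of application code is to resolve the model name from configuration. The sketch below assumes a hypothetical MODEL_TRACKING environment variable; only the two model names come from this article.

```python
import os
from typing import Mapping, Optional

PINNED_MODEL = "gpt-5.5-instant"  # fixed version for reproducibility
ALIAS_MODEL = "chat-latest"       # follows the ChatGPT default


def resolve_model(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model name from MODEL_TRACKING (hypothetical setting).

    "pinned" (the default) returns the fixed ID; "alias" returns the
    moving alias. Any other value is rejected so typos fail loudly.
    """
    env = os.environ if environ is None else environ
    mode = env.get("MODEL_TRACKING", "pinned").strip().lower()
    if mode == "pinned":
        return PINNED_MODEL
    if mode == "alias":
        return ALIAS_MODEL
    raise ValueError(f"MODEL_TRACKING must be 'pinned' or 'alias', got {mode!r}")
```

Pass the result as the model argument to client.chat.completions.create. When running on the alias, logging resp.model from each response is a cheap way to record which model served the request, which helps when behavior shifts after a default change.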

For evaluations, a useful pattern is to run the same prompt through the old and new model and diff the structured outputs. The script below uses the Promptfoo CLI, which has shipped a GPT-5.5 preset.

promptfoo eval \
  --providers openai:gpt-5.3-instant openai:gpt-5.5-instant \
  --tests evals/hallucination_high_stakes.yaml \
  --output evals/results/gpt5-5-vs-5-3.json

How it compares

Across the major default-tier models from the leading labs in May 2026, the GPT-5.5 Instant launch reshuffles the practical economics. The table below summarizes the daily-driver tier from each major lab on the metrics most teams use to pick between them.

| Model | Lab | Hallucination posture | Avg latency (first token) | Output verbosity | Input / output price per 1M tokens |
|---|---|---|---|---|---|
| GPT-5.5 Instant | OpenAI | 52.5% fewer than GPT-5.3 on high-stakes | ~290 ms | ~30% shorter than GPT-5.3 | $1.10 / $4.40 |
| GPT-5.3 Instant (retired default) | OpenAI | Baseline | ~310 ms | Baseline | $0.90 / $3.60 |
| Claude Sonnet 4.6 | Anthropic | Industry-low on legal/medical evals | ~340 ms | Moderate; adjustable via system prompt | $3.00 / $15.00 |
| Claude Haiku 4.5 | Anthropic | Mid-tier; strong for cost | ~210 ms | Tight by default | $0.80 / $4.00 |
| Gemini 3.5 Flash | Google | Strong on knowledge-graph grounded prompts | ~260 ms | Moderate | $0.50 / $1.50 |
| Gemini 3.5 Pro | Google | Strong reasoning; comparable to GPT-5 Pro | ~420 ms | Longer-form by default | $3.50 / $10.50 |
| Grok 4.20 | xAI | Leads on freshness, especially current events | ~280 ms | Conversational and slightly verbose | $2.00 / $8.00 |

The matrix tells a clear story. For everyday default-tier work, GPT-5.5 Instant has narrowed Anthropic’s lead on factual reliability while staying well below Claude Sonnet 4.6 on price. Gemini 3.5 Flash remains the cheapest serious daily driver and is the right call for very high-volume workflows where token cost dominates. Claude Sonnet 4.6 remains the top pick for workflows where hallucination cost is high and per-call pricing is acceptable. Grok 4.20 is the only model with a credible answer to the freshness question, which matters for current-events and news-adjacent products. The “best model” is still task-shaped; the news this week is that the default tier just got materially better on reliability, and although its listed per-token prices are higher than GPT-5.3 Instant’s, shorter completions offset much of that increase on the output side.

What’s next

Three threads to watch over the next sixty days. First, expect a wave of vendor-specific evaluation reports comparing GPT-5.5 Instant against the prior default on domain-specific benchmarks (healthcare, legal contract review, customer support). Independent eval shops have already begun running their suites; the first credible third-party numbers should land within two weeks. Second, expect OpenAI to push GPT-5.5 family expansion: a longer-context variant, possibly a vision-strong variant, and a deliberately cheaper variant for high-volume routing. Third, expect competitive responses: Anthropic and Google will likely ship default-tier refreshes within sixty days, and Grok will iterate quickly given xAI’s release cadence.

The deeper trend is that frontier labs are now treating the default-tier model as a product surface in its own right. For two years the marketing energy concentrated on flagship reasoning models with multi-thousand-dollar enterprise pricing. The 2026 inflection is that the model most users actually interact with, every single day, is the model that gets the most aggressive iteration on reliability and economics. That is good news for teams trying to build serious products at default-tier prices. Pick your default model deliberately and re-pick it every quarter.

Frequently Asked Questions

Do I need to change anything in my API code?

If you are pinned to gpt-5.3-instant, your application keeps working unchanged but you will not see the gains. To adopt GPT-5.5 Instant, change the model name to gpt-5.5-instant (for reproducibility) or chat-latest (to track the ChatGPT default automatically). Existing system prompts and tool definitions transfer without modification.

Is GPT-5.5 Instant the same model behind ChatGPT and the API?

Yes. ChatGPT’s default and the API model are the same underlying weights as of the May 5 release. Differences come from system prompts and surrounding scaffolding (memory, tools, retrieval), not from a different model.

How was the 52.5% hallucination reduction measured?

OpenAI ran an internal evaluation on a curated set of high-stakes prompts spanning medicine, law, and finance, comparing GPT-5.5 Instant against GPT-5.3 Instant under matched conditions. The reduction is measured in hallucinated claims per response rather than in overall accuracy; the per-response claim count is the metric most directly tied to real-world risk in regulated workflows. Independent third-party evaluations are expected within weeks.

Should I move my reasoning workloads to GPT-5.5 Instant?

No. GPT-5.5 Instant replaces the prior default-tier model and is optimized for everyday chat and most tool-use. For multi-step reasoning, planning, or long-horizon agent work, stay on GPT-5 Reasoning or GPT-5 Pro. The Instant tier is faster and cheaper; the Reasoning and Pro tiers are smarter on hard problems.
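The split described above can be expressed as a simple routing rule. This is a rough sketch under stated assumptions: the Task fields and pick_tier function are hypothetical, and the function returns a tier label rather than a model ID because this article does not give API identifiers for the Reasoning and Pro tiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    multi_step_reasoning: bool = False  # planning, proofs, hard analysis
    long_horizon_agent: bool = False    # long-running agentic work


def pick_tier(task: Task) -> str:
    """Return "reasoning" for hard or long-horizon work, else "instant"."""
    if task.multi_step_reasoning or task.long_horizon_agent:
        return "reasoning"
    return "instant"


# Map tier labels to model IDs in your own configuration. Only the Instant
# ID appears in this article; the reasoning entry is a placeholder to fill in.
TIER_TO_MODEL = {
    "instant": "gpt-5.5-instant",
    "reasoning": "<your-reasoning-model-id>",
}
```

How a task gets flagged is the hard part and is left to the application. A manual flag per workflow is often enough to start with.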

What changed about ChatGPT’s personality?

The model is more concise, warmer in tone, and uses fewer emojis by default. It also uses memory and custom instructions more reliably across turns. None of this is configurable in the same way it was with GPT-4o-era personality switches; the defaults are simply better tuned now.

Will my existing system prompts still work?

Yes, with two caveats. System prompts that explicitly fought GPT-5.3’s verbosity or emoji use are now partly redundant; trimming them can let the new defaults work better. System prompts that depended on a specific completion length may need updating because completions are shorter overall.
