Google Gemini Omni Leaks: New Video Model Hits Before I/O 2026

Google’s next AI video model, Gemini Omni, surfaced May 2, 2026 through a UI string accidentally exposed inside the Gemini app: “Start with an idea or try a template. Powered by Omni.” Within hours the leak expanded: testers found editable templates, in-chat scene rewrites, object swaps inside clips, and even watermark removal, all of which worked unusually well for a first public glimpse. The reveal arrives about two weeks before Google I/O 2026 (May 19-20), where Omni is expected to be the headline AI announcement. The Gemini Omni leak reshapes how Google is positioned against Sora, ByteDance’s Seedance 2, Runway, and Kling 3.0, and signals that Google is unifying its image and video generation under one omni-modal foundation rather than continuing parallel Veo and Imagen tracks.

What’s actually new

The leaked UI text from May 2 was the smoking gun. Inside the Gemini app’s video tab, another string read: “Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat, try templates, and more.” That sentence does three pieces of work. It confirms the name (Omni, not Veo 4). It confirms the form factor (chat-driven video generation and editing). And it confirms the positioning (Google’s new video AI is Gemini-native, not a separate product line).

The capabilities glimpsed in the leak are substantial. Object swaps inside existing clips work: a tester replaced a coffee cup with a wine glass mid-shot, and lighting and reflections updated to match. Scene rewrites via chat work: “make the protagonist sad instead of cheerful” updated facial expression and posture across multiple frames. Watermark removal works, which is controversial but technically impressive. Templates let users start from a structured prompt rather than a blank chat. Together these features cross the threshold from “video generation” to “video editing inside a generation model,” which has been the missing piece for production use.

The naming change matters. Google’s prior video model line was Veo (1, 2, 3, 3.1). The shift to “Gemini Omni” suggests a unified Gemini model that handles image and video together — and possibly audio too — rather than a separate Veo model alongside Imagen. This mirrors OpenAI’s trajectory: Sora was initially separate from GPT, but GPT-5.5 Pro increasingly incorporates video capabilities natively. Google appears to be making the same architectural move.

On raw generation fidelity, early testers report Omni lags behind ByteDance’s Seedance 2 on cinematic quality. Where Omni wins is editability: the in-chat editing workflow is sharper than competitors’, and the integration into the Gemini app means existing Gemini users get video generation without learning a new tool. Google is betting that the workflow matters more than peak fidelity for most use cases.

The leak also surfaced “templates” — pre-structured prompts for common video types (product demo, social ad, talking head, scene-to-scene). Templates lower the barrier for non-expert users who don’t know how to prompt a video model effectively. This is a deliberate consumer-facing move, not just a developer-facing one.

Why it matters

Gemini Omni unifies Google’s generative stack. Image, video, and likely audio in one model means simpler product, simpler API, and shared improvements across modalities — a structural advantage over competitors maintaining separate model lines.
Editing-first positioning is the right bet. Production video work needs editing, not just generation. The in-chat object swap and scene rewrite capabilities address what professional creators actually need.
I/O 2026 keynote pressure just went up. The leak set expectations. Google now needs Omni’s keynote demo to match or exceed what testers saw. A weak demo would be a brand hit.
Veo brand may be retired. If Omni replaces Veo entirely, existing Veo workflows (and the Vertex AI integrations Google partners built) need migration paths. Worth watching for announced backward compatibility.
The video AI race compresses further. Sora, Veo/Omni, Seedance, Runway, Kling, Higgsfield’s stack — there are now six-plus serious commercial video AI products. Differentiation has to come from workflow, integration, and pricing, not raw generation quality alone.
Watermark removal will face legal scrutiny. If Omni ships with watermark removal as a feature, expect IP and platform-policy debates. Google may gate this behind specific terms or disable it for consumer use.

How to use it today

Gemini Omni isn’t publicly available as of May 15, 2026. The official unveil and rollout are expected at Google I/O 2026 (May 19-20). What you can do today to prepare:

Watch the Google I/O 2026 keynote live. The keynote runs May 19 with developer sessions May 19-20. Google’s AI announcements typically lead the keynote.

```
# Google I/O 2026 dates
May 19, 2026: Keynote (typically 10am Pacific)
May 19-20, 2026: Developer sessions

# Live stream:
https://io.google/2026/
# YouTube: search "Google I/O 2026" on Google's YouTube
```

Sign up for Gemini early access if you haven’t. New Gemini features typically roll out to existing Gemini Advanced and Gemini Business users first.

```
# Gemini Advanced (consumer):
https://gemini.google.com/

# Gemini for Workspace (business):
# Through the Google Workspace admin console; AI features are
# included in some plans and an add-on in others.
```

Try existing Veo for baseline comparison. Veo 3.1 is the current production model. Trying it now gives you the baseline against which Omni will be judged.

```
# Vertex AI access
gcloud projects list                              # list the projects you can access
gcloud config set project YOUR_PROJECT            # replace YOUR_PROJECT with your project ID
gcloud services enable aiplatform.googleapis.com  # enable the Vertex AI API

# Veo 3.1 model endpoint
# Available through Vertex AI Studio:
# https://console.cloud.google.com/vertex-ai/generative
```

Prepare prompts you’ll want to test on Omni. The leak suggests Omni handles complex multi-shot prompts and chat-style edits well. Draft a representative set of prompts now so you can test rapidly when access opens.

```
# Example prompt patterns worth testing:

# Multi-shot prompt
"Open on wide shot of a modern kitchen. Cut to close-up
of hands chopping vegetables. Cut to medium shot of the
chef tasting the dish. 15 seconds total. Cinematic
lighting, shallow depth of field."

# Edit-style prompt (assumes existing clip uploaded)
"In the clip I just uploaded, change the protagonist's
shirt from blue to red. Keep everything else the same."

# Template-style prompt
"Use the product demo template. Product: a smart water
bottle. Brand colors: deep teal and white. 30 seconds.
End with the brand logo."
```
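
One way to keep these prompts ready is to store them as data rather than in a loose document. The sketch below is a minimal Python example under that assumption. The `PromptCase` class, its fields, and the case names are hypothetical and are not part of any Google API. It only organizes the prompts and prints a checklist for a manual test session, since Omni has no public API to call yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptCase:
    """One prompt to replay against a video model once access opens."""
    name: str                   # short label for your results sheet (hypothetical)
    kind: str                   # "multi_shot", "edit", or "template"
    prompt: str                 # the text you will paste into chat
    needs_upload: bool = False  # True when the prompt refers to an existing clip


# The three prompt patterns from this article, stored as data.
SUITE = [
    PromptCase(
        name="kitchen-three-shots",
        kind="multi_shot",
        prompt=(
            "Open on wide shot of a modern kitchen. Cut to close-up of hands "
            "chopping vegetables. Cut to medium shot of the chef tasting the "
            "dish. 15 seconds total. Cinematic lighting, shallow depth of field."
        ),
    ),
    PromptCase(
        name="shirt-blue-to-red",
        kind="edit",
        prompt=(
            "In the clip I just uploaded, change the protagonist's shirt from "
            "blue to red. Keep everything else the same."
        ),
        needs_upload=True,
    ),
    PromptCase(
        name="water-bottle-demo",
        kind="template",
        prompt=(
            "Use the product demo template. Product: a smart water bottle. "
            "Brand colors: deep teal and white. 30 seconds. End with the "
            "brand logo."
        ),
    ),
]


def by_kind(suite, kind):
    """Return only the cases of one kind, preserving order."""
    return [case for case in suite if case.kind == kind]


def checklist(suite):
    """Render one tick-box line per case for a manual test session."""
    lines = []
    for case in suite:
        note = " (upload clip first)" if case.needs_upload else ""
        lines.append(f"[ ] {case.kind}: {case.name}{note}")
    return lines


if __name__ == "__main__":
    print("\n".join(checklist(SUITE)))
```

Keeping the prompts in one list means the same set can be replayed against Veo 3.1 today and against Omni later, so the comparison uses identical inputs.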

Audit your video workflow for integration points. If you use AI video today (Veo, Runway, Sora, Pika, Kling), identify where Gemini Omni would slot in. The API shape matters; Google is likely to launch with both Vertex AI and consumer-facing access.

```
# Workflow audit checklist:
- Where do you generate AI video today?
- Which tool handles editing after generation?
- What's your typical generation-to-final time?
- What costs are you paying per minute of finished video?
- Where would in-chat editing eliminate steps?
```
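
Two of the checklist questions (generation-to-final time and cost per finished minute) reduce to simple arithmetic once you log a few recent jobs. The following is a rough Python sketch of that bookkeeping. The `VideoJob` fields are an assumed way to record a job, and the numbers in the demo at the bottom are placeholders rather than real pricing.

```python
from dataclasses import dataclass


@dataclass
class VideoJob:
    """One delivered video, logged for the workflow audit (hypothetical schema)."""
    generation_cost: float   # dollars paid to the generation tool
    editing_cost: float      # dollars for editing tools or editor time
    hours_to_final: float    # first prompt to approved final cut
    finished_seconds: float  # length of the delivered video


def cost_per_finished_minute(jobs):
    """Total spend divided by total minutes of delivered video."""
    total_cost = sum(j.generation_cost + j.editing_cost for j in jobs)
    total_seconds = sum(j.finished_seconds for j in jobs)
    if total_seconds <= 0:
        raise ValueError("no finished video recorded")
    return total_cost / (total_seconds / 60)


def average_hours_to_final(jobs):
    """Mean generation-to-final time across jobs."""
    if not jobs:
        raise ValueError("no jobs recorded")
    return sum(j.hours_to_final for j in jobs) / len(jobs)


def editing_share(jobs):
    """Fraction of total spend that goes to post-generation editing."""
    total_cost = sum(j.generation_cost + j.editing_cost for j in jobs)
    if total_cost <= 0:
        raise ValueError("no spend recorded")
    return sum(j.editing_cost for j in jobs) / total_cost


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own logs.
    jobs = [
        VideoJob(generation_cost=40.0, editing_cost=60.0,
                 hours_to_final=6.0, finished_seconds=30.0),
        VideoJob(generation_cost=20.0, editing_cost=30.0,
                 hours_to_final=2.0, finished_seconds=30.0),
    ]
    print(f"cost per finished minute: ${cost_per_finished_minute(jobs):.2f}")
    print(f"average hours to final:   {average_hours_to_final(jobs):.1f}")
    print(f"editing share of spend:   {editing_share(jobs):.0%}")
```

A high editing share is the signal that in-chat editing could remove steps, because that is the part of the workflow Omni's leaked features target.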

Test the editing workflow on competitor tools now. Runway’s editing features and Pika’s edit tools are the closest existing analogs to what Omni demonstrated. Trying them establishes your baseline.
```
# Runway Gen-3 editing
https://runwayml.com/

# Pika editing
https://pika.art/

# Both have free tiers for quick comparison
```

Follow Google’s developer blog and Gemini changelog. Official details will land via Google’s developer channels at I/O and in the days following.

```
# Sources to monitor:
https://blog.google/technology/google-deepmind/
https://developers.googleblog.com/
https://ai.google.dev/changelog
# Also watch:
@GoogleDeepMind on X
@sundarpichai on X (for executive announcements)
```

Plan rollout strategy for your team. If Omni delivers on the leak’s promise, internal teams will want access. Decide on access governance now — who gets Gemini Advanced, how usage is tracked, what’s permitted for client work.

How it compares

Gemini Omni enters a crowded video AI market. The comparison table below shows the major players as of mid-May 2026.

Model	Vendor	Strengths	Weaknesses
Gemini Omni (leaked)	Google	Unified image/video, in-chat editing, templates, deep Gemini integration	Not yet released; cinematic quality reportedly behind Seedance 2
Veo 3.1	Google	Production-grade, available via Vertex AI, partner integrations	Soon to be eclipsed by Omni; separate from chat workflows
Sora 2	OpenAI	Strong cinematic quality, integrated with ChatGPT	Limited editing capability, slower iteration cycle
Seedance 2	ByteDance	Best raw cinematic fidelity per testers; strong motion	Less workflow integration, China-based vendor risk
Runway Gen-3	Runway	Mature editing UX, professional creator focus	Smaller training corpus; pricing higher than alternatives
Kling 3.0	Kuaishou	Strong on character consistency, action sequences	China-vendor risk for some buyers; English UX less polished
Pika 2.0	Pika	Fast iteration, social-friendly outputs	Less suited for long-form production

What distinguishes Gemini Omni in the leaked form: editability in chat, unified modality, and the distribution advantage of shipping inside Gemini’s existing user base. The risks: cinematic quality lagging the leaders, no public benchmarks yet, and the unknown of how Google will price the model relative to Veo and to competitors.

What’s next

Signals to watch over the next three weeks:

Google I/O 2026 keynote (May 19): whether Omni is demoed live, which capabilities Google emphasizes, and what the rollout timeline is.
Pricing announcements: API pricing per second of video generated, Gemini Advanced consumer pricing, Vertex AI enterprise pricing.
Benchmark releases: third-party comparisons against Sora 2, Veo 3.1, and Seedance 2 should land within days of public availability.
Developer access cadence: which developers get API access first, and what waitlist mechanics exist.

The longer-term picture. Gemini Omni signals Google’s strategic bet on omni-modal models — single foundation models that handle all modalities — as the right architecture rather than specialized per-modality models. If Omni delivers, expect every major AI lab to follow with their own omni-modal pivots. OpenAI’s Sora 2 + GPT-5.5 separation may compress into a unified next-gen GPT. Anthropic’s Claude already handles vision and is rumored to be adding video understanding; whether it adds generation is the open question.

For creators and businesses, the next 6-12 months should produce dramatic cost reductions and quality improvements in AI video. Today’s $0.50-$2.00 per second of generated video may drop to $0.10-$0.50 per second by end-of-year, and quality at the budget tier may exceed today’s premium tier. The result: video AI moves from “experimental” to “operational” for most marketing, education, and entertainment use cases.
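
To see what those per-second ranges mean for a single deliverable, here is a small worked example in Python. The two price ranges are the ones quoted above. The `takes` parameter (how many generations you pay for before keeping one) is an assumption added for illustration, and prices are held in cents so the arithmetic stays exact.

```python
# Per-second price ranges quoted in this article, in cents.
TODAY_CENTS = (50, 200)      # $0.50 to $2.00 per generated second
PROJECTED_CENTS = (10, 50)   # $0.10 to $0.50 per generated second (a projection)


def clip_cost_cents(seconds, cents_per_second, takes=1):
    """Cost in cents of generating `takes` versions of a clip `seconds` long."""
    if seconds < 0 or cents_per_second < 0 or takes < 1:
        raise ValueError("seconds and price must be >= 0, takes must be >= 1")
    return seconds * cents_per_second * takes


def cost_range(seconds, price_range_cents, takes=1):
    """Format the low-to-high dollar cost for one clip under a price range."""
    low, high = price_range_cents
    low_cost = clip_cost_cents(seconds, low, takes)
    high_cost = clip_cost_cents(seconds, high, takes)
    return f"${low_cost / 100:.2f} to ${high_cost / 100:.2f}"


if __name__ == "__main__":
    # A 30-second clip, paying for four takes to keep one (takes=4 is an assumption).
    print("today:    ", cost_range(30, TODAY_CENTS, takes=4))
    print("projected:", cost_range(30, PROJECTED_CENTS, takes=4))
```

At the quoted figures, a single 30-second take costs $15.00 to $60.00 today and would cost $3.00 to $15.00 at the projected rates. The number of takes multiplies both ranges, which is why better in-chat editing (fewer regenerations) affects cost as much as the per-second price does.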

Frequently Asked Questions

When will Gemini Omni be publicly available?

Not yet confirmed, but I/O 2026 (May 19-20) is the likely unveil. Past Google AI launches have rolled out to Gemini Advanced subscribers within days of the keynote, with broader rollout over weeks. Vertex AI access for developers usually follows within 1-3 weeks of consumer availability.

Does Gemini Omni replace Veo 3.1?

Unclear from the leak. Three possibilities: Omni replaces Veo entirely (consolidating the brand), Omni runs alongside Veo (Omni for chat use cases, Veo for API and enterprise), or Omni is the new name for what was Veo 4. Google’s I/O announcements will clarify. Existing Veo workflows likely retain access for some transition period regardless.

How does Gemini Omni compare to OpenAI’s Sora?

On raw cinematic quality, early testers suggest Sora 2 maintains a small edge. On workflow integration, Omni’s in-chat editing is a step ahead of Sora’s current capability. On pricing and availability, the picture will become clear post-I/O. For most production use, the workflow advantage may matter more than raw quality differences.

Can Gemini Omni really remove watermarks?

The leak suggested yes, technically. Whether Google ships the capability is a separate question — it raises IP and platform-policy concerns. A conservative product launch would gate watermark removal behind specific terms or disable it for consumer use. Expect Google to address this directly at I/O.

What’s an “omni-modal” model and why does it matter?

An omni-modal model handles multiple modalities (text, image, audio, video) in a single foundation, rather than separate models per modality. Benefits: shared training improvements, consistent outputs across modalities, simpler API. Costs: harder to train, larger model size. The industry direction in 2026 is toward omni-modal as the dominant architecture.

Should I wait for Gemini Omni before starting a video AI project?

Depends on the project. For experimentation or proof-of-concept, waiting two weeks costs little and the Omni reveal will inform tool choice. For production work needed today, current tools (Veo 3.1, Sora 2, Runway, Seedance) deliver value. The “wait for the next thing” trap is real in fast-moving AI markets; pick a tool, ship work, evaluate as new options emerge.
