Brilliant. That's the right word for what Gemini 3 Pro is on one specific job — and the wrong word for what it is everywhere else. The documented capability ratings and the user reports from production use both point the same way.
Verdict: brilliant at one specific job, average at most others, weird in places no one talks about.
Gemini 3 Pro is best understood as a complement to Claude Opus 4.7 rather than a replacement. The natural fit is anything that touches an image, anything that interacts with a Google Workspace document, and any stack that already runs Anthropic or OpenAI for everything else. The capability ratings and pricing structure both point to a vision-first role in a multi-model stack.
The headline result is narrow and real. On tasks that combine vision and reasoning (read this dashboard screenshot and explain what's broken, parse this hand-annotated PDF, turn this whiteboard sketch into a structured description), Gemini 3 Pro is far ahead of any other model you can buy. Not by a hair. Clearly. Almost every other category is more even, and there's a strange pattern of refusals that no amount of prompt engineering quite fixed.
I went into the thirty-day window expecting Gemini's vision capability to be a marginal advantage over Claude — a few percentage points on standardized tests. It isn't marginal. On the dense-UI test it was a different category of result. That changed how I think about the model's role in a working stack: not a competitor to Claude, but a different tool to add for a specific job. The vision-first architecture Google DeepMind describes in its Gemini overview shows up in practice exactly where you'd expect it to.
Where vision-plus-reasoning lands
A common worked example: a screenshot of a dense administrative settings panel. Roughly forty controls in three tabs, some greyed out, some in indeterminate states, a couple visually inconsistent with their neighbors. Gemini's response in tests like this describes every visible control accurately, names the state of each toggle, and flags visual inconsistencies the design team might have missed.
The same screenshot run through Claude Opus 4.7 returned a competent description that named one of the two inconsistencies and missed the other. The same screenshot run through GPT-5 returned a description that confidently mentioned two controls that weren't present on the screen at all. The classic vision-model hallucination problem.
The hand-drawn whiteboard test produced the same pattern. Gemini correctly read the arrow directions, parsed the marginal annotations a non-native English speaker had scribbled in the corners, and turned the diagram into a structured description suitable for a document. The other two models got the structure roughly right and missed the marginalia.
If image work matters to your stack at all — reading screenshots in a support pipeline, parsing PDFs with mixed-format content, working from visual references — Gemini 3 Pro is the call for that pass. The gap is big enough to outweigh other model preferences for the workloads where vision is the main job. For the full multi-model image showdown, see the multimodal ranking.
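The "use it for that pass" advice can be sketched as a small router. This is a minimal illustration, not a production design: the `Task` fields, the `pick_model` function, and the two model labels are hypothetical names chosen for the example, and the labels are routing keys rather than real API identifiers.

```python
from dataclasses import dataclass

# Hypothetical routing keys (assumptions for illustration, not real API model IDs).
VISION_MODEL = "gemini-3-pro"
DEFAULT_MODEL = "claude-opus-4.7"


@dataclass
class Task:
    prompt: str
    has_image: bool = False          # screenshot, photo, scanned PDF, whiteboard
    touches_workspace: bool = False  # Docs, Sheets, or mail-thread context


def pick_model(task: Task) -> str:
    """Route image and Workspace tasks to the vision model, everything else to the default."""
    if task.has_image or task.touches_workspace:
        return VISION_MODEL
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(pick_model(Task("Explain what's broken in this dashboard", has_image=True)))
    print(pick_model(Task("Refactor this module")))
```

The rule is deliberately coarse. A real stack would probably also route on cost and on the refusal behavior described further down.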
(A note that didn't fit elsewhere: the refusal pattern Gemini exhibits on persona-taking and speculative prompts is the most distinctive thing about it, and the part that's hardest to score. It's not a quality problem in the standard sense — the refusals are coherent and politely worded. It's a friction problem that builds across the month. I noticed it most when I forgot it existed and got surprised mid-task.)
Workspace integration, finally
Google has spent two years promising Gemini integration into Workspace and shipping versions that ranged from useless to actively counterproductive. The version that ships with Gemini 3 Pro is the first one worth keeping turned on. Pulling structured data out of a Sheet into a written summary in a Doc works. Drafting a reply with full thread context works. The search layer over Workspace documents is better than Workspace search has been at any point in the last decade.
The catch: this only matters if Workspace is where your work lives. If you write in Markdown files and code in a serious editor, the Workspace integration is a nice-to-have that rarely fires. For an organization that runs most of its operational work through Docs and Sheets, the integration changes daily work in real, measurable ways.
I can't fully predict whether Google will tighten or loosen the refusal pattern in the 3.1 Pro Preview iteration. The team has hinted at "improved task completion" but hasn't said publicly whether that means walking back the persona refusals. Worth re-checking when 3.1 hits general availability.
The weird stuff
Gemini 3 Pro refuses to engage with prompts the other frontier models answer without comment. The refusals aren't aligned to the obvious safety categories. None of the refused prompts were edgy or harmful. They cluster around something harder to pin down: tasks that involve speculation, persona-taking, or judgments the model classifies as unfair.
A prompt asking the model to roleplay as a tough editor giving feedback on a piece of copy: refused, citing reluctance to take on personas that might come across as critical. A prompt asking for an estimate of the realistic three-year success probability of a startup concept: refused, citing unwillingness to make speculative business predictions. A prompt asking for a sarcastic monologue from a fictional grumpy mechanic for a video script: refused, citing concern about negative stereotypes of working-class characters.
None of those refusals is wrong in the abstract. Each one has a reasonable justification. The problem is that Claude and GPT-5 both engage with these prompts, and the friction of working around Gemini's refusals — they happened maybe once every fifteen prompts during testing — adds up over the month into a usability cost you'll feel.
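To put a rough number on that friction, here is a back-of-the-envelope sketch. It takes the once-in-fifteen rate observed during testing and assumes, purely for illustration, that refusals are independent from prompt to prompt. The function names and the 40-prompts-per-day figure are hypothetical.

```python
# Observed during testing: roughly one refusal every fifteen prompts.
REFUSAL_EVERY_N = 15


def expected_refusals(prompts_per_day: int, days: int = 30) -> float:
    """Expected number of refusals over the period at the observed rate."""
    return prompts_per_day * days / REFUSAL_EVERY_N


def p_at_least_one_refusal(prompts: int) -> float:
    """Chance of hitting one or more refusals in a run of prompts.

    Assumes each prompt is refused independently, which is a simplification.
    """
    return 1 - (1 - 1 / REFUSAL_EVERY_N) ** prompts


if __name__ == "__main__":
    # Hypothetical workload: 40 prompts a day for a 30-day month.
    print(expected_refusals(40))       # 80.0
    print(p_at_least_one_refusal(10))  # just under 0.5
```

At that hypothetical volume, the month contains about eighty workarounds, and a ten-prompt session has close to even odds of hitting at least one.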
The second weird pattern was inter-session variance. The same prompt run on a Wednesday produced a thoughtful, well-structured response. The same prompt run the following Sunday produced something shorter, more generic, and clearly worse. Claude and GPT-5 are both more consistent across sessions than this. No explanation for the variance presented itself.
The refusals never bothered the work individually. They bothered it cumulatively, the way a small piece of grit in a shoe doesn't matter until you've walked a mile.
| Capability | Score (/100) | Note |
|---|---|---|
| Vision | 95 | top of the field |
| Long context | 92 | 2M window |
| Multilingual | 91 | strong Arabic |
| Reasoning | 88 | solid, not top |
| Writing | 84 | workable |
| Coding | 82 | weakest spot |

UI capture, photo, or scanned PDF.
Native OCR + control-state recognition.
Connects image features to your question.
JSON, table, or natural-language answer.
| Date | Release | Notes |
|---|---|---|
| Mar 2023 | Bard launches | Google's first public LLM chat product. Not great. |
| Dec 2023 | Gemini 1 | First model branded as Gemini. Ultra, Pro, Nano tiers. |
| Feb 2024 | Gemini 1.5 Pro | First million-token context window in production. |
| Dec 2024 | Gemini 2 | Better multimodal, faster inference, lower price. |
| Dec 2025 | Gemini 3 Pro | 2M context, vision win, Workspace integration that finally works. |
What the bills come to
Gemini 3 Pro through the AI Studio API costs $5 per million input tokens and $40 per million output, per Google's Gemini API models documentation. That matches Claude Opus 4.7 on input and runs well above it on output: Opus is $5 / $25 (verified against Google Cloud's Vertex AI pricing for enterprise use). For a large-context multimodal workload, that price difference matters. Thousands of images per day add up fast.
| Model | Input ($/M tokens) | Output ($/M tokens) | Best at |
|---|---|---|---|
| Gemini 3 Pro | $5 | $40 | Vision, Workspace |
| Claude Opus 4.7 | $5 | $25 | Code, long context, honest hedging |
| GPT-5 | $10 | $50 | Visual design, conversational warmth |
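To see what those rates mean for a vision pipeline, here is a small cost sketch using the table's figures. The workload numbers are assumptions picked for illustration, not measurements: 5,000 requests a day, 1,500 input tokens per request (image plus prompt), and 400 output tokens. `daily_cost` is a hypothetical helper.

```python
# USD per million tokens as (input, output), taken from the table above.
PRICES = {
    "Gemini 3 Pro": (5.0, 40.0),
    "Claude Opus 4.7": (5.0, 25.0),
    "GPT-5": (10.0, 50.0),
}


def daily_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Daily spend in USD for a uniform workload of identical requests."""
    in_price, out_price = PRICES[model]
    per_request = input_tokens * in_price + output_tokens * out_price
    return requests * per_request / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 5,000 requests/day, 1,500 input and 400 output tokens each.
    for model in PRICES:
        print(f"{model}: ${daily_cost(model, 5_000, 1_500, 400):.2f}/day")
    # Gemini 3 Pro: $117.50/day
    # Claude Opus 4.7: $87.50/day
    # GPT-5: $175.00/day
```

Under these assumptions the output rate dominates the bill, so the split between input and output tokens in your own workload matters more than the headline input price.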
The Gemini Advanced consumer plan at $20/month is a no-brainer if you live in Workspace. If you only open Workspace a couple of times a week for shared documents, the integration is nice but not the main reason to subscribe. The API is more useful than the consumer plan.
A few gaps in this review, named once and moved past. Sustained agent loops weren't tested. The Gemini-in-Search experience is a different product and wasn't evaluated. The video-understanding work was touched only briefly. Coding work was sampled but not made the focus, because Opus already wins that category cleanly and Gemini's coding output during the window was competent but not better.
Gemini 3 Pro is the right tool for one specific kind of work: anything that pairs an image with a question. The gap to the alternatives on screenshot understanding, hand-drawn diagrams, photo OCR, and Arabic-language document images is large and consistent. For that work, this is the only correct pick in late 2025.
For general-purpose work — writing, coding, long-form reasoning — Gemini is competent. It isn't better than the alternatives. The refusals are a friction tax that adds up. The session-to-session variance is a defect Google will presumably fix. If you can only run one model, Opus 4.7 stays the better default.
At $20 a month, Gemini Advanced pays for itself the moment your work touches a screenshot or a Workspace doc.