Most model reviews ask one question: is this thing good? Qwen3.6 makes you ask a better one. The pitch isn't a single number on a single leaderboard; it's coverage. You get two open-weight sizes under the same permissive license, both free to download, and between them they handle a lot of different jobs without ever sending you to a billing page. That breadth, not any one benchmark, is what makes the family worth your attention.
Qwen released the two open-weight Qwen3.6 models in April 2026, per the official GitHub README news section: the 35B-A3B mixture-of-experts variant on April 16 and the 27B dense model on April 22. Both ship under Apache 2.0, on Hugging Face and ModelScope, with no per-seat fee and no clause that blocks commercial use. So the interesting question stopped being whether an open model can keep up and became which size in the family fits the job in front of you.
One housekeeping note before the numbers, because the name is a moving target. The slug says "qwen," but the current open family is Qwen3.6, which builds on Qwen3.5 from February 2026 and supersedes the original Qwen3 from April 2025. If you've seen blog posts claiming a larger hosted flagship shipped in May, treat that as unconfirmed: no official Qwen source backs it up, and the GitHub org still pins Qwen3.6 as the current LLM series. This review covers the two variants that exist in the open.
A whole lineup, laid out
Here's the whole open family in one view. Two sizes, one license, one context spec. The differences that matter for a buying decision are size and architecture, not capability tier, because both are multimodal and both carry the same window.
| Variant | Size | Context (tokens) | License |
|---|---|---|---|
| Qwen3.6-27B (dense) | 27B dense, multimodal | 262,144 native, up to ~1,010,000 | Apache-2.0 |
| Qwen3.6-35B-A3B (MoE) | 35B total, ~3B active, multimodal | 262,144 native, up to ~1,010,000 | Apache-2.0 |
Read the size column carefully, because it's where the two diverge. The 27B is a dense model: every parameter fires on every token. The 35B-A3B is a mixture of experts with 35B parameters in total but only about 3B active per token, which is why it can be cheaper to serve per token while still drawing on a larger pool of weights. Same family, same license, two different cost-and-quality trades. For the wider picture of how a multi-size family compares to one-size open releases, benchr's guide to small language models covers where a model this size earns its keep against the giants.
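To put rough numbers on the dense-versus-MoE trade, here is a back-of-the-envelope sketch. The parameter counts come from the table; the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb and an assumption here, not something Qwen publishes, and it ignores attention cost, which grows with context length.

```python
# Rough per-token compute comparison. Parameter counts are from the table above.
DENSE_PARAMS = 27e9       # Qwen3.6-27B: every parameter is active on every token
MOE_TOTAL_PARAMS = 35e9   # Qwen3.6-35B-A3B: total parameters
MOE_ACTIVE_PARAMS = 3e9   # approximate active parameters per token


def forward_flops_per_token(active_params: float) -> float:
    """ASSUMPTION: rule of thumb of ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params


# Share of the MoE's weights that take part in any single token.
active_fraction = MOE_ACTIVE_PARAMS / MOE_TOTAL_PARAMS

# How much more weight compute the dense model spends per token than the MoE.
compute_ratio = (
    forward_flops_per_token(DENSE_PARAMS)
    / forward_flops_per_token(MOE_ACTIVE_PARAMS)
)

print(f"MoE active fraction: {active_fraction:.1%}")
print(f"Dense-to-MoE compute ratio per token: {compute_ratio:.1f}x")
```

Under these assumptions the MoE touches under a tenth of its weights per token, and the dense model does roughly nine times the weight compute. Real throughput depends on the serving stack, batch size, and memory bandwidth, so treat the ratio as a direction rather than a measurement.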
Codes well, on Qwen's own numbers
The headline capability is coding. Qwen positions the 27B dense model as flagship-level coding in a small package, and its self-reported tables back the framing: 77.2 on SWE-bench Verified, 53.5 on SWE-bench Pro, 59.3 on Terminal-Bench 2.0, and 83.9 on LiveCodeBench v6. The 35B-A3B trails it on the overlapping tests, posting a self-reported 73.4 on SWE-bench Verified. On general reasoning the two run close: 86.2 MMLU-Pro and 87.8 GPQA Diamond for the 27B, against 85.2 and 86.0 for the MoE.
The architecture under those numbers is worth a line. Qwen describes a hybrid Gated DeltaNet plus Gated Attention design, with the mixture-of-experts setup in the 35B-A3B, plus a "Thinking Preservation" feature that keeps reasoning context across turns. The practical read: the family is tuned for agentic coding and repository-level reasoning rather than chat, which is exactly where the long context earns its keep.
Context that makes it a coding family
Both models state a 262,144-token native context, extensible up to 1,010,000. That's the spec that turns a small open model into a credible coding tool, because a window that large can hold a real codebase in the prompt instead of stitching it together with retrieval. The same window covers long-document review, multi-file refactors, and agent runs that accumulate state over many turns.
Two cautions keep this honest. First, treat the roughly one-million-token figure as the extended ceiling, not the default you should reach for blindly. Accepting a context that large is not the same as using all of it accurately, and the difference between the two is the entire subject of benchr's look at how million-token context claims get marketed. Second, recall degrades across long inputs for every model, so benchmark retrieval on your own long prompts. For the cross-model picture of which windows hold up under load, benchr's roundup of context windows compared sets the baseline you should measure Qwen against.
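One way to act on the second caution is a simple needle-in-a-haystack check: bury a known fact at different depths in a long prompt and see whether the model can retrieve it. The sketch below only builds the prompts and leaves the model call to you. The four-characters-per-token estimate is a crude assumption, not Qwen's tokenizer, and the helper names, filler text, and passphrase are all made up for illustration.

```python
NATIVE_WINDOW = 262_144      # native context stated on the model cards
EXTENDED_WINDOW = 1_010_000  # stated extended ceiling
CHARS_PER_TOKEN = 4          # ASSUMPTION: crude heuristic, not Qwen's tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; swap in the real tokenizer for actual tests."""
    return len(text) // CHARS_PER_TOKEN + 1


def window_tier(tokens: int) -> str:
    """Classify a prompt size against the two stated context limits."""
    if tokens <= NATIVE_WINDOW:
        return "native"
    if tokens <= EXTENDED_WINDOW:
        return "extended"
    return "beyond stated ceiling"


def build_needle_prompt(filler: str, needle: str, depth: float, target_tokens: int) -> str:
    """Repeat filler to roughly target_tokens and insert needle at a relative depth.

    depth=0.0 puts the needle at the start, depth=1.0 at the end.
    """
    if not filler:
        raise ValueError("filler must not be empty")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    repeats = max(1, (target_tokens * CHARS_PER_TOKEN) // len(filler))
    chunks = [filler] * repeats
    chunks.insert(round(depth * repeats), needle)
    return "".join(chunks)


filler = "The quick brown fox jumps over the lazy dog. "
needle = "The passphrase is cobalt-heron-42. "

# Small target so the example runs instantly; raise it toward the window size for real tests.
prompts = {
    depth: build_needle_prompt(filler, needle, depth, target_tokens=2_000)
    for depth in (0.0, 0.5, 1.0)
}
for depth, prompt in prompts.items():
    tokens = estimate_tokens(prompt)
    print(f"depth={depth}: ~{tokens} tokens, tier={window_tier(tokens)}")
```

Append a question such as "What is the passphrase?" to each prompt, send it to whichever runtime you use, and score the answers by depth and length. Repetitive filler is an easy case, so use your own documents or code as the haystack before you trust the result.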
What it costs to run
The license is the whole pricing story, and the story is short: $0. Both models are Apache-2.0, so there's no licensing cost and no restriction on commercial use. Free chat access is available through Qwen Studio if you just want to try the family. The official API runs through Alibaba Cloud Model Studio, with OpenAI- and Anthropic-compatible endpoints, but no official per-token hosted price could be confirmed from Qwen or Alibaba pages, so don't budget against a number you can't see. If you want a managed endpoint, check the live rate on Model Studio rather than trusting a blog.
What "free weights" doesn't cover is the hardware to run them, and this is where the family's two sizes pay off. A 27B dense model and a 35B mixture-of-experts that activates around 3B per token are both small enough to self-host on modest gear, runnable through Transformers, vLLM, SGLang, llama.cpp via GGUF, or MLX on a Mac. One caveat on the MoE: the roughly 3B active parameters cut compute per token, not memory, so in a typical setup all 35B weights still have to be loaded. That's still a different proposition from the giant open models that demand a multi-GPU server. benchr's guide to running models on your own machine walks through where the hardware line sits, and for Qwen3.6 it sits low enough that a single capable workstation is in play.
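For a first read on whether either model fits your machine, the weight footprint is simple arithmetic: parameters times bits per weight, divided by eight. The sketch below is an estimate under stated assumptions, not a sizing guide. It counts weights only, so the KV cache (which grows with context length), activations, and runtime overhead all come on top, and real quantized formats carry extra metadata. The bit widths are generic examples rather than specific Qwen releases.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    ASSUMPTION: ignores KV cache, activations, and quantization metadata.
    """
    total_bytes = total_params * bits_per_weight / 8
    return total_bytes / 2**30


# Total parameter counts from the lineup table. The MoE is sized by its
# total parameters, because the inactive experts still occupy memory.
MODELS = {
    "Qwen3.6-27B": 27e9,
    "Qwen3.6-35B-A3B": 35e9,
}

for name, params in MODELS.items():
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} at {bits}-bit: ~{gib:.1f} GiB for weights")
```

Compare the result against your GPU or unified memory and leave generous headroom for the context you plan to use.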
Where Qwen3.6 sits against the field
The open-weight tier is crowded, and the way to think about Qwen3.6 is breadth at small scale. It isn't trying to be the single biggest open model. It's offering a coherent family you can size to the job, all under one permissive license. The natural cross-shop is DeepSeek, which competes on raw coding scores with much larger mixture-of-experts weights, and benchr's DeepSeek-V4 review lays out that trade. For the full survey of what's worth downloading right now and where each family lands, benchr's read on the open-weight tier is the map.
The buying logic comes down to scale of job. If you can run a model that fits on hardware you already own, you don't pay for capacity you don't need, and Qwen3.6's two sizes are built for exactly that. The bigger open models win when you truly need the extra weights and have the GPUs to feed them. Most teams don't, most of the time.
The verdict
Qwen3.6 is a strong argument that the best open-weight value isn't a single hero model but a well-sized family. You get two free Apache-licensed variants, a long context that makes them real coding tools, and footprints small enough to self-host without a server farm. The 4.4 here reflects that breadth, held just short of the top by the fact that every published score is Qwen's own and the hosted pricing can't be confirmed, so the numbers you're buying on are the vendor's until you reproduce them.
Go with the Qwen3.6-27B dense model if you want the strongest single choice in the family, since it tops the pair on Qwen's tables and self-hosts on modest hardware. Reach for the 35B-A3B when you want lower active compute per token and can use the mixture-of-experts trade. Skip the family only if your work needs a larger open model than the 27B and 35B sizes on offer, or if you require independently verified accuracy that a vendor-reported leaderboard can't give you. For everyone running coding and reasoning jobs that fit, the value is hard to argue with: a whole lineup, free, that you size to the task.
Frequently asked
Is Qwen3.6 free?
The weights are free. Both Qwen3.6-27B and Qwen3.6-35B-A3B ship under the Apache-2.0 license on Hugging Face and ModelScope, so you can download them and self-host with no licensing cost and no restriction on commercial use. Free chat access is also available through Qwen Studio. The official API runs through Alibaba Cloud Model Studio; no official per-token hosted price could be confirmed from Qwen or Alibaba pages, so check the live rate there if you want a managed endpoint.
Which Qwen3.6 model should I run, the 27B or the 35B-A3B?
Start with the 27B dense model. On Qwen's own published tables it generally tops the 35B-A3B mixture-of-experts variant, including a self-reported 77.2 on SWE-bench Verified versus 73.4 for the MoE, and Qwen positions it as flagship-level coding in a 27B dense package. Reach for the 35B-A3B when you want lower active compute per token at inference time, since it activates roughly 3B parameters out of 35B total. Both are multimodal and both carry the same long context.
How long is Qwen3.6's context window?
Both Qwen3.6 model cards state a 262,144-token native context, extensible up to 1,010,000 tokens. That puts a real codebase or a large document set inside a single prompt. Treat the roughly one-million-token figure as the extended ceiling rather than the default, and benchmark recall on your own long inputs before you rely on the full window, because usable accuracy across a context that large is not the same thing as the maximum the model will accept.
Are Qwen3.6's benchmark scores independently verified?
No. The scores quoted here are Qwen's own self-reported numbers from its official Hugging Face model cards, including 77.2 on SWE-bench Verified and 87.8 on GPQA Diamond for the 27B. As of late May 2026, the official sources cite no independent third-party reproduction. Read them as a vendor-reported ceiling that tells you what the model can do under Qwen's harness, then test on your own tasks before betting production work on a leaderboard figure.
Is there a larger Qwen3.6 flagship model?
Not in the Qwen3.6 series. The two confirmed open-weight Qwen3.6 variants are the 27B dense model and the 35B-A3B mixture-of-experts model; no larger Qwen3.6 size is listed in the official sources. Third-party blogs claim a separate hosted flagship released in May 2026, but that could not be confirmed in any official Qwen source and should be treated as unverified. The earlier Qwen3.5 series did include a larger mixture-of-experts model, but that is a previous, separate generation, not part of Qwen3.6.
Changelog
- May 30, 2026: Originally published. The two open-weight variants, the Apache-2.0 license, the 262,144-to-1,010,000-token context, and the coding and reasoning scores were verified against the official QwenLM GitHub README and the Qwen Hugging Face model cards for Qwen3.6-27B and Qwen3.6-35B-A3B. All benchmark figures are labeled Qwen-reported and not independently reproduced; hosted per-token pricing could not be confirmed from any official source. Claims of a larger May 2026 flagship are unverified and excluded.
References
- Qwen, "Qwen3.6 repository," github.com/QwenLM/Qwen3.6, accessed May 2026.
- Qwen, "Qwen3.6 README (news section, release dates)," raw.githubusercontent.com/QwenLM/Qwen3.6/main/README.md, accessed May 2026.
- Qwen, "Qwen3.6-27B model card," huggingface.co/Qwen/Qwen3.6-27B, accessed May 2026.
- Qwen, "Qwen3.6-35B-A3B model card," huggingface.co/Qwen/Qwen3.6-35B-A3B, accessed May 2026.
- Qwen, "Hugging Face organization page," huggingface.co/Qwen, accessed May 2026.
- QwenLM, "GitHub organization page," github.com/QwenLM, accessed May 2026.