Kimi K2.6, reviewed

An open-weight trillion-parameter model that runs a swarm of sub-agents across thousands of steps. Free to download, cheap on the API.

Figures verified against official sources, 30 May 2026

Agent Swarm: 300 agents. Up to ~300 sub-agents across ~4,000 steps in one run.
SWE-Bench Verified: 80.2. Moonshot-reported, leading open-weight coding score.
Context window: 256K (262,144 tokens). Long, not a million.
Weights license: $0. Modified MIT, free to download and self-host.

Most models give you one assistant that thinks in a straight line. Kimi K2.6's pitch is that you can hand it a big, messy job and it will split the work across a crowd. Moonshot AI calls the mode Agent Swarm: a single autonomous run can launch up to roughly 300 specialized sub-agents and keep them coordinated across as many as 4,000 steps, dividing a research or coding task among workers instead of forcing it all through one chain of thought. For long agentic work, that's a different shape of capability than a bigger context window or a faster decode, and it's the reason this model is worth a separate look from the rest of the open-weight field.

The supporting fact is that the model underneath the swarm is good on its own. Kimi K2.6 is an open-weight, trillion-parameter mixture-of-experts model with native multimodality, published by Moonshot AI on Hugging Face under a Modified MIT license. Moonshot announced it on its official forum on April 21, 2026 (some third-party writeups say April 20). The coding scores it ships with, led by 80.2 on SWE-Bench Verified, put it in the conversation with paid frontier coders while costing nothing to download. So the question this review answers isn't "is it good." It's "what is the swarm for, and what do you discount."

One flag to read before anything else: every headline number below is Moonshot's own, from its model card and blog, not an independent leaderboard. The comparison columns against GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro are Moonshot-run, and some carry asterisks for non-standard conditions. Strong signal, not settled fact.

Which version this is

The Kimi to care about is K2.6. It supersedes the earlier Kimi K2, K2 Thinking, and K2.5 lines, and as of late May 2026 nothing newer had shipped (the model card was last touched May 19). On the API the model id is kimi-k2.6; the open weights live at moonshotai/Kimi-K2.6 on Hugging Face, public and non-gated, with a mirror on ModelScope. One bit of trivia that trips people up: the repo's internal model_type is still kimi_k25, shared with the K2.5 architecture class, so don't read the version off the config. Read it off the name.
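A minimal sketch of that advice: take the version from the repo name and treat the config's model_type as an architecture label only. The helper name and the regex are illustrative assumptions, not part of any Moonshot tooling, and the config dict is a stand-in that contains only the model_type value quoted above.

```python
import re
from typing import Optional

def version_from_repo_id(repo_id: str) -> Optional[str]:
    """Pull a version such as 'K2.6' out of a repo id like 'moonshotai/Kimi-K2.6'.

    Hypothetical helper for illustration; returns None if no version is found.
    """
    match = re.search(r"Kimi-(K\d+(?:\.\d+)?)", repo_id)
    return match.group(1) if match else None

# Stand-in for the repo's config.json (assumption: only this key is shown).
config = {"model_type": "kimi_k25"}
repo_id = "moonshotai/Kimi-K2.6"

print("version from name:", version_from_repo_id(repo_id))
print("architecture class from config:", config["model_type"])
```

The two values disagree by design: the name carries the release, while model_type names the architecture class shared with K2.5.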

What the Agent Swarm is for

The swarm is built for tasks that are too big or too branching for a single agent loop to finish cleanly: deep web research that fans out across hundreds of pages, multi-file refactors, or any job where parallel exploration beats one long sequential trace. The cleanest evidence that it does something is BrowseComp, a web-research benchmark that originated at OpenAI, where Moonshot reports the score rising from 83.2 to 86.3 once Agent Swarm is switched on. That's a real delta of 3.1 points, and it's the number to point at when someone asks whether the swarm is marketing or mechanism.

It's also the feature most likely to be oversold, so be precise about the limits. "Up to ~300 sub-agents" and "up to ~4,000 steps" are ceilings, not what a typical run uses, and more agents means more tokens billed and more ways for a long run to wander. The benchmark gain is reported on a research task Moonshot chose; your refactor may see less. Treat the swarm as a tool for a specific class of long, parallelizable problems, not a free upgrade you leave on for everything.

Codes like a paid model, on Moonshot's own numbers

The coding case is what earns Kimi K2.6 the "open alternative to the frontier" label. On the official model card, it posts 80.2 on SWE-Bench Verified, 76.7 on SWE-Bench Multilingual, and 58.6 on the harder SWE-Bench Pro, alongside 89.6 on LiveCodeBench v6 and 66.7 on Terminal-Bench 2.0. The reasoning and agentic numbers back it up: 96.4 on AIME 2026, 90.5 on GPQA-Diamond, 73.1 on OSWorld-Verified for computer use, and 54.0 on Humanity's Last Exam with tools. For a model you can download for free, that's a serious sheet.

For cross-shopping, this is the open-weight tier's strongest coding entry right now, and it's worth seeing where it lands against the rest of the free-to-download field. benchr's survey of the open-weight tier covers how Kimi stacks up against the other open options, and the DeepSeek-V4 review is the natural head-to-head if you're choosing between open coding models. If you'd rather see how it reads next to the open-weight model with the biggest marketing budget, the Llama 4 review is the other obvious comparison.

Context window: long, not enormous

Kimi K2.6 carries a 256K-token context, stated as "256K" on the model card and pinned to exactly 262,144 tokens (256 × 1,024) on the API pricing page. Those two figures agree, which is more than you can say for a lot of context claims. 256K is enough to hold a sizable codebase, a long agent trace, or a stack of documents in a single prompt without retrieval gymnastics, and it pairs well with the agentic angle, because long autonomous runs generate a lot of intermediate context to keep around.
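For sizing a job against that ceiling, a rough pre-flight check can look like the sketch below. Two things here are assumptions rather than anything Moonshot states: the four-characters-per-token heuristic (real counts need the model's own tokenizer) and the choice to reserve room for output inside the same window.

```python
import math

CONTEXT_LIMIT = 256 * 1024  # 262,144 tokens, the figure on the API pricing page

def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate. The chars-per-token ratio is an assumption,
    not a property of Kimi's tokenizer."""
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """True if the prompt plus any reserved output room stays within the window.
    Counting output against the same window is an assumption of this sketch."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_LIMIT

prompt = "def handler(event):\n    return event\n" * 20_000
tokens = rough_token_estimate(prompt)
print(tokens, fits_in_context(tokens, reserved_output_tokens=8_000))
```

If the check fails, the job needs chunking or retrieval; there is no larger window to fall back on.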

What it is not is a million-token window, and that distinction matters when you're sizing a job. Don't plan around feeding it an entire monorepo or a book-length corpus in one shot; 256K is the hard ceiling. For how to think about window size against what you'll use in practice, benchr's comparison of context windows across models sets the scale, and the piece on how million-token context numbers get marketed is the useful corrective if a spec sheet ever tempts you to over-read a big headline number. Kimi's 256K is honest and useful precisely because it isn't inflated.

What it costs, and how to run it

There are three ways to use Kimi K2.6, and they have very different economics. The first is free: download the open weights under the Modified MIT license and self-host. The second is the hosted Kimi API at platform.kimi.ai, where kimi-k2.6 bills $0.16 per million input tokens on a cache hit, $0.95 per million input on a cache miss, and $4.00 per million output. The third is the chat and agent modes at kimi.com, plus "Kimi Code" at kimi.com/code for a CLI coding agent; the web app has a free access tier, though Moonshot doesn't publish the exact limits.
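To make the API rates concrete, here is a small cost sketch that uses the three per-million prices quoted above. The function name and the split of input tokens into cached and uncached counts are illustrative assumptions; how the platform decides what counts as a cache hit is not covered here.

```python
# USD per million tokens for kimi-k2.6, as quoted in the text above.
RATE_INPUT_CACHE_HIT = 0.16
RATE_INPUT_CACHE_MISS = 0.95
RATE_OUTPUT = 4.00

def request_cost_usd(cached_input_tokens: int,
                     uncached_input_tokens: int,
                     output_tokens: int) -> float:
    """Cost of one request in USD, given token counts in each billing bucket."""
    total = (cached_input_tokens * RATE_INPUT_CACHE_HIT
             + uncached_input_tokens * RATE_INPUT_CACHE_MISS
             + output_tokens * RATE_OUTPUT)
    return total / 1_000_000

# Example: a 100,000-token prompt with no cache hits and a 10,000-token reply.
cost = request_cost_usd(cached_input_tokens=0,
                        uncached_input_tokens=100_000,
                        output_tokens=10_000)
print(f"${cost:.3f}")  # $0.135
```

Output tokens dominate quickly: in this example the reply is a tenth the size of the prompt but accounts for roughly 30% of the bill.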

For almost everyone, the hosted API is the right call. $4.00 per million output is cheap next to the dollars-per-million that closed frontier coders charge, and it skips the operations burden entirely. Self-hosting a trillion-parameter mixture-of-experts model is a real undertaking, a multi-GPU server plus the engineering to keep it serving, and it pays off only at high, steady volume or when data legally can't leave your network. benchr's guide to running models on your own machine walks through where that line sits, and for this size of model it sits higher than people expect. For sizing the decision by workload rather than sticker rate, the price-per-use-case breakdown is the tool: agent swarms burn tokens, and a long multi-agent run can move you from "cents" to "real money" faster than a single chat would.
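The jump from "cents" to "real money" is easy to see with a back-of-envelope estimate. The sketch below uses the API rates quoted earlier in this review; the per-step token counts and the cache-hit ratio are hypothetical inputs chosen for illustration, not measurements of any real swarm run.

```python
# USD per million tokens for kimi-k2.6, from the pricing quoted in this review.
RATE_CACHE_HIT = 0.16
RATE_CACHE_MISS = 0.95
RATE_OUT = 4.00

def run_cost_usd(steps: int,
                 input_tokens_per_step: int,
                 output_tokens_per_step: int,
                 cache_hit_ratio: float) -> float:
    """Estimate the cost of a multi-step run.

    All arguments are assumptions supplied by the caller; cache_hit_ratio is
    the fraction of input tokens billed at the cache-hit rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    total_input = steps * input_tokens_per_step
    cached = total_input * cache_hit_ratio
    uncached = total_input - cached
    total_output = steps * output_tokens_per_step
    return (cached * RATE_CACHE_HIT
            + uncached * RATE_CACHE_MISS
            + total_output * RATE_OUT) / 1_000_000

# Hypothetical single chat turn vs. a run at the 4,000-step ceiling.
single = run_cost_usd(1, 8_000, 500, cache_hit_ratio=0.0)
swarm = run_cost_usd(4_000, 8_000, 500, cache_hit_ratio=0.75)
print(f"single turn: ${single:.4f}")
print(f"4,000-step run: ${swarm:.2f}")
```

Under these made-up inputs the single turn costs about a cent and the full-length run costs about $19, which is why metering a swarm job before scaling it up is worth the effort.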

The verdict

Kimi K2.6 is the most interesting open-weight release of the spring because it competes on a different axis than most: not the longest context or the cheapest token, but the most capable agentic behavior you can download for free. The Agent Swarm is a real differentiator for long, parallelizable work, the coding scores sit with the paid frontier, and the Modified MIT license means you're never locked in. The 4.3 here reflects that: high marks held back by one thing, which is that every headline figure is Moonshot's own and the swarm's ceilings are easy to over-read.

Go with the hosted API if you want the agentic and coding strengths without standing up hardware; it's cheap and drop-in. Bring the weights in-house only when privacy rules demand it or your volume is steady enough to pay off a GPU server. Skip the swarm unless your task fans out into parallel work; for a tight sequential job, a single agent loop is cheaper and easier to reason about. And stick with a closed frontier model when you need accuracy that someone other than the vendor has tested, because a strong vendor-reported sheet isn't the same as an independently reproduced one. On capability per dollar for agentic and coding work, though, this is the open model to beat.

Frequently asked

What is Kimi K2.6's Agent Swarm?

Agent Swarm is Kimi K2.6's mode for long autonomous work. Moonshot describes it as scaling to roughly 300 specialized sub-agents that coordinate across up to about 4,000 steps in a single run, dividing a large task among workers instead of pushing everything through one chain of thought. The clearest signal that it does something is the BrowseComp web-research score, which Moonshot reports rising from 83.2 to 86.3 with Agent Swarm turned on. Those are Moonshot's own numbers, so treat them as the ceiling and test the mode on your own task before relying on it.

Is Kimi K2.6 free, and what is the license?

The weights are free to download. Kimi K2.6 is published on Hugging Face under a Modified MIT license, the same open-weight lineage as earlier Kimi K2, so you can self-host at no licensing cost. There is also a free chat tier at kimi.com, though the exact free-tier limits are not stated in an official source. What costs money is the hosted API, billed per token, and the hardware you need to run a trillion-parameter model yourself.

How good is Kimi K2.6 at coding?

Strong, on Moonshot's own numbers. The model card reports 80.2 on SWE-Bench Verified, 76.7 on SWE-Bench Multilingual, 58.6 on the harder SWE-Bench Pro, 89.6 on LiveCodeBench v6, and 66.7 on Terminal-Bench 2.0. That puts it in the conversation with closed frontier coders as the leading open-weight option. Every figure is vendor-reported from the official model card and blog, not an independent leaderboard, so benchmark it on your own repository before you commit.

What is Kimi K2.6's context window?

256K tokens. The official model card states a 256K context length, and the API pricing page pins it to exactly 262,144 tokens (256 times 1,024). That is a solidly long window, enough to hold a sizable codebase or a long agent trace in one prompt, but it is not a million-token window. Treat the 256K figure as the hard ceiling and don't plan around anything larger.

Should I self-host Kimi K2.6 or use the API?

For most teams the hosted API wins: kimi-k2.6 is $0.95 per million input tokens on a cache miss ($0.16 on a cache hit) and $4.00 per million output, cheap next to closed frontier coders, with no servers to babysit. Running a trillion-parameter mixture-of-experts model in-house takes a multi-GPU box and real serving engineering, worth it only at steady high volume or when data legally can't leave your walls. And watch the swarm: a 4,000-step run fans out into a lot of tokens, so meter an agentic job before you turn it loose.

Changelog

  • May 30, 2026 — Originally published. Version, license, pricing, context window, and the agentic and coding scores verified against Moonshot AI's official Hugging Face model card, the kimi.com blog, the forum.moonshot.ai announcement, and the platform.kimi.ai pricing page. All benchmark figures are labeled Moonshot-reported and not independently reproduced; the 256K context is confirmed as 262,144 tokens, not a million-token window.

References

  1. Moonshot AI, "Kimi-K2.6 model card," huggingface.co/moonshotai/Kimi-K2.6, last modified May 19, 2026.
  2. Moonshot AI, "Meet Kimi K2.6: Advancing Open-Source Coding," forum.moonshot.ai, April 21, 2026.
  3. Moonshot AI, "Kimi K2.6 blog," kimi.com/blog/kimi-k2-6, accessed May 2026.
  4. Moonshot AI, "Chat pricing (kimi-k2.6)," platform.kimi.ai/docs/pricing/chat-k26, accessed May 2026.
  5. Moonshot AI, home, moonshot.ai, accessed May 2026.