Kimi K2.7-Code API pricing: a coding specialist at $0.95/1M

Moonshot's K2.7-Code keeps Kimi K2.6's headline rate ($0.95/1M input, $4/1M output) on a model tuned for code. The pricing story isn't the sticker rate. It's the thinking-token efficiency and a Highspeed tier that doubles every price.

By the benchr team · Figures verified against official sources, June 23, 2026

Input / 1M (cache miss, Moonshot AI): $0.95
Output / 1M (Moonshot AI): $4.00
Cache hit / 1M (repeated input): $0.19
Context: 262,144 tokens

Pricing breakdown

kimi-k2.7-code and the Highspeed tier — official Moonshot AI pricing
| Tier | K2.7-Code | K2.7-Code-Highspeed |
| --- | --- | --- |
| Input (cache miss) | $0.95 | $1.90 |
| Input (cache hit) | $0.19 | $0.38 |
| Output | $4.00 | $8.00 |
| Throughput | standard | ~180 tok/s |
| Context window | 262,144 | 262,144 |

All prices are per million tokens, read off platform.kimi.ai on June 23, 2026. The Highspeed column is exactly double the base model on every billed line: Moonshot prices the speed tier as a flat 2× rather than a separate rate card.

Same sticker rate as K2.6 — the difference is thinking tokens

Put the two Kimis side by side and the rate card barely moves. Kimi K2.6 bills $0.95/$4.00 with a $0.16 cache hit; K2.7-Code bills $0.95/$4.00 with a $0.19 cache hit. The cache hit is $0.03 per million (about 19%) more expensive here, which matters if your pipeline reuses a large system prompt thousands of times a day. Everything else on the sticker is identical.

So why would you move? Moonshot's pitch for K2.7-Code is efficiency, not price: it says the model spends roughly 30% fewer thinking tokens on coding work than K2.6. Output tokens are where the bill lives at $4/1M, and reasoning models bury a lot of output inside their own thinking trace. If the 30% claim holds on your tasks, you pay the same per token but emit fewer of them — a real cut to the effective bill without a single price changing. Treat that as a vendor claim until you've metered it: log token counts on a representative batch of your own tickets before and after, and compare the totals, not the rates.
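How much of the claimed 30% reaches the invoice depends on what fraction of your output is thinking trace in the first place. The sketch below makes that dependence explicit. The `thinking_share` parameter and the 8M-token monthly volume are illustrative assumptions, not Moonshot figures; only the $4/1M output rate and the ~30% claim come from the text above.

```python
# Hypothetical helper: how much of the output bill a thinking-token cut removes.
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens, from the rate card


def effective_output_cost(output_tokens_m, thinking_share, thinking_reduction=0.30):
    """Monthly output cost after trimming only the thinking portion of output.

    output_tokens_m    -- baseline output volume, in millions of tokens
    thinking_share     -- fraction of output that is thinking trace (assumed; measure it)
    thinking_reduction -- fractional cut in thinking tokens (vendor claim: ~0.30)
    """
    if not 0.0 <= thinking_share <= 1.0:
        raise ValueError("thinking_share must be between 0 and 1")
    thinking = output_tokens_m * thinking_share
    answer = output_tokens_m - thinking
    remaining = answer + thinking * (1.0 - thinking_reduction)
    return remaining * OUTPUT_PRICE_PER_M


if __name__ == "__main__":
    # Illustrative volume: 8M output tokens per month ($32.00 at baseline).
    for share in (0.25, 0.50, 0.75, 1.00):
        cost = effective_output_cost(8, share)
        print(f"thinking share {share:.0%}: ${cost:.2f}")
```

Under these assumptions the full 30% saving only appears when every output token is thinking trace; at a 50% thinking share the output bill falls by 15%, not 30%.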

When the Highspeed tier earns its 2×

K2.7-Code-Highspeed is the same model on faster serving: about 180 tokens per second, up to roughly 260 in short-context runs, for double the price across the board. The math is blunt. You're paying $8/1M output instead of $4 to shorten the wait; the rate card gives no throughput figure for the standard tier, so measure the actual speedup on your own traffic. That's worth it for interactive work: a coding agent a developer is watching, a live refactor in an editor, anything where a human is blocked on the response. It's a waste on anything asynchronous. Batch jobs, overnight evals, CI pipelines, and bulk refactors don't care about latency, so route them to the standard tier and keep the other $4 per million output tokens.

Cost scenarios

Take a coding-agent workload at 20M input + 8M output per month. On standard K2.7-Code that's $19 + $32 = $51/month; with a 90% cache hit on the input (18M at $0.19 plus 2M at $0.95) it drops to $5.32 + $32 = $37.32/month. The same volume on Highspeed runs $38 + $64 = $102/month: exactly double, the price of speed. If Moonshot's thinking-token claim holds and your output drops ~30% to 5.6M, standard falls to roughly $19 + $22.40 = $41.40/month before caching. Output volume, not the rate, is the lever on this model.
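The scenarios above reduce to one formula: cache-miss input, cache-hit input, and output, each multiplied by its per-million rate. Here is a minimal sketch of that calculation using the rates from the pricing table. The `Rates` class and `monthly_cost` function are hypothetical names introduced for illustration, and the cache hit rate is treated as a simple fraction of input tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens for one tier."""
    input_miss: float
    input_hit: float
    output: float


# Rates as listed in the pricing table.
STANDARD = Rates(input_miss=0.95, input_hit=0.19, output=4.00)
HIGHSPEED = Rates(input_miss=1.90, input_hit=0.38, output=8.00)


def monthly_cost(rates, input_m, output_m, cache_hit_rate=0.0):
    """Monthly bill in USD for volumes given in millions of tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    hit = input_m * cache_hit_rate
    miss = input_m - hit
    return miss * rates.input_miss + hit * rates.input_hit + output_m * rates.output


if __name__ == "__main__":
    print(f"standard:            ${monthly_cost(STANDARD, 20, 8):.2f}")       # $51.00
    print(f"standard, 90% cache: ${monthly_cost(STANDARD, 20, 8, 0.9):.2f}")  # $37.32
    print(f"highspeed:           ${monthly_cost(HIGHSPEED, 20, 8):.2f}")      # $102.00
    print(f"standard, 5.6M out:  ${monthly_cost(STANDARD, 20, 5.6):.2f}")     # $41.40
```

Swapping in your own monthly volumes and measured cache hit rate gives a first-order estimate; it ignores anything not on the rate card, such as retries or failed requests.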

Use-case fit

Best for: coding agents and refactor pipelines that lean on Kimi's open-weight coding strength; reasoning-heavy code work where the thinking-token efficiency shows up in the output bill; teams that want a Modified-MIT model they can later bring in-house.

Skip if: your work is general chat or writing rather than code; K2.6 offers broader strengths at the same headline price. Skip Highspeed entirely unless a human is waiting on the tokens.

Decision checklist

Meter the thinking-token claim before you bank on it: run a representative batch on K2.6 and K2.7-Code, log total output tokens (not just the rate), and see whether the ~30% materializes on your task mix. If it doesn't, there's no reason to switch off K2.6.
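One way to run that comparison is to log a usage record per request for each model and compare summed output tokens. The sketch below assumes each record is a dict with `input_tokens` and `output_tokens` keys; those field names are hypothetical, so map them from whatever your API client actually reports. The sample numbers are made up to show the arithmetic and are not measurements.

```python
def summarize(usage_records):
    """Sum token counts over a batch of per-request usage records."""
    return {
        "input": sum(r["input_tokens"] for r in usage_records),
        "output": sum(r["output_tokens"] for r in usage_records),
    }


def output_reduction(baseline_records, candidate_records):
    """Fractional drop in total output tokens, candidate versus baseline."""
    base = summarize(baseline_records)["output"]
    cand = summarize(candidate_records)["output"]
    if base == 0:
        raise ValueError("baseline batch has no output tokens")
    return 1.0 - cand / base


if __name__ == "__main__":
    # Made-up sample: the same two tickets run on each model.
    k26_batch = [
        {"input_tokens": 1200, "output_tokens": 4000},
        {"input_tokens": 900, "output_tokens": 6000},
    ]
    k27_batch = [
        {"input_tokens": 1200, "output_tokens": 3000},
        {"input_tokens": 900, "output_tokens": 4500},
    ]
    print(f"output reduction: {output_reduction(k26_batch, k27_batch):.0%}")
```

Run both batches on identical prompts, and use enough tickets that a single long reasoning trace does not dominate the totals.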

Split your traffic by latency: interactive sessions to Highspeed, everything asynchronous to the standard tier. Paying 2× across the board because a fraction of calls are interactive is the most common way to overspend on this model.
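The split can be sketched as a one-line routing rule plus a blended-cost check. The model identifiers are the ones this article names for the two tiers; the `pick_model` and `blended_output_cost` helpers, the `interactive` flag, and the 20% interactive share are illustrative assumptions.

```python
STANDARD_MODEL = "kimi-k2.7-code"
HIGHSPEED_MODEL = "kimi-k2.7-code-highspeed"

STANDARD_OUTPUT_PER_M = 4.00   # USD per 1M output tokens
HIGHSPEED_OUTPUT_PER_M = 8.00  # USD per 1M output tokens


def pick_model(interactive: bool) -> str:
    """Route to Highspeed only when a human is waiting on the response."""
    return HIGHSPEED_MODEL if interactive else STANDARD_MODEL


def blended_output_cost(output_m: float, interactive_share: float) -> float:
    """Output bill in USD when only the interactive share runs on Highspeed."""
    if not 0.0 <= interactive_share <= 1.0:
        raise ValueError("interactive_share must be between 0 and 1")
    fast = output_m * interactive_share
    slow = output_m - fast
    return fast * HIGHSPEED_OUTPUT_PER_M + slow * STANDARD_OUTPUT_PER_M


if __name__ == "__main__":
    # Illustrative mix: 8M output tokens per month, 20% of them interactive.
    print(pick_model(interactive=True))
    print(pick_model(interactive=False))
    print(f"split:         ${blended_output_cost(8, 0.2):.2f}")
    print(f"all highspeed: ${blended_output_cost(8, 1.0):.2f}")
```

With that illustrative mix, splitting traffic bills $38.40 on output against $64.00 for sending everything to Highspeed.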

Frequently asked

How much does Kimi K2.7-Code cost?

$0.95 per 1M input on a cache miss, $0.19 on a cache hit, and $4.00 per 1M output, with a 256K (262,144-token) context — the same headline input and output rate as Kimi K2.6. The weights are open under a Modified MIT license, so self-hosting costs nothing in licensing. Verified on platform.kimi.ai, June 23, 2026.

What is the Kimi K2.7-Code-Highspeed tier, and is it worth it?

The same model served faster — Moonshot quotes ~180 tokens/second, up to ~260 in short-context runs — at exactly double the base price: $1.90/$0.38/$8.00 per million. Worth it only when latency is the bottleneck (interactive agents, live coding). For batch or overnight work, the standard tier is half the cost for the same output.

Is Kimi K2.7-Code cheaper to run than Kimi K2.6?

Not on the rate card. The headline rates are identical ($0.95/$4.00), and cache hits cost slightly more at $0.19 versus K2.6's $0.16. The difference Moonshot claims is efficiency: roughly 30% fewer thinking tokens on coding tasks, which lowers the effective output bill on reasoning-heavy work at the same per-token price. Meter your own workload before assuming the saving.

Changelog

  • — Published. Pricing for kimi-k2.7-code and kimi-k2.7-code-highspeed verified on platform.kimi.ai/docs and the official Hugging Face model card; recorded in model-figures.json. No official release date is published by Moonshot.

Sources