Pricing breakdown
| Tier | K2.7-Code | K2.7-Code-Highspeed |
|---|---|---|
| Input (cache miss) | $0.95 | $1.90 |
| Input (cache hit) | $0.19 | $0.38 |
| Output | $4.00 | $8.00 |
| Throughput | standard | ~180 tok/s |
| Context window | 262,144 | 262,144 |
The three price rows are per million tokens, read off platform.kimi.ai on June 23, 2026; throughput is in tokens per second and the context window is in tokens. The Highspeed column is exactly double the base model on every billed line: Moonshot prices the speed tier as a flat 2× rather than a separate rate card.
Same sticker rate as K2.6 — the difference is thinking tokens
Put the two Kimis side by side and the rate card barely moves. Kimi K2.6 bills $0.95/$4.00 with a $0.16 cache hit; K2.7-Code bills $0.95/$4.00 with a $0.19 cache hit. The cache hit is a hair more expensive here, which matters if your pipeline reuses a large system prompt thousands of times a day. Everything else on the sticker is identical.
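To see how much that extra $0.03 per million cached tokens adds up to, here is a minimal sketch. The two rates come from the comparison above; the prompt size and call volume are hypothetical, and the function assumes every call is a full cache hit on the system prompt, which real traffic may not achieve.

```python
# Cache-hit rates in USD per 1M tokens, taken from the comparison above.
K26_CACHE_HIT = 0.16
K27_CODE_CACHE_HIT = 0.19


def daily_cache_hit_premium(prompt_tokens: int, calls_per_day: int) -> float:
    """Extra USD per day that K2.7-Code costs over K2.6 on cached prompt tokens.

    Assumes every call hits the cache for the whole prompt (an idealization).
    """
    cached_millions = prompt_tokens * calls_per_day / 1_000_000
    return cached_millions * (K27_CODE_CACHE_HIT - K26_CACHE_HIT)


# Hypothetical pipeline: a 10,000-token system prompt reused 5,000 times a day.
premium = daily_cache_hit_premium(prompt_tokens=10_000, calls_per_day=5_000)
print(f"Extra cache-hit cost: ${premium:.2f}/day")
```

At that hypothetical volume the gap is small in absolute terms; it only becomes noticeable when the cached prompt is very large or the call count is much higher.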
So why would you move? Moonshot's pitch for K2.7-Code is efficiency, not price: it says the model spends roughly 30% fewer thinking tokens on coding work than K2.6. Output tokens are where the bill lives at $4/1M, and reasoning models bury a lot of output inside their own thinking trace. If the 30% claim holds on your tasks, you pay the same per token but emit fewer of them — a real cut to the effective bill without a single price changing. Treat that as a vendor claim until you've metered it: log token counts on a representative batch of your own tickets before and after, and compare the totals, not the rates.
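One way to do that metering is sketched below. The `Run` record shape is hypothetical (use whatever your logging already captures), the token counts are placeholders rather than measurements, and the sketch assumes thinking tokens are billed as output and are included in the logged count.

```python
from dataclasses import dataclass

OUTPUT_RATE = 4.00  # USD per 1M output tokens, same on K2.6 and K2.7-Code


@dataclass
class Run:
    """Usage for one task. Hypothetical record shape, not an API type."""
    task_id: str
    output_tokens: int  # assumed to include billed thinking tokens


def compare(baseline: list[Run], candidate: list[Run]) -> tuple[float, float]:
    """Return (fractional reduction in output tokens, USD saved on the batch)."""
    base_total = sum(r.output_tokens for r in baseline)
    cand_total = sum(r.output_tokens for r in candidate)
    if base_total == 0:
        raise ValueError("baseline batch has no output tokens")
    reduction = 1 - cand_total / base_total
    saved = (base_total - cand_total) / 1_000_000 * OUTPUT_RATE
    return reduction, saved


# Placeholder counts for the same three tickets run on each model.
k26_runs = [Run("T-1", 12_000), Run("T-2", 8_000), Run("T-3", 20_000)]
k27_runs = [Run("T-1", 9_000), Run("T-2", 6_000), Run("T-3", 13_000)]

reduction, saved = compare(k26_runs, k27_runs)
print(f"Output tokens down {reduction:.0%}, saving ${saved:.3f} on this batch")
```

Comparing batch totals rather than per-task averages keeps one unusually long trace from being hidden or overweighted by the averaging.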
When the Highspeed tier earns its 2×
K2.7-Code-Highspeed is the same model on faster serving: about 180 tokens per second, up to roughly 260 in short-context runs, for double the price across the board. The math is blunt. You're paying $8/1M output instead of $4 to shorten the wait. That's worth it for interactive work: a coding agent a developer is watching, a live refactor in an editor, anything where a human is blocked on the response. It's a waste on anything asynchronous. Batch jobs, overnight evals, CI pipelines, and bulk refactors don't care about latency, so route them to the standard tier and keep the other $4 per million output tokens.
Cost scenarios
Take a coding-agent workload at 20M input + 8M output per month. On standard K2.7-Code that's $19 + $32 = $51/month; with a 90% cache hit rate on the input, the input cost drops to $5.32 (18M × $0.19 + 2M × $0.95), for $5.32 + $32 = $37.32/month. The same volume on Highspeed without caching runs $38 + $64 = $102/month, exactly double: the price of speed. If Moonshot's thinking-token claim holds and your output drops ~30% to 5.6M, standard falls to $19 + $22.40 = $41.40/month before caching. Output volume, not the rate, is the lever on this model.
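The scenarios above can be reproduced with a small calculator. This is a sketch under the rate card in the table; the dictionary keys and the `monthly_cost` function are illustrative names, not part of any Moonshot SDK, and the model strings are the table's column labels rather than confirmed API identifiers.

```python
# USD per 1M tokens, copied from the pricing table.
RATES = {
    "K2.7-Code": {"input_miss": 0.95, "input_hit": 0.19, "output": 4.00},
    "K2.7-Code-Highspeed": {"input_miss": 1.90, "input_hit": 0.38, "output": 8.00},
}


def monthly_cost(model: str, input_m: float, output_m: float,
                 cache_hit_ratio: float = 0.0) -> float:
    """Monthly USD cost for token volumes given in millions.

    cache_hit_ratio is the fraction of input tokens billed at the cache-hit rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    r = RATES[model]
    input_rate = (cache_hit_ratio * r["input_hit"]
                  + (1 - cache_hit_ratio) * r["input_miss"])
    return input_m * input_rate + output_m * r["output"]


scenarios = {
    "standard": monthly_cost("K2.7-Code", 20, 8),
    "standard, 90% cache hits": monthly_cost("K2.7-Code", 20, 8, 0.9),
    "Highspeed": monthly_cost("K2.7-Code-Highspeed", 20, 8),
    "standard, 30% less output": monthly_cost("K2.7-Code", 20, 8 * 0.7),
}
for name, cost in scenarios.items():
    print(f"{name}: ${cost:.2f}/month")
```

Swapping in your own monthly volumes and measured cache-hit ratio is the quickest way to see which of the three levers (caching, output volume, tier) moves your bill most.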
Use-case fit
Best for: coding agents and refactor pipelines that lean on Kimi's open-weight coding strength; reasoning-heavy code work where the thinking-token efficiency shows up in the output bill; teams that want a Modified-MIT model they can later bring in-house.
Skip if: your work is general chat or writing rather than code; K2.6 carries the same sticker rate and has broader strengths. Skip Highspeed entirely unless a human is waiting on the tokens.
Decision checklist
Meter the thinking-token claim before you bank on it: run a representative batch on K2.6 and K2.7-Code, log total output tokens (not just the rate), and see whether the ~30% materializes on your task mix. If it doesn't, there's no reason to switch off K2.6.
Split your traffic by latency: interactive sessions to Highspeed, everything asynchronous to the standard tier. Paying 2× across the board because a fraction of calls are interactive is the most common way to overspend on this model.
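A rough sketch of what the split is worth, assuming Highspeed is a flat 2× on every billed line (as in the table) and that interactive and asynchronous calls have the same token mix, which may not hold for your traffic. The `pick_model` helper and its model strings are illustrative; check the actual API model identifiers before using them.

```python
HIGHSPEED_MULTIPLIER = 2.0  # flat 2x on every billed line, per the pricing table


def pick_model(human_waiting: bool) -> str:
    """Route by latency need. Strings are table labels, not confirmed API IDs."""
    return "K2.7-Code-Highspeed" if human_waiting else "K2.7-Code"


def blended_cost(standard_cost: float, interactive_share: float) -> float:
    """Monthly cost when only the interactive share of traffic runs on Highspeed.

    standard_cost is what the whole workload would cost on the standard tier.
    """
    if not 0.0 <= interactive_share <= 1.0:
        raise ValueError("interactive_share must be between 0 and 1")
    return standard_cost * (1 + interactive_share * (HIGHSPEED_MULTIPLIER - 1))


# Hypothetical: a $51/month standard-tier workload with 20% interactive traffic.
split = blended_cost(51.0, 0.2)
all_highspeed = blended_cost(51.0, 1.0)
print(f"Split routing: ${split:.2f}/month vs all-Highspeed: ${all_highspeed:.2f}/month")
```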