Pricing Guide · June 2026

Llama 4 Scout pricing: 10M context — the largest available window at ~$0.11/1M

Q: What can you actually do with a 10 million token context?

10 million tokens holds approximately 7,500 pages of text, an entire large software repository (on the order of a million lines of code), or years of document history. Practical use cases: loading a full enterprise codebase for architectural analysis; processing a multi-year corpus of regulatory filings; maintaining complete conversation history for long-running agents without summarization or truncation; analyzing multiple books or datasets in a single context. No commercial closed-model API offers 10M context.

Q: Is Llama 4 Scout as capable as Llama 4 Maverick?

No. Scout is a lighter model optimized for context length and throughput, not reasoning depth. Maverick scores higher on general benchmarks, including SWE-bench and reasoning tasks. Scout's strengths are the 10M context window and lower inference cost. For tasks that fit in 1M context and require stronger reasoning, Llama 4 Maverick is the better choice. Scout is the right pick when 2M+ context is a hard requirement.

Q: How does Llama 4 Scout's 10M context compare to Gemini 3.1 Pro's 2M?

Llama 4 Scout offers 10M context vs Gemini 3.1 Pro's 2M, a 5× larger window. Via API, Llama 4 Scout costs approximately $0.11 per million input tokens vs $2 per million for Gemini 3.1 Pro (for inputs under 128K), which makes Scout about 18× cheaper on input. For use cases that need more than 2M tokens of context, Scout is the only available option, and it costs dramatically less. The tradeoff is reasoning quality: Gemini 3.1 Pro leads on most benchmark dimensions.

Llama 4 Scout holds a unique position: the largest context window in any available model — 10 million tokens — enough for entire software repositories, multi-year document archives, or any long-context workload that no other model can process in a single call. Free open weights from Meta, ~$0.11/1M via third-party APIs.

By the benchr team · Updated June 10, 2026 · Figures verified against official sources, June 6, 2026

License cost: Free (open weights)

Input / 1M: ~$0.11 (via Together.ai)

Context: 10M (max window)

Context ratio: 5× (vs next largest)

Pricing options

llama-4-scout — pricing options
Option	Cost / limit
Self-hosted (Meta weights)	Infrastructure only
Together.ai input	~$0.11/1M
Together.ai output	~$0.34/1M
Context window	10,000,000 tokens

10M context: what it actually unlocks

Ten million tokens is approximately 7,500 pages of text. In engineering terms, that is enough for a codebase of roughly 1M lines with documentation, a decade of customer support transcripts, or the complete works of a prolific author many times over. The second-largest commercial context window, Gemini 3.1 Pro at 2M, handles about 1,500 pages; Scout handles 5× that. For applications where the constraint has always been context length rather than reasoning quality, this is a fundamental capability unlock.

Practically: a software company could load its entire 800K-line production codebase plus 200K lines of tests and documentation into a single Scout context call. No chunking, no retrieval, no loss of cross-file reasoning from splitting. This is not a marginal improvement over 2M context — it enables workflows that were architecturally impossible at smaller context sizes.
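Whether a particular repository actually fits is worth checking before designing around a single call. The sketch below walks a directory and estimates its token count from character counts. The 4-characters-per-token ratio is a rough rule of thumb rather than Scout's real tokenizer, and the function names, extension list, and 80% headroom are illustrative choices made here, not anything published by Meta or a provider.

```python
import os

CONTEXT_WINDOW = 10_000_000   # Scout's advertised window, in tokens
CHARS_PER_TOKEN = 4           # assumption: rough heuristic, not a real tokenizer
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go", ".md", ".txt"}  # illustrative


def estimate_repo_tokens(root, extensions=SOURCE_EXTENSIONS):
    """Approximate the token count of all matching text files under root."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total_chars += len(handle.read())
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(token_count, window=CONTEXT_WINDOW, headroom=0.8):
    """True if the estimate stays within a safety fraction of the window."""
    return token_count <= window * headroom
```

Calling fits_in_context(estimate_repo_tokens("path/to/repo")) gives a first-pass answer; the headroom leaves space for the prompt and the model's output. For a decision that matters, count tokens with the model's own tokenizer instead of the heuristic.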

Scout vs Maverick: the context-quality tradeoff

Llama 4 Maverick is the stronger reasoning model: better on SWE-bench, deeper on complex analysis, more capable on general benchmark tasks. Scout trades reasoning depth for context length and throughput. The decision follows from your context requirement. If your task fits in 1M tokens, use Maverick for better quality at slightly higher cost. Between 1M and 2M tokens, Gemini 3.1 Pro's 2M window is the other candidate. If your task requires 2M–10M tokens, Scout is the only option; there is no alternative with this context range, open or closed.

Hardware for self-hosting

Scout's MoE architecture is lighter than Maverick's: Meta lists both at 17B active parameters per token, but Scout has 16 experts and about 109B total parameters against Maverick's 128 experts and about 400B. All experts must be resident in memory, so the 16-bit weights alone need on the order of 220 GB, which means a multi-GPU node; with 4-bit quantization, a single A100 80GB or H100 handles serving at reasonable throughput. For the 10M context window to be practically usable, you also need sufficient KV cache memory, and processing a 10M-token context call requires substantial VRAM. In practice, most self-hosted deployments target the 1M–2M context range, where memory requirements are more manageable.
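To see why the full window is so demanding, a back-of-the-envelope KV cache estimate helps. The sketch below uses the standard formula for a transformer that caches keys and values for every token in every layer. The layer count, KV head count, and head dimension are placeholder values for illustration, not confirmed Scout specifications, and the formula is a naive upper bound: architectures that use local or chunked attention in some layers need less.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Naive KV cache size: keys and values for every token in every layer."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return tokens * per_token


# Placeholder architecture values for illustration only; read the real ones
# from the config of the checkpoint you deploy.
EXAMPLE_LAYERS = 48
EXAMPLE_KV_HEADS = 8
EXAMPLE_HEAD_DIM = 128

for context in (1_000_000, 2_000_000, 10_000_000):
    size = kv_cache_bytes(context, EXAMPLE_LAYERS, EXAMPLE_KV_HEADS, EXAMPLE_HEAD_DIM)
    print(f"{context:>10,} tokens -> {size / 1e9:,.0f} GB of KV cache at 16-bit")
```

Under these placeholder values the cache alone comes to roughly 197 GB at 1M tokens and close to 2 TB at 10M, on top of the model weights, which is consistent with most self-hosted deployments stopping well short of the full window.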

Cost scenarios

At 5M input + 1M output tokens per month (long-document processing), Scout costs approximately $0.55 + $0.34 = ~$0.89/month via Together.ai. Gemini 3.1 Pro at the same volume (inputs under 128K) costs $10 + $12 = $22/month, roughly 25× more. With economics this favorable, the question becomes whether the quality gap on your specific task is worth the 25× price difference. For tasks that require 2M+ context, there is no Gemini alternative; Scout is the only viable option.
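The same arithmetic is easy to rerun for your own volumes. The sketch below uses the per-1M-token prices quoted in this guide; the Gemini output price of $12/1M is inferred from the $12 figure for 1M output tokens above, and the function and dictionary names are illustrative.

```python
def monthly_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


# Per-1M-token prices as quoted in this guide (Gemini output price inferred).
SCOUT = {"input_price": 0.11, "output_price": 0.34}
GEMINI = {"input_price": 2.00, "output_price": 12.00}

scout = monthly_cost(5_000_000, 1_000_000, **SCOUT)
gemini = monthly_cost(5_000_000, 1_000_000, **GEMINI)
print(f"Scout:  ${scout:.2f}/month")
print(f"Gemini: ${gemini:.2f}/month")
print(f"Ratio:  {gemini / scout:.1f}x")
```

Swap in your own monthly token counts to see how the gap scales. Because both prices are linear in volume, the ratio depends only on your input-to-output mix, not on total usage.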

Use-case fit

Best for: Full-repository code analysis; multi-year document archive processing; long-running agent sessions that must maintain complete context without summarization; any workflow where the 2M context ceiling of the next available option is a hard constraint; cost-sensitive long-context retrieval where chunking introduces coherence loss.

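Skip if: Your tasks fit in 1M context and reasoning quality is more important than context ceiling; Llama 4 Maverick and Gemini 3.1 Pro are stronger reasoning models. For multimodal (image) tasks, note that Meta describes the Llama 4 models as natively multimodal, but image input support varies by provider, so confirm it before choosing Scout.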

Decision checklist

Identify your actual p95 context length requirement. If it's under 500K tokens, Llama 4 Maverick (1M context, stronger reasoning) is the better choice. Only choose Scout when you have tasks that regularly need 1M+ tokens and the reasoning quality tradeoff is acceptable.

Verify provider support: contact your inference provider to confirm maximum supported context length before building architecture that depends on the full 10M window. Self-hosting with dedicated hardware is the reliable path to the full context advantage.
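The first checklist item can be turned into a small script if you log token counts per request. The sketch below computes a nearest-rank p95 and applies the thresholds from the checklist: under 500K points to Maverick, 1M and above points to Scout. The function names and the "borderline" label for the range in between are illustrative choices made here, since the checklist does not address that band.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_model(token_counts):
    """Apply the checklist thresholds to the p95 of observed request sizes."""
    p95 = percentile(token_counts, 95)
    if p95 < 500_000:
        return p95, "Llama 4 Maverick"
    if p95 >= 1_000_000:
        return p95, "Llama 4 Scout"
    return p95, "borderline: benchmark both"
```

Feed it the context sizes from a representative sample of real requests rather than the single largest job you can imagine; a handful of outliers rarely justifies accepting the reasoning tradeoff for every call.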

Frequently asked

What can you actually do with a 10 million token context?

You can load roughly 7,500 pages of text, a full large software repository, or multi-year document archives: long-context workloads that no other model can process in a single call. No commercial closed-model API offers 10M context. Use cases: full-codebase analysis without chunking, multi-document synthesis, long-running agent sessions.

Is Llama 4 Scout as capable as Llama 4 Maverick?

No — Scout is optimized for context length and throughput, not reasoning depth. Maverick scores higher on reasoning and SWE-bench. Choose Scout specifically when you need 2M+ context; use Maverick when reasoning quality is the priority and 1M context is sufficient.

How does Llama 4 Scout's 10M context compare to Gemini 3.1 Pro's 2M?

Llama 4 Scout's context window is 5× larger (10M vs 2M). At approximately $0.11 per million input tokens via API, Llama 4 Scout is approximately 18× cheaper than Gemini 3.1 Pro at $2/1M. For tasks requiring 2M+ context, Scout is the only option. The reasoning quality tradeoff vs Gemini 3.1 Pro must be evaluated on your specific task distribution.

Changelog

June 10, 2026 — Expanded with 10M context analysis, Scout vs Maverick comparison, hardware requirements, and cost scenarios.
June 6, 2026 — Published.

Sources

Meta Llama 4 — llama.meta.com (verified June 6, 2026)
Together.ai pricing — together.ai/pricing (verified June 6, 2026)
benchr models.json — verified June 6, 2026