Archive
Every piece, in the order it ran. Reviews of individual models, head-to-head comparisons, and short essays on the practice of working with these tools.
May 2026
-
30 May·26
Claude Mythos: the model you can't use.
Anthropic built a frontier model, then said it won't sell it. Here's what Mythos Preview is, and why it's locked away.
-
30 May·26
Claude Cowork: the desktop agent that isn't for coders.
Give it a goal, point it at your files, let it work. Claude Code's engine, aimed at everyone who isn't a coder.
-
30 May·26
GPT-5.5, reviewed: is the upgrade from GPT-5 worth it?
OpenAI put the gains into agentic coding and computer use, at roughly double GPT-5's API price. Who moves, who waits.
-
30 May·26
Gemini 3.1 Pro, reviewed.
The reasoning leap is real. What to watch is the long-context bill and how fast a 3.5 Pro could supersede it.
-
30 May·26
Gemini 3.5 Flash, reviewed.
Cheap frontier for agent loops, fast on output, and priced to undercut Pro. Just don't confuse it with the old budget Flash.
-
30 May·26
Grok 4.3, reviewed.
The one model that reads X and the live web on its own. Where that wins outright, and where it doesn't.
-
30 May·26
DeepSeek-V4, reviewed.
An MIT-licensed model that codes like a paid one and costs nothing to download. The real decision is how you run it.
-
30 May·26
Qwen3.6, reviewed.
The value here isn't one model. It's a free, Apache-licensed family in two sizes that covers most jobs at zero licensing cost.
-
30 May·26
Kimi K2.6, reviewed.
An open-weight trillion-parameter model that runs a swarm of sub-agents across thousands of steps. Free to download, cheap on the API.
-
30 May·26
Llama 4, reviewed.
A 10-million-token context on open weights still turns heads. But Meta has moved on, and Llama 4 is the last open Llama.
-
30 May·26
Mistral Large 3, reviewed.
The largest open-weight model from a major lab, released under Apache-2.0. Frontier-scale weights you can download.
-
30 May·26
ChatGPT Images 2.0, reviewed.
The first image model that gets text right. Where GPT Image 2 nails dense layouts, and where it still fumbles.
-
30 May·26
Opus 4.8 vs GPT-5.5: the coder's flagship vs the daily driver.
Both charge $5 per million input tokens. After that they pull apart fast. Here's where each one wins and where it loses.
-
30 May·26
Gemini 3.1 Pro vs GPT-5.5: reasoning vs knowledge work.
These two flagships aim at different scoreboards. One chases hardest-mode reasoning, the other all-round professional work. Picking between them starts with that.
-
30 May·26
Grok 4.3 vs ChatGPT: when live context wins.
Grok wires into the live web and X. ChatGPT is the all-rounder. The choice is whether your question is about right now.
-
30 May·26
ChatGPT vs Claude vs Gemini: the everyday pick for 2026.
Three subscriptions priced within a dollar of each other, three different default models. Here's which one is worth yours.
-
30 May·26
Claude vs ChatGPT for long-form writing.
Before voice or style, one boring number decides a lot: how much can each model write in one pass?
-
30 May·26
AI search engines compared: Perplexity vs ChatGPT Search vs Google AI.
An AI answer is only as good as your ability to check it. The question is which one shows its work.
-
30 May·26
The best free coding model: DeepSeek vs Qwen vs Kimi.
Open weights, zero dollars, real code. Three families you can download or chat with for free, ranked by what they score.
-
30 May·26
The best AI for video in 2026: Veo on top, Sora on the way out.
The tool everyone expected to win is leaving the market. The one that leads brings its own soundtrack.
-
30 May·26
The best AI tools for social media.
Captions, hooks, and repurposing — which tool earns its place for each platform.
-
30 May·26
The best free AI for coding.
What you actually get at $0 from Copilot, Cursor, and friends, and the moment the meter starts.
-
30 May·26
The best AI for writing anything long.
Drafting, essays, and long-form, ranked by voice and how far each model holds a thread before the prose sags.
-
30 May·26
The best AI for students who want to actually learn.
Studying, summarizing, and problem-solving: what is fair game, what gets you in trouble, and the free picks worth using.
-
30 May·26
The best AI for resumes and cover letters.
Tailoring to the job, getting past the ATS, and the AI habits that quietly sink an application.
-
30 May·26
The best AI for email, built-in or standalone.
Drafting, replying, and clearing the inbox: when Gmail and Outlook's own AI is enough and when a separate tool wins.
-
30 May·26
The best AI for spreadsheets and the formulas you hate.
Excel Copilot, Sheets' Gemini, and pasting into a chat model: which one actually gets the formula right.
-
30 May·26
The best AI for research without the fake citations.
Literature review and summarizing sources, with the tools that cite honestly versus the ones that make references up.
-
30 May·26
The best AI for Arabic-English translation.
Which models move cleanly between Arabic and English both ways, and the places every one of them still breaks.
-
30 May·26
The best AI for Saudi and Gulf Arabic.
Where the models hold Khaleeji dialect and where they slide back into MSA or drift toward Egyptian.
-
30 May·26
The best AI for customer service at a real business.
Off-the-shelf resolution bots, platform agents, or build-your-own: what each costs and which fits your support volume.
-
30 May·26
The best free AI with no subscription.
The tools that are genuinely free with no credit card in 2026, and the exact point where each free tier taps out.
-
30 May·26
The AI agent that checks out for you.
How agentic shopping works, who is building it, and where it can go wrong.
-
30 May·26
Are AI hallucinations fixed yet?
What got better by 2026, what did not, and the setups that cut made-up answers.
-
30 May·26
Which AI providers train on your chats.
Who learns from your conversations by default, how to opt out, and what stays private.
-
30 May·26
Do AI text detectors actually work?
The false-positive problem, who gets wrongly flagged, and what to do instead.
-
30 May·26
Do you actually need a reasoning model?
When the extra cost and latency of a thinking model pays off, and when it's wasted.
-
30 May·26
How to get cited inside AI answers.
GEO and AEO tactics that get your pages quoted by ChatGPT, Perplexity, and AI Overviews.
-
30 May·26
When the model remembers you.
How persistent memory works across chats, what it buys you, and the privacy trade.
-
30 May·26
What zero-click search did to the web.
AI Overviews and chat answers keep users on the results page. The real numbers on clicks lost.
-
30 May·26
Claude Opus 4.8, reviewed.
Same price as 4.7, a small leaderboard bump, one benchmark it loses, and a real honesty gain, with the model catching its own bugs.
-
30 May·26
Claude Sonnet 4.6, reviewed.
The $3/$15 daily-driver tier. When it's the right default, when to drop to Haiku, and when to pay for Opus.
-
30 May·26
Claude Haiku 4.5, reviewed.
The $1/$5 cost-control tier. Where the cheapest Claude is genuinely enough, and where cheap turns expensive.
-
30 May·26
Cutting your token bill.
Where AI token spend comes from, and the five levers that bring it down: routing, caching, batching, shorter output, lower effort.
-
21 May·26
Why the benchmarks stopped telling you anything.
MMLU is saturated, HumanEval is gamed. A field guide to what's left worth reading.
-
16 May·26
The million-token context was always a marketing number.
Most long-context workloads still belong in a retrieval system, with the narrow cases where the long window is worth the bill.
-
11 May·26
Voice models compared: ElevenLabs, Whisper, OpenAI, Cartesia.
Real latency numbers, Arabic narration tests, and the voice model worth shipping with right now.
-
8 May·26
The price-per-use-case table.
What you pay for AI in 2026 by workload — chat, RAG, agents, batch — with five commercial models compared.
-
5 May·26
Prompt engineering did not die. It got narrower.
Three techniques that still consistently improve outputs in 2026, with before-and-after examples.
-
4 May·26
AI for Arabic content: a working report on five models.
How Modern Standard, Saudi, Egyptian, and Levantine Arabic come out the other side of Claude, GPT-5, Gemini 3, Qwen 3, and Llama 4.
-
2 May·26
Multimodal capability ranking: twelve images, four models.
Vision tested across Claude, GPT-5, Gemini 3, and Llama 4. The winner is not the one in the marketing campaigns.
April 2026
-
28 Apr·26
GPT-5 vs Claude Opus 4.7: seven tasks, scored.
A refactor, a landing page, an obscure legal question, a recipe, a paper summary, a difficult email, and a broken script.
-
22 Apr·26
Claude Opus 4.7, reviewed.
A 1,200-line refactoring task, a 200-page PDF, a multilingual stress test, and what it costs to use the thing daily.
-
17 Apr·26
RAG vs fine-tuning, with the math.
Cost numbers across both approaches, and the three specific scenarios where fine-tuning still pays off.
March 2026
-
18 Mar·26
AI agents, eighteen months in.
A skeptic's field report on LangGraph, OpenAI Assistants v2, Anthropic's computer use, and Autogen.
-
7 Mar·26
Running models on your own machine.
Hardware, software, actual tokens-per-second on three quantizations, and when local is genuinely worth it.
-
1 Mar·26
Gemini 3 Pro, reviewed.
Brilliant at one specific workflow, competent at most others, and strange in ways the model card does not explain.
February 2026
-
25 Feb·26
Small language models, in working use.
Phi-4, Gemma 3, and the workloads where sub-10B parameter models quietly win.
-
11 Feb·26
Context windows compared, across four frontier models.
When the million-token window pays off, and when it's just expensive retrieval done badly.
-
1 Feb·26
The coding assistants shootout: Cursor, Copilot, Windsurf, Cody.
Four assistants given the same feature on the same codebase. The bugs they shipped were not equally distributed.
January 2026
-
18 Jan·26
The open-weight tier right now: Llama 4, Mistral, Qwen, DeepSeek.
Where open weights have caught up to closed models, and the two categories where they still haven't.
-
4 Jan·26
GPT-5, reviewed.
Where GPT-5 differs from Claude in ways that matter — speed, breadth, and the cracks that show on niche technical questions.