Essay · May 2026

Do AI text detectors actually work?

The false-positive problem, who gets wrongly flagged, and what to do instead.

By the benchr team · Reviewed May 30, 2026 · Figures verified against official sources, 30 May 2026

Picture an international student who wrote her essay herself, in her second language, the night before it was due. She doesn't run it through any AI tool; she just hands it in. A week later she's sitting across from a professor, explaining why a piece of software is 98% sure a machine wrote her work. She didn't cheat. The detector just doesn't like how she writes.

That isn't a hypothetical edge case. It's the most documented failure mode these tools have, and the people it hits hardest are the ones least equipped to fight back. So before any school points a detector at a student, here's what the research actually shows.

The company that built the AI couldn't build the detector

Start with the most telling fact in the whole debate. OpenAI, the company behind the model everyone's worried about, built its own AI Text Classifier, launched it as a free beta in January 2023, and then quietly pulled the plug on July 20, 2023, citing its low rate of accuracy. They scuttled their own tool.

The numbers explain why. OpenAI's classifier correctly tagged only 26% of AI-written text as likely AI. So it missed roughly three out of four. And it falsely flagged 9% of genuinely human text as machine-written. It was also unreliable on anything under 1,000 characters, which is most short assignments. If the people with the most knowledge of how these models write couldn't get past one-in-four detection, you should be skeptical of any third-party tool promising near-certainty.

Caught: 26% of AI text that OpenAI's own classifier flagged as likely AI

False alarms: 9% of human text it wrongly flagged as machine-written

Lifespan: 6 months, from January launch to shutdown on July 20, 2023
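To see what those two rates mean for a single flagged essay, here is a minimal sketch that applies Bayes' rule to OpenAI's published figures. The 10% share of AI-written submissions is purely an assumption for illustration, as are the function and variable names; the real share in any given class is unknown.

```python
def flag_precision(true_positive_rate: float,
                   false_positive_rate: float,
                   ai_share: float) -> float:
    """Probability that a flagged text really is AI-written (Bayes' rule)."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("nothing gets flagged at these rates")
    return flagged_ai / total_flagged


# Figures reported for OpenAI's classifier.
TRUE_POSITIVE_RATE = 0.26   # share of AI text it tagged as likely AI
FALSE_POSITIVE_RATE = 0.09  # share of human text it wrongly flagged

# ASSUMPTION: hypothetical share of submissions that are AI-written.
ASSUMED_AI_SHARE = 0.10

precision = flag_precision(TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE, ASSUMED_AI_SHARE)
print(f"Chance a flagged essay is actually AI-written: {precision:.0%}")
```

Under that assumed 10% share, only about 24% of flags point at real AI text, so roughly three in four flagged essays would be human work. Raising the assumed share raises the figure, but the result depends on a number no school actually knows.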

This isn't a knock on OpenAI specifically. It's a knock on the premise. Research from Sadasivan and colleagues went further and proved a theoretical result: as language models get better at sounding human, the gap between human and AI writing shrinks, and even the best possible detector drifts toward random guessing. The thing detectors measure is disappearing on purpose. You can read more on how unreliable model output itself has gotten in benchr's look at whether AI hallucinations are fixed yet, because the same gap that makes text hard to detect makes it hard to trust.

The false-positive math nobody runs

Here's where the human cost shows up. Turnitin, the plagiarism checker most schools already pay for, launched its AI writing detector with a claimed 1% false-positive rate. One percent sounds like a rounding error. It isn't.

Vanderbilt did the multiplication out loud. The university submitted around 75,000 papers to Turnitin in 2022. At a 1% false-positive rate, that's roughly 750 student papers wrongly flagged as containing AI writing in a single year, at one school. Seven hundred and fifty students who might have to defend work they actually did. That number, plus documented bias and Turnitin's refusal to explain how the tool decides, is why Vanderbilt disabled the detector, with guidance published on August 16, 2023.

~750 Vanderbilt students who could be wrongly flagged in one year, if a "tiny" 1% false-positive rate hits 75,000 papers.

A 1% error rate is fine for a spam filter. It is not fine when each false positive is a person sitting in front of a misconduct board. The trouble with selling a detector on its accuracy rate is that the rate hides the headcount, and the headcount is what ruins someone's semester.
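Vanderbilt's multiplication is easy to reproduce, and the same rate can be turned into a per-student risk. The sketch below is illustrative only: the ten papers per student and the assumption that each check is independent are hypothetical, and the bias research suggests errors actually cluster on particular writers rather than landing at random.

```python
def expected_false_flags(papers: int, false_positive_rate: float) -> float:
    """Expected number of papers wrongly flagged.

    Like Vanderbilt's arithmetic, this applies the rate to every paper submitted.
    """
    return papers * false_positive_rate


def chance_of_any_false_flag(papers_per_student: int, false_positive_rate: float) -> float:
    """Chance an honest student is flagged at least once, assuming independent checks."""
    return 1.0 - (1.0 - false_positive_rate) ** papers_per_student


PAPERS_2022 = 75_000  # papers Vanderbilt submitted to Turnitin in 2022
CLAIMED_FPR = 0.01    # Turnitin's claimed false-positive rate

# ASSUMPTION: hypothetical number of papers one student submits over time.
ASSUMED_PAPERS_PER_STUDENT = 10

print(f"Expected wrongly flagged papers: {expected_false_flags(PAPERS_2022, CLAIMED_FPR):.0f}")
risk = chance_of_any_false_flag(ASSUMED_PAPERS_PER_STUDENT, CLAIMED_FPR)
print(f"Chance of at least one false flag per student: {risk:.1%}")
```

The first line reproduces the 750 figure. The second shows how a 1% per-paper rate compounds: across ten papers, an honest student would face about a 9.6% chance of at least one false flag under these assumptions.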

Who gets wrongly flagged

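The false positives aren't spread evenly, either. They land on a specific group, non-native English writers, and that's the part that should bother you most. In a 2023 study in Patterns, Liang and colleagues at Stanford ran seven GPT detectors over 91 TOEFL essays written by non-native English speakers. The average false-positive rate on those essays was 61.22%, and 97.80% of them were flagged as AI by at least one detector.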

The mechanism matters because it tells you this won't get patched away. Most detectors score "perplexity," roughly how predictable each next word is. AI tends to pick common, expected words, so low perplexity reads as machine. But a writer working in a second language also tends to use simpler, more predictable vocabulary, not because a model wrote it, but because that's the range they've got. The detector can't tell humility from a hard drive. It penalizes a smaller vocabulary as if it were proof of cheating.
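A toy version of perplexity scoring makes the bias concrete. This is a minimal sketch, not how any commercial detector is implemented: real tools score text with large neural language models, while this one assumes a tiny unigram word-frequency model with add-one smoothing, built from a made-up reference text. The function name and sample sentences are hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(text: str, reference_counts: Counter) -> float:
    """Perplexity of `text` under a unigram model with add-one smoothing."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")
    total = sum(reference_counts.values())
    vocab_size = len(reference_counts) + 1  # one extra slot for unseen words
    log_prob = 0.0
    for token in tokens:
        # Counter returns 0 for unseen words, so they get the smoothed minimum.
        probability = (reference_counts[token] + 1) / (total + vocab_size)
        log_prob += math.log(probability)
    return math.exp(-log_prob / len(tokens))


# ASSUMPTION: a made-up reference text standing in for "what the model expects".
reference_text = (
    "the cat sat on the mat and the dog sat on the rug "
    "the day was good and the food was good"
)
reference_counts = Counter(reference_text.split())

plain = "the day was good and the food was good"
ornate = "the afternoon proved splendid and the cuisine proved exquisite"

plain_score = unigram_perplexity(plain, reference_counts)
ornate_score = unigram_perplexity(ornate, reference_counts)
print(f"plain:  {plain_score:.1f}")
print(f"ornate: {ornate_score:.1f}")
```

The plain sentence scores lower because every word in it is one the model already expects. A detector that reads low perplexity as "machine" would flag that sentence first, whether a model produced it or a careful writer with a smaller vocabulary did.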

What detectors claim versus what the research documents, May 2026
Tool or claim	Marketed as	Documented reality
OpenAI AI Text Classifier	A classifier to spot AI text	Caught 26% of AI text, false-flagged 9% of human text; shut down July 20, 2023 for low accuracy
Turnitin AI detector	~1% false-positive rate	~750 of Vanderbilt's 75,000 papers could be wrongly flagged at that rate; Vanderbilt disabled it Aug 16, 2023
Seven GPT detectors (Liang et al.)	Reliable AI detection	61.22% average false-positive rate on non-native (TOEFL) essays; 97.80% flagged by at least one detector
Detectors broadly (Sadasivan et al.)	Catches AI writing	A light paraphraser defeats watermark, neural, zero-shot, and even retrieval-based detectors

They don't even catch the cheaters

Here's the punchline that should end the argument. While detectors are busy flagging honest non-native writers, the students actually using AI to cheat have an easy way out. You don't even need a special tool.

Sadasivan and colleagues showed that bolting a light paraphraser onto AI output breaks a whole range of detectors: watermarking, neural-network classifiers, zero-shot methods, and even retrieval-based ones fall to recursive paraphrasing. Liang and colleagues at Stanford found the same thing with a single "rephrase this" prompt, which lets AI text walk straight past the same detectors that just flagged a TOEFL essay. Detection is far easier to evade than it is to trust.

A flag punishes the student who writes simply and waves through the one who paraphrases. That's backwards.

Turnitin more or less concedes the problem. The company has said its checker deliberately misses about 15% of AI-generated text in a document to keep from false-flagging human writing. Sit with that. To cut down on wrongly accusing people, the tool intentionally lets some cheating slide, and detectors still false-flag non-native writers at high rates. You're paying for a smoke detector that's been told to ignore some of the smoke and panic at the toaster.

What teachers should do instead

So skip the detector as a verdict. If you run one at all, treat its output as a weak hint that prompts a conversation, never as evidence that ends one. A flag is a probability score from the kind of tool that even the people who built the underlying model couldn't make work.

The better move is to build assignments around process you can see. Ask for drafts and version history. Do some writing in class. Have students talk through their own arguments out loud. A short conversation about the work tells you more than any perplexity score, and it doesn't care what someone's first language is. If your worry is that AI writing is generic, the fix is prompts and rubrics that reward the specific over the boilerplate, the same instinct that makes good prompt engineering matter for getting useful output in the first place.

For the bigger picture, the lesson here rhymes with what's happened to AI benchmarks. A single headline number gets marketed as truth, then falls apart the moment you look at who it's measured on and how easily it's gamed, which is exactly the problem benchr keeps hitting in why benchmarks stopped telling you much. A detector score is a benchmark with a person's record attached. Treat it with at least that much suspicion.

Go with process-based assessment if you care about fairness. Skip detector scores as proof, full stop. And if you're a student staring at a false accusation, the entire research record is on your side: ask the institution to show how the tool decides, point to the 61.22% false-positive rate on non-native writing, and remember that OpenAI itself couldn't make this work.

Frequently asked

Are AI text detectors accurate?

Not accurate enough to trust as proof. OpenAI's own AI Text Classifier correctly tagged only 26% of AI-written text and falsely flagged 9% of human text, and OpenAI shut it down on July 20, 2023 over its low rate of accuracy. Sadasivan et al. (2023) also showed that as language models improve, even the best possible detector trends toward a coin flip.

Do AI detectors give false positives on human writing?

More than the marketing admits. Turnitin claimed a 1% false-positive rate, but Vanderbilt noted that across the roughly 75,000 papers it submitted in 2022, even 1% works out to about 750 students wrongly flagged. That math was part of why the university disabled the tool, with guidance published August 16, 2023.

Do AI detectors flag non-native English writers?

Heavily. In a 2023 study in Patterns, Liang et al. ran seven GPT detectors over 91 TOEFL essays by non-native English writers and 88 essays by US 8th graders. The detectors were near-perfect on the native essays, but at least one detector flagged 97.80% of the TOEFL essays as AI, and all seven unanimously flagged 19.78% of them. The average false-positive rate on the TOEFL essays was 61.22%. The cause is perplexity scoring, which penalizes simpler, more predictable word choices.

Can a student fight a false AI accusation?

Yes, and the research is on the student's side. A detector flag is a probability score, not evidence. OpenAI retired its own classifier for low accuracy, Vanderbilt disabled Turnitin's tool partly because Turnitin wouldn't explain how the tool decides what's AI-written, and Turnitin admits it deliberately misses about 15% of AI writing to keep false accusations down. Ask for version history, drafts, and a conversation about the work instead of letting the score stand as a verdict.

What should teachers do instead of running an AI detector?

Treat detector output as a weak signal at most, never as proof. Sadasivan et al. (2023) showed a light paraphraser breaks essentially every detector type, so the tool punishes honest students while real cheaters paraphrase past it. Design assignments around process: drafts, version history, in-class writing, oral defenses, and conversations about the work hold up where a score does not.

Changelog

May 30, 2026 — Originally published. Accuracy, false-positive, and bias figures verified against OpenAI's classifier notice, the Liang et al. study in Patterns, Vanderbilt's guidance, and Sadasivan et al.

References

OpenAI, "New AI classifier for indicating AI-written text," openai.com, accessed May 2026.
TechCrunch, "OpenAI scuttles AI-written text detector over low rate of accuracy," techcrunch.com, accessed May 2026.
Liang et al., "GPT detectors are biased against non-native English writers," cell.com, Patterns, accessed May 2026.
Cell Press, press release for Liang et al., eurekalert.org, accessed May 2026.
Vanderbilt University, "Guidance on AI detection and why we're disabling Turnitin's AI detector," vanderbilt.edu, accessed May 2026.
University of San Diego Legal Research Center, "AI detection guide" (citing Turnitin), lawlibguides.sandiego.edu, accessed May 2026.
Sadasivan, Kumar, Balasubramanian, Wang & Feizi, "Can AI-Generated Text be Reliably Detected?," arxiv.org, accessed May 2026.