The AI writing leaderboard

LLMs scored by writers.

Rank (overall)	Model	Customer messages and persuasive copy	Headlines and titles	Naming things	Tone adherence and clarity	Overall writing score
1	Gemini 3.1 Pro	8	8	7.5	7	7.63
2	Fable 5	7	7	6	7.25	6.81
3	Opus 4.8	6.25	6.75	6.5	7.5	6.75
4	Gemini 3 Pro *	7	9.5	4.5	5.5	6.63
5	Claude Sonnet 4.5	6	6	8	6	6.5
6	Claude Opus 4.7	6.75	5	6.5	7.5	6.44
7	Claude Opus 4.6	5.5	8	5.75	5.5	6.19
8	Claude Sonnet 4.6	6.5	5	6	7	6.13
9	GPT-5.4 Thinking	5	6	6.5	7	6.13
10	Gemini 3.1 Flash Lite	5.5	6.5	5.5	6	5.88
11	GPT-5.1 *	5.25	5	5	7.5	5.69
12	Gemini 3 Flash	4.5	8.5	5.5	3.5	5.5
13	GPT-5.5 Thinking	6	4	4.5	7.5	5.5
14	GPT-5.3 Instant	7.5	5	3.5	6	5.5
15	GPT-5.2	4.5	6.5	5.5	5	5.38
16	Gemini 3.5 Flash	5	6.5	5	4	5.13

* model retired

Our scoring panel is made up of specialist writers from the Definition language team.

They have spent decades crafting copy and defining tone of voice for brands like Monzo, Specsavers, Zurich and Disney+.

How it works

We test every major LLM on four core business writing capabilities, with two tasks for each capability and eight tasks in total:

1) Customer messages and persuasive copy

Delivering bad news to customers
Product description

2) Headlines and titles

Article headline generation
YouTube title generation

3) Naming things

Product names
Company names

4) Tone adherence and clarity

Adapting copy to match a specific tone of voice guide
Rewriting dense text for clarity

Our specialists review every output and grade the model out of 10 for each capability. A 10 means the copy is “client-ready”: perfectly on brief, with zero edits needed.
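
The overall writing scores in the table match a simple average of the four capability scores, rounded to two decimal places. The formula is not stated on this page, so the Python sketch below is an assumption inferred from the published numbers. The names `SCORES` and `overall_score` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Capability scores for three rows, copied from the table above, in column order:
# customer messages, headlines, naming, tone adherence.
SCORES = {
    "Gemini 3.1 Pro": ["8", "8", "7.5", "7"],
    "Fable 5": ["7", "7", "6", "7.25"],
    "Claude Sonnet 4.6": ["6.5", "5", "6", "7"],
}


def overall_score(capability_scores):
    """Assumed formula: unweighted mean of the capability scores,
    rounded half up to two decimal places."""
    total = sum(Decimal(s) for s in capability_scores)
    mean = total / Decimal(len(capability_scores))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for model, scores in SCORES.items():
    print(model, overall_score(scores))
```

The sketch uses `Decimal` with half-up rounding rather than the built-in `round`, because a mean such as 6.125 has to round to 6.13 to match the table, and `round` would round it down to 6.12.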

We add new models to the table soon after their release.

Want access to all of the best writing models in one secure place?

Start a free Definition AI trial today