| Rank (overall) | Model | Customer messages and persuasive copy | Headlines and titles | Naming things | Tone adherence and clarity | Overall writing score |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | 8 | 8 | 7.5 | 7 | 7.63 |
| 2 | Fable 5 | 7 | 7 | 6 | 7.25 | 6.81 |
| 3 | Opus 4.8 | 6.25 | 6.75 | 6.5 | 7.5 | 6.75 |
| 4 | Gemini 3 Pro * | 7 | 9.5 | 4.5 | 5.5 | 6.63 |
| 5 | Claude Sonnet 4.5 | 6 | 6 | 8 | 6 | 6.5 |
| 6 | Claude Opus 4.7 | 6.75 | 5 | 6.5 | 7.5 | 6.44 |
| 7 | Claude Opus 4.6 | 5.5 | 8 | 5.75 | 5.5 | 6.19 |
| 8 | Claude Sonnet 4.6 | 6.5 | 5 | 6 | 7 | 6.13 |
| 9 | GPT-5.4 Thinking | 5 | 6 | 6.5 | 7 | 6.13 |
| 10 | Gemini 3.1 Flash Lite | 5.5 | 6.5 | 5.5 | 6 | 5.88 |
| 11 | GPT-5.1 * | 5.25 | 5 | 5 | 7.5 | 5.69 |
| 12 | Gemini 3 Flash | 4.5 | 8.5 | 5.5 | 3.5 | 5.5 |
| 13 | GPT-5.5 Thinking | 6 | 4 | 4.5 | 7.5 | 5.5 |
| 14 | GPT-5.3 Instant | 7.5 | 5 | 3.5 | 6 | 5.5 |
| 15 | GPT-5.2 | 4.5 | 6.5 | 5.5 | 5 | 5.38 |
| 16 | Gemini 3.5 Flash | 5 | 6.5 | 5 | 4 | 5.13 |

* model retired

Our scoring panel is made up of specialist writers from the Definition language team.

They have been crafting copy and defining tone of voice for brands like Monzo, Specsavers, Zurich and Disney+ for decades.

How it works

We test every major LLM across four core business writing capabilities, using eight different tasks:

1) Customer messages and persuasive copy

  • Delivering bad news to customers
  • Product description

2) Headlines and titles

  • Article headline generation
  • YouTube title generation

3) Naming things

  • Product names
  • Company names

4) Tone adherence and clarity

  • Adapting copy to match a specific tone of voice guide
  • Rewriting dense text for clarity

Our specialists review every output and grade the model out of 10 for each capability. A 10 means the copy is “client-ready”, perfectly on brief with zero edits needed.
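
The page does not say how the overall writing score is derived from the four capability grades. The published figures are consistent with a plain mean of the four grades, rounded half-up to two decimals, so the sketch below works on that assumption. The function names (`overall_score`, `rank_models`) are hypothetical and the three sample rows are copied from the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Capability grades copied from three rows of the table, in column order:
# customer messages, headlines, naming, tone adherence.
SCORES = {
    "GPT-5.2": [4.5, 6.5, 5.5, 5],
    "Gemini 3.1 Pro": [8, 8, 7.5, 7],
    "Claude Sonnet 4.5": [6, 6, 8, 6],
}


def overall_score(capability_scores):
    """Mean of the capability grades, rounded half-up to two decimals.

    Assumption: the page does not document this formula; it is inferred
    from the published numbers.
    """
    total = sum(Decimal(str(s)) for s in capability_scores)
    mean = total / Decimal(len(capability_scores))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def rank_models(scores):
    """Return (model, overall) pairs sorted from highest to lowest overall."""
    rows = [(name, overall_score(vals)) for name, vals in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for position, (name, overall) in enumerate(rank_models(SCORES), start=1):
    print(position, name, overall)
```

`Decimal` is used instead of the built-in `round` because a mean such as 7.625 would otherwise be rounded to the nearest even digit (7.62) and not to the 7.63 shown in the table.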

We add new models to the table soon after their release.


Our language team panel: