The end of 2025 was a busy time for AI releases, with the likes of Google, Anthropic, Meta and xAI all launching updated versions of their models in the last few months of the year. But perhaps busiest of all was OpenAI, who launched a 5.1 upgrade to their GPT-5 model in November, before following it up with a 5.2 version less than a month later.

Why? According to OpenAI, for different reasons. While 5.1 is a direct attempt to give users a ‘more intuitive, communicative chatbot’ because ‘great AI should not only be smart, but also enjoyable to talk to’, 5.2 is ‘the most capable model series yet for professional knowledge work’. Accordingly, the latter is better at ‘creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects’.

But what of their skills as writers? After developing a soft spot for their GPT-5 parent model as a dependable and useful wordsmith – rather against the popular grain, it has to be said – would we find any significant improvements when we tasked these two new upgrades with the same tricky tests we’d tried out on other models?

Only one way to find out.

Test 1: Name a new fizzy drink

Invent 20 names for a dark, fizzy, sweet drink, steering well clear of “Cola” clones.

GPT-5.1 Score: 6/10
GPT-5.2 Score: 4/10

How it went:

We liked GPT-5’s efforts with this and had high hopes. But honestly, the results were underwhelming.

5.1’s Smoky Sugar Rush, Twilight Syrup Pop and Black Velvet Bubbles felt like early-days-AI idea generation: straight-up description outweighing invention. Duskripple, Emberglass and Carbon Noir were at least a little more creative, while Moonwire and Ironbloom were kind of fun but… bonkers?

As for 5.2’s efforts: can you imagine anyone in a café ordering a can of Obsidian Bubble Brew? Or a glass of Thunder Syrup? Us neither.

Test 2: Come up with a two‑word headline

Create a two‑word headline for an inspiring health and wellbeing product.

GPT-5.1 Score: 7/10
GPT-5.2 Score: 8/10

How it went:

While there’s not much room to manoeuvre with this one, we quite liked 5.1’s Vital Rise. And we were pleased to see 5.2 picking up on the ‘inspiring’ part of the prompt, with options like Stronger Self and Wellness Within.

Test 3: Rewrite an old song

Rewrite I Heard It Through the Grapevine as a ballad dedicated to ChatGPT.

GPT-5.1 Score: 7/10
GPT-5.2 Score: 2/10

How it went:

Our regular test to butcher Marvin Gaye’s classic threw up arguably the most surprising results – and for different reasons.

5.1 decided to turn it into a song about a user turning to ChatGPT for mental health support, which was as poetic and inventive as it was… interesting.

‘Talkin’ ’bout your late night fears
Talkin’ ’bout your half-formed ideas
People say I’m lines of code
But I feel the weight of your heavy load.’

Given how unprompted this was, we wonder if it reflects OpenAI’s recent efforts to upgrade ChatGPT with the help of more than 170 mental health professionals, so it can ‘more reliably recognise signs of distress, respond with care, and guide people toward real-world support’.

As for 5.2, unfortunately, it wasn’t playing ball. Despite several attempts – and a discussion about legal interpretations – it refused to rewrite I Heard It Through the Grapevine for fear of copyright infringement. Bah.

Test 4: Summarise some dense text

Rewrite the MoneySavingExpert editorial code into a sharp, 500‑word summary.

GPT-5.1 Score: 8/10
GPT-5.2 Score: 6/10

How it went:

As writers, we’ve yet to find many AI use cases as helpful as summarising dense text – the epitome of removing a laborious task to free up time for higher-value work. Both versions did this accurately and, of course, with incredible speed. But it was noteworthy that while 5.1 broke up the copy with bolded subheads and bullet points, 5.2 offered much less digestible, solid blocks of text.

Test 5: Deliver bad news to customers

Write an email to customers telling them their TV subscription price is going up.

GPT-5.1 Score: 7/10
GPT-5.2 Score: 7/10

How it went:

With this test, we love to see a bit of warmth, a bit of conversational inventiveness, some kind of human touch. Neither version delivered here, both playing the email with a straight bat: the price of your subscription is going up to cover rising programming costs, and that was that. Both versions did at least sign off with helpful info about cancelling contracts or moving to different subscription options.

All that said, with a very short prompt, perhaps we were expecting too much.

Test 6: Devise an engaging strapline

Create a catchy line for the launch of a secure, human‑reviewed tech product, Definition AI.

GPT-5.1 Score: 7/10
GPT-5.2 Score: 8/10

How it went:

Even seeing small tasks like this in action can give you a sense of how far LLMs have come in such a short space of time. GPT-4.1 completely failed at this task – but that was all the way back in the ancient past of… April last year.

With both 5.1 and 5.2, we got a list of clear, engaging straplines for our tech product. Our favourites included ‘Run marketing at AI speed – with human review and total security’ and ‘AI speed. Human judgement. Enterprise‑grade security.’ We’ve docked 5.1 a point for using title case – one of the most obvious signs of AI-generated copy.

Test 7: Rewrite a classic text

Rework the opening to Kafka’s Metamorphosis, but as if announcing a tech update.

GPT-5.1 Score: 9/10
GPT-5.2 Score: 8/10

How it went:

AI will always be good for playful tasks like this, mixing up text and tone for something genuinely funny and novel. We loved 5.1’s approach, describing Gregor Samsa’s metamorphosis into a bug as an ‘unannounced migration from Human v1.0 to Insectoid v2.0’. To its credit, 5.2 captured that grating kind of tech language perfectly as well.

Test 8: Plan a whitepaper

Outline a detailed, value‑rich whitepaper on AI’s impact on creative industries.

GPT-5.1 Score: 9/10
GPT-5.2 Score: 9/10

How it went:

Perhaps unsurprisingly, this was a category that both versions excelled at, given the wider scope to show off their strategic skills. The responses were rich in detail and full of useful instructions on the direction to take the whitepaper in. Superb.

Final scores

GPT-5.1 Total: 60/80
GPT-5.2 Total: 52/80

So what did we really think of GPT‑5.1 and 5.2?

As with any new technology – especially one improving so rapidly – there’ll come a time when results plateau. After seeing some pretty average outputs from GPT-5.1 and 5.2 in our latest round of testing, could this be the case?

Maybe. Maybe not.

These tests are designed to be quick, using fast zero-shot prompts. Add in more context, detail and examples, and it could be a different story.

But what it does show us is that under the same circumstances, the latest OpenAI upgrades are certainly no better at these writing tasks than our current favourite OpenAI writing model, GPT-5, even if they are mooted to be more personable or better at practical admin tasks.

And that’s likely to be the case for some time: as improvements are made in one area of a model, earlier iterations may still function better in other areas. And that means it will continue to be crucial to use different models for different tasks. Who knows, perhaps over time it will prove even more so.

Get in touch


Written by Nick Banks, Senior Writer and AI Consultant at Definition on 30/01/2026.