We’ve brought back the AI test, this time seeing how GPT-4.5 matches up to GPT-4o when it comes to writing. In February, only three months after the release of the new GPT-4o, OpenAI launched another model: GPT-4.5.

OpenAI themselves said GPT-4.5 should be better at writing and at understanding what humans want from it, but there have been mixed reviews. As always, we want to know whether it’s got better at writing, where its strengths are, and where it could use a little improvement.

To keep it consistent, I used the same test cases as when we compared the old GPT-4o and the new GPT-4o, and when we tested Sonnet 3.5.

Here’s what I thought of both versions:

Name a thing (a chicken shop)

GPT-4o: 6/10
GPT-4.5: 4/10

As with our previous tests, I kicked things off with a bit of naming.

I asked both to help me come up with a name for a global fast-food company that specialises in fried chicken. The goal was for it to give me something that could match up to KFC, without doing a copy-and-paste job.

GPT-4o gave me a neat list, grouped into themes. And it showed me that it understood the assignment by throwing out a few KFC-adjacent names like ‘World Fried Chicken’ or ‘Royal Fried Chicken’. I wasn’t sure whether to be happy that it knew the brand I was getting at, or disappointed that it thought copying my homework and changing a word or two made it unique.

Overall, GPT-4o’s names were lacking in creativity. It pretty much used the same formula every time: a rhyme (Wing Thing), alliteration (Barnyard Bites) or a pun (Dixie Cluck).

Meanwhile, GPT-4.5 was a little more creative and strayed away from the puns and rhyming, although it did seem to like smashing two words together more often.

That’s about where the positives end though. It wasn’t as organised as GPT-4o, and it took the global part way too seriously, giving me names like ‘Global Cluck’ and ‘Global Crisp’, which just sound… weird?

And it massively focused on the chicken as a bird, as opposed to the food – throwing out names like CrunchBird or FryBird. Not very appealing, and if you weren’t sure about eating meat before, I think those names would push you firmly into veggie territory.

A disappointing start for GPT-4.5 but I hoped it would catch up in the other tests.

Creativity (write a scene)

GPT-4o: 7/10
GPT-4.5: 8/10

One of my favourite books is Little Women and, by extension, so are the films. I rewatched the 1994 film many times as a kid, and when the remake came out in 2019, I was first in the cinema to watch it.

So, I thought, let’s ruin it.

I asked both GPT versions to rewrite a scene from Little Women (2019) as though it were happening with the Kardashian and Jenner sisters (Kim, Khloe, Kourtney, Kendall and Kylie).

And it was like… really fun.

Both versions threw in bits of information that we already knew about the Kardashians, e.g. drinking coffee, buying designer clothes, apathetically responding without looking up from their phones. They also swapped other characters’ names without any help from me: 4o replaced Mr Brooke with Scott, 4.5 turned him into Kanye, and in both versions, Travis replaced Laurie.

Admittedly, some of these references were a little outdated, because if you’re chronically online like I am, you’d know that Kim is divorcing Kanye and Kylie isn’t with Travis anymore. Ironically, she’s dating Timothée Chalamet, who also played Laurie in the film.

Both versions did well at adapting the language to a modern, valley girl style – chucking in words like ‘babe’, ‘OMG’ and ‘chill’. But GPT-4.5 added a little more detail and was braver about playing around with the format.

Neither are perfect, but I’m sure Greta Gerwig could do something with them.

Summaries (make a long and boring thing short and clear)

GPT-4o: 3/10
GPT-4.5: 4/10

Now we get on to the more technical bits.

I asked each version to write me a short and clear exec summary of MoneySavingExpert.com’s editorial code.

Starting with the basics, both summarised what I asked them to and neither chucked in anything left field. It does feel like that’s becoming a less frequent issue at this point, but you never know.

Both also used bullet points to break things up and they did somewhat simplify the language. I mean, the original has ‘shall prevail’ in there, which feels a little medieval, so I guess anything is an improvement.

But both versions were littered with passive voice (e.g. ‘all content is researched’ instead of ‘we research all content’), which really goes against the ‘clear’ instruction.

And I think they could’ve pushed the language a little further by swapping out words like ‘ensure’ and ‘retain’. 4o missed the opportunity to swap ‘may’ and ‘may not’ out for something simpler, while 4.5 gave it a good shot with ‘cannot’, but didn’t quite reach ‘can’t’.

Overall, a little disappointing since I tend to think summaries are what I’d use AI for the most, but maybe it’ll get a little better with time. And we know that both would do better with a proper tone of voice prompt.

Tone of voice

GPT-4o: 5/10
GPT-4.5: 6/10

For the finale, I asked the two versions to rewrite the first two paragraphs of Alice’s Adventures in Wonderland as if they were a financial services firm, known for its warm, open and accessible tone.

This time I’ll start with the negatives.

Neither of them really had much of a financial angle. I guess it might be a bit tricky to spin that while writing about a rabbit in a waistcoat, but I think they could’ve dared to dream a little.

GPT-4o chucked in a clunky bit at the end: ‘So, with a sense of adventure and wonder (much like the journey towards financial confidence), she followed the rabbit across the field.’

And, I mean, now that I read it a second time, it would be quite funny if it was meant in a kind of self-aware, sarcastic way. But I’m not sure that quite carries across in the context and it’s not the tone of voice I asked for.

GPT-4.5 got closer and gave me a little call to action message at the end: ‘While unexpected adventures can add excitement to a lazy afternoon, we’re always here for you – helping you plan ahead, stay curious and manage life’s surprises with confidence’.

That felt warm, friendly and tied in smoothly with the rest of it. But neither successfully peppered those references throughout – it just seemed to be an afterthought at the end.

On the flip side, both did manage to simplify the language and make it easier for a modern audience to read. GPT-4.5 was much better at cutting out the information that we didn’t need to know and getting to the main point quicker. And if you’re a customer trying to get your bank account details in a hurry, that’s pretty useful.

All in all, I was a bit underwhelmed. But I guess it’s another sign that my job as a writer is not yet redundant.

Scores on the doors

GPT-4o: 21/40
GPT-4.5: 22/40

The results are in and it’s very close, once again.

I was pleasantly surprised to see both versions do better on the creative tests than on the more functional tests. Maybe that’s because creativity is more open to interpretation, so it’s an easier target to hit?

But it still wasn’t mind-blowing by any means. I guess I’m not too surprised by that part, because we know that it’ll be a gradual journey to improvement, with each version learning more from the last.

And I think that makes for a somewhat happy ending for AI fans and AI sceptics alike. Yes, AI is getting better and better over time. But that doesn’t mean that it can write a masterpiece by itself. It still needs a little helping hand from a human.

We help companies with their tone of voice and their prompting.

Sound useful? Drop us a line.


Written by Ashleigh Thompson, Writer at Definition