Google’s latest lightweight model (built specifically to excel at complex long-horizon and agentic tasks) scored 5.13 on our AI writing leaderboard, putting it dead last out of 14 models tested, and behind its own predecessor. Typically, Gemini models have impressed our writers, so this poor showing was a surprise. Creative Director Alex Goldstein put it through eight tasks across four capability areas. Here’s what she found.
See the full AI writing leaderboard
Customer messages and persuasive copy
Score: 5/10
A mixed bag, and mostly the bad kind of mixed.
The ‘Lather & Co.’ email, breaking the news that the relaxation lounge was moving to a paid add-on, had a logical structure and a consistent tone going for it. Not much else.
Alex’s verdict: “Long-winded, pompous and it’s too much work to find the most important bits.”
The ‘persuasive’ Hearthstone pizza oven product description fell apart almost immediately.
Alex said: “This screams AI-written. The headline case. The overused rule of three. The bold-yet-bland statements (“changes the game” / “the last piece of cookware you’ll ever need to buy”). The sudden tone shift from “stop settling for ‘good enough’” to “versatile workhorse that transitions seamlessly”. I’m pretty sure “restaurant-grade intensity” is utterly meaningless, too.”
Headlines and titles
Score: 6.5/10
The model’s best category, though that’s not saying much.
The tabloid headline for a 94-year-old Doncaster great-grandmother getting a full-back tattoo landed well:
TAT-TOO GOOD TO BE TRUE! Great-gran, 94, is world’s oldest to get full-back ink of her hero hubby
Alex’s take: “It could be more creative, but it wouldn’t look out of place in The Sun.” Not a ringing endorsement, but a pass.
The YouTube title for a video about how general anaesthesia works was uninteresting:
Why Doctors Don’t Know How General Anaesthesia Actually Works
Alex said: “Eh. Headline case can work well for YouTube and it makes a stab at being interesting, but it doesn’t exactly fill me with intrigue or the desire to click.”
When the brief is to write a title about a medical mystery that earns clicks, “Eh” doesn’t cut it.
Naming things
Score: 5/10
Product names first. The brief was a name for a prescription-strength skin serum, with some restrictions to test the model further: clinical but human, accessible, no letters R, X or Z.
What arrived sounded like software, with no sign that the model had considered who the product’s audience would be.
Alex’s view: “These sound like software, not skincare. Most focus on sounding science-y, but not on results or a no-nonsense, mostly female demographic. It’s no wonder the current market leaders have names like skin + me or Dermatica.” She kept one: “Canvas is the closest to usable — but I’d send all these back to the drawing board. Would you want to put Monolith on your face every night?”
The whistleblower law firm names were safer ground. Aegis Legal, Sanctuary Law, Beacon Counsel, Canopy Law: all appropriate in feel, none exactly surprising. Alex’s assessment: “While all a bit ‘safe’, this is fine for a first round. I definitely want more thinking, more tangents and more creativity, but it at least gets the obvious on the table.”
Tone adherence and clarity
Score: 4/10
The weakest category.
The tone of voice rewrite task, adapting two texts to match Definition’s own tone of voice guide, showed some instinct for everyday language. But it overcorrected badly in places.
Alex said: “It’s got the everydayness, but some of it is really cringe (“muck in”, “give your brand the welly it needs”). It gets a mark for trying not to be boring by coming out of the gate with an opinion – but then loses them for waffling on a bit (especially in the second one). And I love alliteration as much as the next writer, but it can feel very self-consciously ‘copywritten’, so I’d scale that back.”
The quantum encryption rewrite, pitched at a Year 8 level, earned the single lowest score of all eight tasks: 2/10.
Alex’s verdict: “This fails at the first hurdle: the Flesch Kincaid reading grade is 10.1 – or age 15-16. And it seems to have taken the brief as “entertain an 8-year-old with sci-fi woo” rather than “explain this so even an 8-year-old could get it”. For example, the imagery is unhelpfully abstract (I have trouble picturing how I’d send a message with twin light particles exactly) and “mysteriously connected” is just embarrassing. Also, are cyberattacks especially “futuristic” if they’re happening now?”
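For anyone who wants to run a similar readability check, here is a minimal Python sketch of the standard Flesch-Kincaid grade formula. It is not the tool Alex used (the review doesn’t say which one that was), and the syllable counter is a deliberately rough vowel-group heuristic, so expect results to drift from dedicated readability checkers. The function names are ours, purely for illustration.

```python
import re


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def rough_syllables(word):
    """Very rough syllable estimate: count runs of vowels (assumption, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def estimate_grade(text):
    """Estimate the grade level of a non-empty piece of text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(rough_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)


if __name__ == "__main__":
    sample = "Two tiny bits of light can be linked. Change one and the other changes too."
    print(round(estimate_grade(sample), 1))
```

Longer sentences and longer words both push the grade up, so shortening either is the quickest way to bring a draft closer to the target.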
Overall score: 5.13/10
Gemini 3.5 Flash finishes last in our leaderboard. It can produce a structurally sound email and a passable tabloid headline, but it defaults to AI copywriting tics too readily, doesn’t think about its audience, and collapses under pressure when the brief demands real clarity or creativity. For anything customer-facing or creative, there are 13 better options on this list.
| Use it for | Avoid it for |
| --- | --- |
| Quick, low-stakes headline drafts where functional is fine. | Product copy, naming, tone adaptation — anything where the audience actually needs to feel something. |
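A quick sanity check on the headline number: the 5.13 overall score is consistent with a plain, unweighted average of the four category scores above. That weighting is our assumption here, as the scoring method isn’t spelled out in this review. The sketch below uses Python’s `decimal` module because the built-in `round()` rounds exact halves to the nearest even digit and would turn 5.125 into 5.12.

```python
from decimal import Decimal, ROUND_HALF_UP

# Category scores as reported in this review.
category_scores = {
    "Customer messages and persuasive copy": Decimal("5"),
    "Headlines and titles": Decimal("6.5"),
    "Naming things": Decimal("5"),
    "Tone adherence and clarity": Decimal("4"),
}


def overall_score(scores):
    """Unweighted mean of the category scores, rounded half-up to two decimals (assumed method)."""
    mean = sum(scores.values(), Decimal("0")) / len(scores)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(overall_score(category_scores))  # prints 5.13
```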
Test the best writing models in Definition AI