POV: Model benchmarking in its current state isn’t very useful for marketing teams.
I don’t know if I care yet. What are you talking about? Model benchmarking is a fancy term for how we compare all the different AI models.
I see. What good is it to me? I’ll show you. Let’s take a look at the benchmarks from Anthropic’s latest Claude model – 3.5 Sonnet:

Source: Anthropic, 2024
Mmm. That’s just lots of names and numbers, isn’t it? Well, maybe… You’re actually looking at a bunch of numbers that tell you how good Claude 3.5 Sonnet is at different stuff.
OK, and is it useful to me? Well, as a marketer, maybe not. Does its ability to do multilingual math help you decide whether it’s the best AI to write the emails for your next campaign? How about its undergraduate knowledge? Does that make it more or less suitable than GPT-4o for brainstorming? And what of HumanEval?
Sorry, what’s that? HumanEval checks how good an AI model is at code generation and understanding. Claude 3.5 Sonnet scores 92% zero-shot. (Trust me, that’s super.)
Oh great. That sounds good. It is. But Claude 3.5 Sonnet being great at that might not be much use if you need AI that helps you share your brand guidelines with thousands of colleagues around the world.
It’s a minefield. Yes! And it gets even worse with text-to-image models.
Eek. What are they? What it says on the tin. They’re AI models that make images from text prompts. Sounds simple. But with so many out there, how do you know which one will output the aspect ratios you need? Or the most photorealistic pictures? Should you plump for OpenAI’s DALL-E 3? Or do you actually need to sign up for a Midjourney account? What about Google’s Imagen? Oh yeah, and which one’s easiest to fine-tune with your own existing image set?!
I’m really confused. Help! There are a couple of ways you can work all this out. You can talk to a team who’s experimented with all of this stuff and knows how to get the best text, images, video and audio out of the models.
That sounds good. It does, doesn’t it? If you want to talk to us about all this, we’ll happily explain why Claude 3.5 Sonnet is the best available model for copywriting. And why Imagen 3 and Flux.1 should be your go-to models for images. We’ll also fill you in on the latest and greatest speech-to-text and text-to-video models, so your social campaigns really pop.
What’s the other option? Well, you can also experiment yourself and then blend the best models for whatever job you’re doing in a single AI suite. If you want to try that, sign up for one of our new demo AI suites and access multiple AI models from a variety of vendors in a safe and secure way.
As Teresa Heitsenrether (JPMorgan’s chief data and analytics officer) says: “Ultimately, we’d like to be able to move pretty fluidly across models depending on the use cases. The plan is not to be beholden to any one model provider.”
I couldn’t agree more, Teresa. Horses for courses and all that.
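What might that look like in practice? Here is a minimal Python sketch of the "horses for courses" idea: a lookup that sends each type of job to a preferred model. The pairings follow the recommendations above; the task labels, the `pick_model` function and the fallback choice are hypothetical, made up for illustration, and no real vendor API is called.

```python
# Hypothetical routing table: type of job -> preferred model name.
# Pairings follow the recommendations in this article; the task labels
# and the fallback are illustrative assumptions, not a real product API.
MODEL_FOR_TASK = {
    "copywriting": "Claude 3.5 Sonnet",
    "image": "Imagen 3",
}

# Assumed fallback for jobs that are not in the table.
DEFAULT_MODEL = "Claude 3.5 Sonnet"


def pick_model(task: str) -> str:
    """Return the preferred model for a task, or the fallback if unknown."""
    return MODEL_FOR_TASK.get(task.strip().lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("copywriting", "image", "brainstorming"):
        print(f"{task}: {pick_model(task)}")
```

Swapping a model then means changing one entry in the table rather than rebuilding the whole workflow, which is the point of not being tied to one provider.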
Written by Luke Budka, AI Director at Definition