Thursday, June 20, 2024

Anthropic has a fast new AI model — and a clever new way to interact with chatbots

A screenshot of the Claude app showing 3.5 Sonnet selected.
GPT-4o, Gemini 1.5, and now Claude 3.5 Sonnet. | Image: Anthropic

The AI arms race continues apace: Anthropic is launching its newest model, called Claude 3.5 Sonnet, which it says can equal or better OpenAI’s GPT-4o or Google’s Gemini across a wide variety of tasks. The new model is already available to Claude users on the web and on iOS, and Anthropic is making it available to developers as well.

Claude 3.5 Sonnet will ultimately be the middle model in the lineup: Anthropic uses the name Haiku for its smallest model, Sonnet for the mainstream middle option, and Opus for its highest-end model. (The names are weird, but every AI company seems to be naming things in its own special weird way, so we’ll let it slide.) But the company says 3.5 Sonnet outperforms 3 Opus, and its benchmarks show it does so by a pretty wide margin. The new model is also apparently twice as fast as Claude 3 Opus, which might be an even bigger deal.

AI model benchmarks should always be taken with a grain of salt; there are a lot of them, it’s easy to pick and choose the ones that make you look good, and the models and products are changing so fast that nobody seems to have a lead for very long. That said, Claude 3.5 Sonnet does look impressive: it outscored GPT-4o, Gemini 1.5 Pro, and Meta’s Llama 3 400B in seven of nine overall benchmarks and four out of five vision benchmarks. Again, don’t read too much into that, but it does seem that Anthropic has built a legitimate competitor in this space.

A screenshot showing various benchmark scores for Claude 3.5 Sonnet and other AI models. Image: Anthropic
Claude 3.5’s benchmark scores do look impressive — but these things change so fast.

What does all that actually amount to? Anthropic says Claude 3.5 Sonnet will be far better at writing and translating code, handling multistep workflows, interpreting charts and graphs, and transcribing text from images. This new and improved Claude is also apparently better at understanding humor and can write in a much more human way.

Along with the new model, Anthropic is also introducing a new feature called Artifacts. With Artifacts, you’ll be able to see and interact with the results of your Claude requests: if you ask the model to design something for you, it can now show you what it looks like and let you edit it right in the app. If Claude writes you an email, you can edit the email in the Claude app instead of having to copy it to a text editor. It’s a small feature, but a clever one — these AI tools need to become more than simple chatbots, and features like Artifacts just give the app more to do.

A screenshot showing a preview of a document alongside an AI chat. Image: Anthropic
The new Artifacts feature is a hint at what a post-chatbot Claude might look like.

Artifacts actually seems to be a signal of the long-term vision for Claude. Anthropic has long said it is mostly focused on businesses (even as it hires consumer tech folks like Instagram co-founder Mike Krieger) and said in its press release announcing Claude 3.5 Sonnet that it plans to turn Claude into a tool for companies to “securely centralize their knowledge, documents, and ongoing work in one shared space.” That sounds more like Notion or Slack than ChatGPT, with Anthropic’s models at the center of the whole system.

For now, though, the model is the big news. And the pace of improvement here is wild to watch: Anthropic launched Claude 3 Opus in March, proudly saying it was as good as GPT-4 and Gemini 1.0, before OpenAI and Google released better versions of their models. Now, Anthropic has made its next move, and it surely won’t be long before its competition does so, too. Claude doesn’t get talked about as much as Gemini or ChatGPT, but it’s very much in the race.
