Anthropic's Claude 2.1 large language model, which powers its Claude generative AI chatbot, raises the bar on how much information an LLM can ingest at once.

Anthropic has upped the ante for how much information a large language model (LLM) can consume at once, announcing on Tuesday that its just-released Claude 2.1 has a context window of 200,000 tokens. That's roughly the equivalent of 150,000 words or more than 500 printed pages of information, Anthropic said.

The latest Claude version is also more accurate than its predecessor, has a lower price, and includes beta tool use, the company said in its announcement.

The new model powers Anthropic's Claude generative AI chatbot, so both free and paying users can take advantage of most of Claude 2.1's improvements. However, the 200,000-token context window is for paying Pro users, while free users still have a 100,000-token limit — significantly higher than GPT-3.5's 16,000.

Claude 2.1's beta tool use feature will allow developers to integrate APIs and defined functions with the Claude model, similar to what's been available in OpenAI's models (an illustrative sketch appears below).

Claude's previous 100,000-token context window had kept it significantly ahead of OpenAI on that metric until last month, when OpenAI announced a preview version of GPT-4 Turbo with a 128,000-token context window. However, only ChatGPT Plus customers with $20/month subscriptions can access that model in chatbot form. (Developers can pay per usage for access to the GPT-4 API.)

While a large context window — the amount of data the model can process at a time — looks compelling if you have a large document or other information, it's not clear that LLMs handle large amounts of data as well as they handle smaller chunks.

Greg Kamradt, an AI practitioner and entrepreneur who's been tracking this issue, has run what he calls "needle in a haystack" analysis to see whether tiny pieces of information within a large document are actually found when the LLM is queried. He repeats the tests, placing a random statement at various depths in a large document that is fed into the LLM and then queried (a simplified sketch of this style of test appears below).

"At 200K tokens (nearly 470 pages), Claude 2.1 was able to recall facts at some document depths," he posted on X (formerly Twitter), noting that he had been granted early access to Claude 2.1. "Starting at ~90K tokens, performance of recall at the bottom of the document started to get increasingly worse." GPT-4 did not have perfect recall at its largest context either.

Running the tests on Claude 2.1 cost about $1,000 in API calls (Anthropic offered credits so he could run the same tests he had done on GPT-4). His conclusions: how you craft your prompts matters, don't assume information will always be retrieved, and smaller inputs will yield better results.

In fact, many developers seeking to query information from large amounts of data create applications that split that data into smaller pieces in order to improve retrieval results, even if the context window would allow more (see the chunking sketch below).

Looking at the new model's accuracy, in tests with what Anthropic called "a large set of complex, factual questions that probe known weaknesses in current models," the company said Claude 2.1 delivered a 2x decrease in false statements compared with the previous version. The current model is more likely to say it doesn't know rather than "hallucinating" or making something up, according to the Anthropic announcement. The company also cited "meaningful improvements" in comprehension and summarization.
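On the tool use beta: Anthropic's announcement doesn't spell out the exact interface, so the snippet below is only a rough sketch of the general pattern: describe a function to the model, let it decide when to call it, then run the call yourself and return the result. It borrows the tool-definition shape from Anthropic's later, publicly documented tools API; the `get_weather` tool, its schema, and the use of the `tools` parameter with the claude-2.1 model are assumptions for illustration, not a description of the Claude 2.1 beta itself.

```python
import anthropic  # official Anthropic Python SDK

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical tool definition: a name, a description, and a JSON schema
# describing the arguments the model may supply when it decides to call it.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}

response = client.messages.create(
    model="claude-2.1",      # the model discussed in the article
    max_tokens=1024,
    tools=[weather_tool],    # tool-definition shape from the later public tools API
    messages=[{"role": "user", "content": "Is it raining in Helsinki right now?"}],
)

# If the model chose to call the tool, the response contains a tool_use block
# with the function name and arguments; your code runs the function and sends
# the result back to the model in a follow-up message.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```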
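Kamradt's actual harness isn't reproduced here; the sketch below only illustrates the "needle in a haystack" idea described above: bury a known statement at different relative depths in a long filler document, ask the model about it, and record whether the answer comes back. The filler text, the needle sentence, and the `ask_model` callable are all placeholders; wire the last one to whichever LLM API you want to test.

```python
# Illustrative "needle in a haystack" recall test (not Kamradt's actual code).

FILLER = "The quick brown fox jumps over the lazy dog. " * 2000   # long padding text
NEEDLE = "The secret launch code for the demo is 7421."           # invented fact to hide
QUESTION = "What is the secret launch code for the demo?"


def build_document(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]


def run_recall_test(ask_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    """For each depth, ask the model about the needle and check whether it recalled it.

    ask_model is a stand-in for a real LLM call, e.g. a function that sends
    (document, question) to Claude or GPT-4 and returns the text of the reply.
    """
    results = {}
    for depth in depths:
        answer = ask_model(build_document(depth), QUESTION)
        results[depth] = "7421" in answer
    return results


def dummy(document: str, question: str) -> str:
    """Toy 'model' that only reads the first 1,000 characters of the document."""
    return document[:1000]


# Shows how recall can fail once the needle sits deeper in the document:
# {0.0: True, 0.25: False, 0.5: False, 0.75: False, 1.0: False}
print(run_recall_test(dummy))
```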
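The article doesn't prescribe how that splitting into smaller pieces is done, so the function below is a minimal sketch of one naive approach: fixed-size character chunks with a small overlap so a fact isn't sliced in half at a boundary. Real retrieval pipelines typically split on sentence or paragraph boundaries and pair the chunks with embedding-based search; the sizes here are arbitrary.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap.

    A minimal illustration of the "smaller pieces" approach mentioned in the
    article; production systems usually split on sentence or paragraph
    boundaries and index the chunks for embedding-based retrieval.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


# Example: a 10,000-character document becomes 6 overlapping chunks that can
# each be retrieved and passed to the model individually.
pieces = chunk_text("x" * 10_000)
print(len(pieces), [len(p) for p in pieces])
```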