Anthropic’s New Claude Caching Feature Will Save Developers a Fortune

Anthropic has introduced prompt caching in its API, which remembers context between API calls and allows developers to avoid repeating prompts.

The prompt caching feature is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for the largest Claude model, Opus, coming soon.


Prompt caching, described in this 2023 paper, lets users retain frequently used context across sessions. Because the model remembers these prompts, users can add additional background information without increasing costs. This is helpful when someone wants to send a large amount of context in a prompt and then refer back to it in subsequent conversations with the model. It also lets developers and other users better tune the model's responses.
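In practice, a developer opts into caching per content block. Below is a minimal sketch using Anthropic's Python SDK as it looked during the public beta; the beta namespace (client.beta.prompt_caching) and the cache_control block follow Anthropic's beta documentation at the time, while the knowledge-base placeholder and the example question are invented for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stand-in for a large, frequently reused document.
KNOWLEDGE_BASE = "<tens of thousands of tokens of product documentation>"

response = client.beta.prompt_caching.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions using the documentation below."},
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # Marks this prefix as cacheable; repeat calls that share it
            # are billed at the cheaper cache-read rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What does the setup guide say about API keys?"}],
)
print(response.content[0].text)
```

Outside the SDK, the same behavior was gated behind the anthropic-beta: prompt-caching-2024-07-31 request header during the beta.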

Anthropic reported that early users “have seen substantial speed and cost improvements with prompt caching for a variety of use cases—from including a full knowledge base to 100-shot examples to including each turn of a conversation in their prompt.”

The company said potential use cases include reducing costs and latency for long instructions and uploaded documents in conversational agents, faster code autocompletion, feeding multi-step instructions to agentic search tools, and embedding entire documents in a prompt.

Cached prompt prices

One of the advantages of prompt caching is lower per-token prices, with Anthropic stating that using cached prompts “is significantly cheaper” than the base input token price.

For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per million tokens (MTok), while reading a cached prompt costs $0.30 per MTok. The base input price for Claude 3.5 Sonnet is $3/MTok, so by paying a little more up front, you can expect roughly a 10x saving on input tokens each time you reuse the cached prompt.

Claude 3 Haiku users pay $0.30/MTok to write to the cache and $0.03/MTok to read saved prompts.

While prompt caching is not yet available for Claude 3 Opus, Anthropic has already published its pricing: writing to the cache will cost $18.75/MTok, and reading a cached prompt will cost $1.50/MTok.
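To see how those numbers play out, here is a back-of-the-envelope calculation using the Sonnet prices above. The 100,000-token prompt and 50-call workload are hypothetical figures chosen purely to illustrate the break-even, and the math assumes every call lands within the cache's short lifetime, discussed below.

```python
# Claude 3.5 Sonnet prices quoted above, in dollars per million tokens.
BASE_INPUT = 3.00    # uncached input tokens
CACHE_WRITE = 3.75   # first (cache-writing) call
CACHE_READ = 0.30    # subsequent cache hits

def without_cache(tokens: int, calls: int) -> float:
    # Every call re-sends the full prompt at the base input rate.
    return calls * tokens * BASE_INPUT / 1_000_000

def with_cache(tokens: int, calls: int) -> float:
    # First call pays the cache-write premium; the rest pay the read rate.
    return (tokens * CACHE_WRITE + (calls - 1) * tokens * CACHE_READ) / 1_000_000

tokens, calls = 100_000, 50
print(without_cache(tokens, calls))  # -> $15.00
print(with_cache(tokens, calls))     # -> ~$1.85, roughly 8x cheaper
```

Because the write premium is only 25% over the base input rate while reads are 90% cheaper, the cache pays for itself from the second call onward in this sketch.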

However, as AI researcher Simon Willison noted on X, the Anthropic cache has a lifespan of only five minutes, which is refreshed on each use.

Of course, this is not the first time Anthropic has tried to compete with other AI platforms on price. Before the release of the Claude 3 family of models, it lowered its token prices.

The company is currently engaged in a “race to the bottom” with rivals like Google and OpenAI when it comes to offering low-cost options for third-party developers building apps on its platform.

A highly requested feature

Other platforms offer versions of prompt caching. Lamina, an LLM inference system, uses KV caching to reduce GPU costs. A cursory look at OpenAI’s developer forums or GitHub turns up plenty of questions about how to cache prompts.

Prompt caching is not the same as a large language model’s memory. OpenAI’s GPT-4o, for example, offers a memory feature in which the model remembers preferences or details, but it does not store the actual prompts and responses the way prompt caching does.
