Anthropic’s New Claude Caching Feature Will Save Developers a Fortune

Anthropic has introduced prompt caching in its API, which remembers context between API calls and allows developers to avoid repeating prompts.

The prompt caching feature is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for the largest Claude model, Opus, coming soon.


Prompt caching, described in this 2023 paper, lets users retain frequently used context across sessions. Because the model remembers these prompts, users can add additional background information without increasing costs. This is helpful when someone wants to send a large amount of context in a prompt and then refer back to it in subsequent conversations with the model. It also lets developers and other users better tune the model's responses.
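In practice, a developer opts into caching per content block. Below is a minimal sketch using Anthropic's Python SDK as it looked during the public beta; the beta namespace (client.beta.prompt_caching) and the cache_control block follow Anthropic's beta documentation at the time, while the knowledge-base placeholder and the example question are invented for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stand-in for a large, frequently reused document.
KNOWLEDGE_BASE = "<tens of thousands of tokens of product documentation>"

response = client.beta.prompt_caching.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "Answer questions using the documentation below."},
        {
            "type": "text",
            "text": KNOWLEDGE_BASE,
            # Marks this prefix as cacheable; repeat calls that share it
            # are billed at the cheaper cache-read rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What does the setup guide say about API keys?"}],
)
print(response.content[0].text)
```

Outside the SDK, the same behavior was gated behind the anthropic-beta: prompt-caching-2024-07-31 request header during the beta.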

Anthropic reported that early users “have seen substantial speed and cost improvements with prompt caching for a variety of use cases—from including a full knowledge base to 100-shot examples to including each turn of a conversation in their prompt.”

The company said potential use cases include reducing costs and latency for long instructions and uploaded documents in conversational agents, faster code autocompletion, feeding multi-step instructions to agentic search tools, and embedding entire documents in a prompt.

Cached prompt prices

One of the advantages of prompt caching is lower per-token prices, with Anthropic stating that using cached prompts “is significantly cheaper” than the base input token price.

For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per million tokens (MTok), while reading a cached prompt costs $0.30 per MTok. The base input price for Claude 3.5 Sonnet is $3/MTok, so by paying a little more up front, you can expect roughly a 10x saving on input tokens each time you reuse the cached prompt.

Claude 3 Haiku users pay $0.30/MTok to write to the cache and $0.03/MTok to read saved prompts.

While prompt caching is not yet available for Claude 3 Opus, Anthropic has already published its pricing: writing to the cache will cost $18.75/MTok, and reading a cached prompt will cost $1.50/MTok.
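To see how those numbers play out, here is a back-of-the-envelope calculation using the Sonnet prices above. The 100,000-token prompt and 50-call workload are hypothetical figures chosen purely to illustrate the break-even, and the math assumes every call lands within the cache's short lifetime, discussed below.

```python
# Claude 3.5 Sonnet prices quoted above, in dollars per million tokens.
BASE_INPUT = 3.00    # uncached input tokens
CACHE_WRITE = 3.75   # first (cache-writing) call
CACHE_READ = 0.30    # subsequent cache hits

def without_cache(tokens: int, calls: int) -> float:
    # Every call re-sends the full prompt at the base input rate.
    return calls * tokens * BASE_INPUT / 1_000_000

def with_cache(tokens: int, calls: int) -> float:
    # First call pays the cache-write premium; the rest pay the read rate.
    return (tokens * CACHE_WRITE + (calls - 1) * tokens * CACHE_READ) / 1_000_000

tokens, calls = 100_000, 50
print(without_cache(tokens, calls))  # -> $15.00
print(with_cache(tokens, calls))     # -> ~$1.85, roughly 8x cheaper
```

Because the write premium is only 25% over the base input rate while reads are 90% cheaper, the cache pays for itself from the second call onward in this sketch.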

However, as AI researcher Simon Willison noted on X, the Anthropic cache has a lifespan of only five minutes, which is refreshed on each use.

Of course, this is not the first time Anthropic has tried to compete with other AI platforms on price. Before the release of the Claude 3 family of models, it lowered its token prices.

The company is currently engaged in a “race to the bottom” with rivals like Google and OpenAI when it comes to offering low-cost options for third-party developers building apps on its platform.

A highly requested feature

Other platforms offer versions of prompt caching. Lamina, an LLM inference system, uses KV caching to reduce GPU costs. A cursory look at OpenAI’s developer forums or GitHub turns up plenty of questions about how to cache prompts.

Prompt caching is not the same as a large language model’s memory. OpenAI’s GPT-4o, for example, offers a memory feature in which the model remembers preferences or details, but it does not store the actual prompts and responses the way prompt caching does.
