Caching

This page covers Anthropic prompt caching and Setu server-side caching in @ottocode/ai-sdk.

Anthropic Cache Control

By default, the SDK automatically injects cache_control: { type: "ephemeral" } on the first system block and the last message for Anthropic models. Anthropic bills cache reads at roughly 10% of the base input-token rate, so cache hits cut the cost of those tokens by about 90%.

// Default: auto caching (1 system + 1 message breakpoint)
createSetu({ auth });

// Disable completely
createSetu({ auth, cache: { anthropicCaching: false } });

// Manual: SDK won't inject cache_control — set it yourself in messages
createSetu({ auth, cache: { anthropicCaching: { strategy: "manual" } } });
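With strategy: "manual", the SDK leaves the request untouched and you place the breakpoints yourself. A minimal sketch of what a manually annotated request body looks like; the block shapes follow Anthropic's Messages API, while the model name and prompt text are placeholders:

```typescript
// Illustration only: where cache_control markers live in an Anthropic
// Messages API request body when you manage them yourself.
const body = {
  model: "claude-sonnet-4-20250514", // placeholder model id
  system: [
    {
      type: "text",
      text: "You are a helpful assistant with a very long system prompt...",
      // Breakpoint: the prefix up to and including this block is cacheable.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Summarize the attached document.",
          // Breakpoint on the last message caches the whole conversation prefix.
          cache_control: { type: "ephemeral" },
        },
      ],
    },
  ],
};
```

This reproduces by hand what the "auto" strategy injects for you (one system breakpoint, one message breakpoint).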

// Custom breakpoint count and placement
createSetu({
  auth,
  cache: {
    anthropicCaching: {
      systemBreakpoints: 2,       // cache first 2 system blocks
      systemPlacement: "first",   // "first" | "last" | "all"
      messageBreakpoints: 3,      // cache last 3 messages
      messagePlacement: "last",   // "first" | "last" | "all"
    },
  },
});
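The breakpoint/placement semantics can be sketched as a small index-picking function. This is a rough model of the behavior described above, not the SDK's actual implementation:

```typescript
// Sketch: which indices of a block list receive a cache_control breakpoint,
// given a breakpoint count and a placement. Assumes the count is clamped to
// the list length.
function breakpointIndices(
  length: number,
  count: number,
  placement: "first" | "last" | "all",
): number[] {
  if (placement === "all") return Array.from({ length }, (_, i) => i);
  const n = Math.min(count, length);
  return placement === "first"
    ? Array.from({ length: n }, (_, i) => i) // first n blocks
    : Array.from({ length: n }, (_, i) => length - n + i); // last n blocks
}
```

For example, with 5 messages, messageBreakpoints: 3, and messagePlacement: "last", the breakpoints land on indices [2, 3, 4].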

// Full custom transform
createSetu({
  auth,
  cache: {
    anthropicCaching: {
      strategy: "custom",
      transform: (body) => {
        // modify body however you want
        return body;
      },
    },
  },
});
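A transform receives the outgoing request body and returns the (possibly modified) body. As a hypothetical example, here is a transform that places a single breakpoint on the final content block of the final message; the body types follow Anthropic's Messages API, and the type names are illustrative:

```typescript
// Hypothetical custom transform: one breakpoint on the last content block of
// the last message, everything else untouched.
type Block = { type: string; text?: string; cache_control?: { type: string } };
type Body = { system?: Block[]; messages: { role: string; content: Block[] }[] };

function cacheLastMessage(body: Body): Body {
  const last = body.messages[body.messages.length - 1];
  if (last && last.content.length > 0) {
    last.content[last.content.length - 1].cache_control = { type: "ephemeral" };
  }
  return body;
}
```

You would then pass it as the transform option: anthropicCaching: { strategy: "custom", transform: cacheLastMessage }.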

Options Reference

Option               Default       Description
strategy             "auto"        "auto", "manual", "custom", or false
systemBreakpoints    1             Number of system blocks to cache
messageBreakpoints   1             Number of messages to cache
systemPlacement      "first"       Which system blocks: "first", "last", "all"
messagePlacement     "last"        Which messages: "first", "last", "all"
cacheType            "ephemeral"   The cache_control.type value

Setu Server-Side Caching

Provider-agnostic caching at the Setu proxy layer:

createSetu({
  auth,
  cache: {
    promptCacheKey: "my-session-123",
    promptCacheRetention: "in_memory", // or "24h"
  },
});

OpenAI / Google

  • OpenAI: Automatic server-side prefix caching — no configuration needed
  • Google: Requires pre-uploaded cachedContent at the application level