A feature that caches the repeated prefix of a prompt across API requests to reduce processing time and cost. When the same content is sent repeatedly, the cached portion is reused instead of reprocessing the whole prompt.

Why this concept is needed

  • Cost reduction: on a cache hit, the cost of the cached tokens drops to **10%** of the base input cost (a 90% saving)
  • Faster responses: a cached prompt does not need to be reprocessed, which improves time-to-first-token
  • Best suited to repetitive work: effective when every request sends the same content, such as the system prompt, tool definitions, or a long document context

AS-IS

sequenceDiagram
    autonumber
    participant Client as API Client
    participant API as Claude API

    Client->>API: Request 1: System(100K tokens) + User("Explain Mars")
    Note over API: Process all 100K tokens
    API-->>Client: Response

    Client->>API: Request 2: System(100K tokens) + User("Explain Jupiter")
    Note over API: Process all 100K tokens again
    API-->>Client: Response

    Note over Client,API: Every request reprocesses the same 100K System tokens<br/>→ wasted cost + latency

TO-BE

sequenceDiagram
    autonumber
    participant Client as API Client
    participant Cache as Prompt Cache
    participant API as Claude API

    Client->>API: Request 1: System(100K tokens, cache_control) + User("Explain Mars")
    API->>Cache: Store the 100K System tokens in the cache (cache write)
    Note over API: Full processing (first time only)
    API-->>Client: Response

    Client->>API: Request 2: System(100K tokens, cache_control) + User("Explain Jupiter")
    API->>Cache: Check cache → hit!
    Note over API: 100K System tokens read from cache (10% cost)<br/>Only the User message is newly processed
    API-->>Client: Response

    Note over Client,API: 100K tokens × $3/MTok = $0.30 → reduced to $0.03 (Sonnet pricing)

How it works

  1. A request that includes the cache_control parameter is sent.
  2. The system checks whether the prompt prefix is already in the cache.
  3. Cache hit: the cached version is reused → lower cost and faster response.
  4. Cache miss: the full prompt is processed, and the prefix is stored in the cache when the response begins.

The cache is prefix-based. Everything from the start of the request up to the cache_control point, in the order tools → system → messages, forms the cache unit.
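The hit/miss check can be pictured with a toy model. This sketch is not how the API is implemented. It only assumes that the cache key is a hash of everything from the start of the request (tools, then system, then messages) up to the breakpoint. The names `prefix_key` and `ToyPromptCache` are hypothetical.

```python
import hashlib
import json


def prefix_key(tools, system, messages):
    """Hash the request prefix in the order tools -> system -> messages."""
    payload = json.dumps([tools, system, messages], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyPromptCache:
    """Toy stand-in for the server-side cache: a set of prefix hashes."""

    def __init__(self):
        self._keys = set()

    def lookup_or_store(self, tools, system, messages):
        key = prefix_key(tools, system, messages)
        if key in self._keys:
            return "hit"
        self._keys.add(key)  # a miss stores the prefix for later requests
        return "miss"


cache = ToyPromptCache()
tools = [{"name": "search"}]
system = "long shared system prompt"

first = cache.lookup_or_store(tools, system, [])          # prefix not seen yet
second = cache.lookup_or_store(tools, system, [])         # identical prefix
changed = cache.lookup_or_store(tools, system + "!", [])  # prefix differs
print(first, second, changed)
```

Because the key covers the whole prefix, even a one-character change anywhere before the breakpoint produces a different key and therefore a miss.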

Cost structure

| | Base input | 5-minute cache write | 1-hour cache write | Cache hit (read) |
|---|---|---|---|---|
| Multiplier | 1x | 1.25x | 2x | 0.1x |
| Sonnet pricing | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok |
| Haiku pricing | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok |

  • First request: incurs the cache write cost (1.25x the base price with the default 5-minute TTL)
  • Subsequent requests: on a cache hit, only 10% of the base price is charged
  • Cache refresh: the TTL is refreshed at no cost each time the cached content is used
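To see when the write surcharge pays for itself, the multipliers can be put into a small calculation. This is a rough sketch and not a billing formula. It assumes the 5-minute write multiplier (1.25x), that every request after the first is a cache hit, and it ignores the uncached part of the prompt and all output tokens. The function name `input_cost_usd` is hypothetical.

```python
def input_cost_usd(prefix_tokens, requests, base_per_mtok,
                   write_mult=1.25, read_mult=0.1, use_cache=True):
    """Input cost in USD for sending the same prefix `requests` times."""
    per_request = prefix_tokens / 1_000_000 * base_per_mtok
    if not use_cache or requests == 0:
        return per_request * requests
    # The first request pays the cache write; the rest are assumed to be hits.
    return per_request * write_mult + per_request * read_mult * (requests - 1)


# 100K-token prefix, 10 requests, Sonnet base price of $3/MTok
uncached = input_cost_usd(100_000, 10, 3.0, use_cache=False)
cached = input_cost_usd(100_000, 10, 3.0)
print(f"uncached: ${uncached:.3f}, cached: ${cached:.3f}")
```

Under these assumptions ten requests cost about $3.00 without caching and about $0.645 with it. A single request is slightly more expensive with caching ($0.375 versus $0.30), so the saving starts with the second request.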

Cache lifetime (TTL)

| TTL | Cost | Use case |
|---|---|---|
| 5 minutes (default) | 1.25x write | Frequent requests (repeated within 5 minutes) |
| 1 hour | 2x write | Intermittent requests (5 minutes to 1 hour apart) |

The cache is refreshed at no cost each time it is used. A 5-minute TTL cache that is used every 4 minutes stays alive indefinitely.
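The refresh-on-use behaviour can be checked with a small simulation. This is a sketch under simple assumptions: the TTL restarts on every write and every hit, and an entry counts as expired once its full TTL has elapsed. The function name `classify_requests` is hypothetical.

```python
def classify_requests(request_times_s, ttl_s=300):
    """Label each request time as a cache 'write' or 'hit'."""
    results = []
    expires_at = None
    for t in request_times_s:
        if expires_at is not None and t < expires_at:
            results.append("hit")
        else:
            results.append("write")
        expires_at = t + ttl_s  # both a write and a hit restart the TTL
    return results


every_4_min = classify_requests([0, 240, 480, 720])
every_6_min = classify_requests([0, 360, 720])
print(every_4_min)
print(every_6_min)
```

With requests 4 minutes apart only the first one is a write. With requests 6 minutes apart every request is a write, which is the case where the 1-hour TTL may be the better fit.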

Two caching methods

1. Automatic Caching (recommended, simple)

Add cache_control at the top level of the request body. The system automatically sets a breakpoint on the last cacheable block:

{
  "model": "claude-opus-4-6",
  "max_tokens": 1024,
  "cache_control": {"type": "ephemeral"},
  "system": "You are a helpful assistant.",
  "messages": [...]
}

In a multi-turn conversation the cache point moves forward automatically:

| Request | Cache behavior |
|---|---|
| Request 1: System + User:A + Asst:B + User:C | Entire prefix written to cache |
| Request 2: … + Asst:D + User:E | System through User:C read from cache, Asst:D + User:E newly written |
| Request 3: … + Asst:F + User:G | System through User:E read from cache, Asst:F + User:G newly written |
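The moving cache point can be reproduced with a short sketch. It assumes that the part of a request read from cache is the longest leading run of blocks shared with the previous request, and that everything after it is newly written. The function name `cache_split` and the block labels are illustrative only.

```python
def cache_split(previous_blocks, current_blocks):
    """Return (read, written) for a request, given the previous request."""
    shared = 0
    for old, new in zip(previous_blocks, current_blocks):
        if old != new:
            break
        shared += 1
    return current_blocks[:shared], current_blocks[shared:]


request_1 = ["System", "User:A", "Asst:B", "User:C"]
request_2 = request_1 + ["Asst:D", "User:E"]
request_3 = request_2 + ["Asst:F", "User:G"]

read_1, written_1 = cache_split([], request_1)
read_2, written_2 = cache_split(request_1, request_2)
read_3, written_3 = cache_split(request_2, request_3)
print(written_1, written_2, written_3)
```

Each turn reads everything the previous turn sent and writes only the two new blocks, so the amount written per request stays small while the conversation grows.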

2. Explicit Cache Breakpoints (fine-grained control)

Place cache_control directly on individual content blocks. Up to 4 breakpoints can be used:

{
  "tools": [
    {"name": "search", "...", "cache_control": {"type": "ephemeral"}}
  ],
  "system": [
    {"type": "text", "text": "Instructions...", "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "Document...", "cache_control": {"type": "ephemeral"}}
  ],
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "Question", "cache_control": {"type": "ephemeral"}}
    ]}
  ]
}

Useful when sections that change at different rates need to be cached independently:

  • Breakpoint 1: Tool definitions (rarely change)
  • Breakpoint 2: System instructions (rarely change)
  • Breakpoint 3: RAG documents (change daily)
  • Breakpoint 4: Conversation history (changes every turn)
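Since the limit is four breakpoints, it can help to count them before sending a request. This is a client-side sketch only. It assumes the request is a plain dict shaped like the JSON body in this section, and the helper name `count_breakpoints` is hypothetical, not part of any SDK.

```python
MAX_BREAKPOINTS = 4  # limit stated in this section


def count_breakpoints(request):
    """Count content blocks that carry a cache_control marker."""
    blocks = list(request.get("tools", []))
    system = request.get("system", [])
    if isinstance(system, list):  # a plain string has no per-block markers
        blocks.extend(system)
    for message in request.get("messages", []):
        content = message.get("content", [])
        if isinstance(content, list):
            blocks.extend(content)
    return sum(1 for block in blocks if "cache_control" in block)


marker = {"type": "ephemeral"}
request = {
    "tools": [{"name": "search", "cache_control": marker}],
    "system": [
        {"type": "text", "text": "Instructions...", "cache_control": marker},
        {"type": "text", "text": "Document...", "cache_control": marker},
    ],
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Question", "cache_control": marker},
        ]},
    ],
}

used = count_breakpoints(request)
print(used, used <= MAX_BREAKPOINTS)
```

The example request uses all four breakpoints, so marking one more block would exceed the limit.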

Cache invalidation conditions

The cache follows the hierarchy tools → system → messages. A change at a higher level also invalidates the caches at the levels below it:

| Change | Tools cache | System cache | Messages cache |
|---|---|---|---|
| Tool definition change | Invalidated | Invalidated | Invalidated |
| Web search toggle | Kept | Invalidated | Invalidated |
| Tool choice change | Kept | Kept | Invalidated |
| Image added/removed | Kept | Kept | Invalidated |
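The invalidation table reduces to one rule: find the highest level a change touches, then invalidate that level and everything after it. The sketch below encodes only the four changes listed in this section. The change labels and the function name `invalidated_levels` are hypothetical.

```python
LEVELS = ["tools", "system", "messages"]

# Highest level affected by each change, taken from the table in this section.
FIRST_INVALIDATED = {
    "tool_definitions": "tools",
    "web_search_toggle": "system",
    "tool_choice": "messages",
    "images": "messages",
}


def invalidated_levels(change):
    """Return the cache levels invalidated by a change, in hierarchy order."""
    start = LEVELS.index(FIRST_INVALIDATED[change])
    return LEVELS[start:]


for change in FIRST_INVALIDATED:
    print(change, "->", invalidated_levels(change))
```

In practice this suggests putting the content that changes least often at the top of the hierarchy.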

Minimum cacheable token count

The minimum number of tokens that can be cached differs by model. A prompt shorter than this is not cached even if cache_control is set:

| Model | Minimum tokens |
|---|---|
| Claude Opus 4.6 / 4.5 | 4,096 |
| Claude Sonnet 4.6 / 4.5 / 4.1 / 4 | 1,024 |
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5 / 3 | 2,048 |
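A pre-flight check against these minimums can flag prefixes that are too short to be cached. This sketch copies a few rows from the table. The dictionary keys are illustrative labels rather than API model IDs, and the token count is assumed to come from whatever token counter the caller already uses.

```python
# A few rows from the table in this section; keys are illustrative labels.
MIN_CACHEABLE_TOKENS = {
    "Claude Opus 4.6": 4096,
    "Claude Sonnet 4.6": 1024,
    "Claude Haiku 4.5": 4096,
    "Claude Haiku 3.5": 2048,
}


def is_cacheable(model, prefix_tokens):
    """True if the prefix meets the model's minimum cacheable length."""
    return prefix_tokens >= MIN_CACHEABLE_TOKENS[model]


print(is_cacheable("Claude Sonnet 4.6", 1500))
print(is_cacheable("Claude Opus 4.6", 1500))
```

The same 1,500-token prefix meets the Sonnet minimum but falls short of the Opus one, so it is worth checking whenever the model is switched.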

Prompt Caching in Claude Code

Claude Code has prompt caching enabled by default on all deployment options (Anthropic direct, Bedrock, Vertex AI, Foundry).

  • No separate configuration required (enabled by default)
  • To disable it when needed: export DISABLE_PROMPT_CACHING=1
  • May not be available in some regions

References