์‹œ๋ฆฌ์ฆˆ: LLM Tool Calling ๋‚ด๋ถ€ ์›๋ฆฌ๋ถ€ํ„ฐ ์—์ด์ „ํŠธ ์ง์ ‘ ๊ตฌํ˜„๊นŒ์ง€

์ด ์‹œ๋ฆฌ์ฆˆ๋Š” ์‚ฌ์šฉ์ž์˜ ์ž์—ฐ์–ด ํ•œ ์ค„์ด tool ์‹คํ–‰์œผ๋กœ ๋ฐ”๋€Œ๋Š” ๋‚ด๋ถ€ ์ฒ˜๋ฆฌ ๊ณผ์ •์„ ๋‹จ๊ณ„๋ณ„๋กœ ํ•ด๋ถ€ํ•˜๊ณ , ์ตœ์ข…์ ์œผ๋กœ ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ + ์ž์ฒด middleware๋กœ ๋‚˜๋งŒ์˜ ์—์ด์ „ํŠธ๋ฅผ ์ง์ ‘ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ๊นŒ์ง€ ๋„๋‹ฌํ•˜๋Š” ๊ณผ์ •์ด๋‹ค.

ํŽธ๋‚ด์šฉํ•ต์‹ฌ
1ํŽธ์ „์ฒด ์กฐ๊ฐ๋„์ž์—ฐ์–ด โ†’ tool ์‹คํ–‰๊นŒ์ง€ 5๊ฐœ ๋ ˆ์ด์–ด์˜ ์กด์žฌ๋ฅผ ํ™•์ธ
2ํŽธChat TemplateJSON์ด ๋ชจ๋ธ์— ์ง์ ‘ ๋“ค์–ด๊ฐ€์ง€ ์•Š๋Š”๋‹ค
3ํŽธTokenization๋ชจ๋ธ์€ ํ…์ŠคํŠธ๋ฅผ ์ฝ์ง€ ๋ชปํ•œ๋‹ค - ํ† ํฐ ID์™€ control token
4ํŽธ๋ชจ๋ธ ์ถ”๋ก โ€tool์„ ์“ธ๊นŒ ๋ง๊นŒโ€ ํŒ๋‹จ๊ณผ constrained decoding
5ํŽธTool ์‹คํ–‰tool_use๋ฅผ ๋ฐ›์€ ํด๋ผ์ด์–ธํŠธ์˜ ์‹คํ–‰ ๋ฃจํ”„
6ํŽธ (๋ณธ๋ฌธ)Native vs Non-native๊ฐ™์€ ๊ธฐ๋Šฅ, ๋‹ค๋ฅธ ๊ตฌ์กฐ โ†’ Middleware
7ํŽธMiddleware ๋งŒ๋“ค๊ธฐํ”„๋กฌํ”„ํŠธ ์กฐ๋ฆฝ + ์ถœ๋ ฅ ํŒŒ์‹ฑ + ์‹คํ–‰ ๋ฃจํ”„ ์ง์ ‘ ๊ตฌํ˜„
8ํŽธ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ ๋กœ์ปฌ ๊ตฌ์ถ•Ollama/vLLM์œผ๋กœ ๋กœ์ปฌ LLM ์„œ๋น™
9ํŽธ๋‚˜๋งŒ์˜ ์—์ด์ „ํŠธ๋ชจ๋ธ + Middleware = ์—์ด์ „ํŠธ ์™„์„ฑ

  • Native vs Non-native๋Š” 2~5ํŽธ์—์„œ ๋‹ค๋ฃฌ tool calling ๋ ˆ์ด์–ด(Chat Template, Tokenization, ๋ชจ๋ธ ์ถ”๋ก , Constrained Decoding, ์ถœ๋ ฅ ํŒŒ์‹ฑ, Tool ์‹คํ–‰ ๋ฃจํ”„)๊ฐ€ ๋ชจ๋ธ/API์— ๋‚ด์žฅ๋˜์–ด ์žˆ๋Š”์ง€ ์—ฌ๋ถ€์— ๋”ฐ๋ฅธ ๊ตฌ์กฐ์  ์ฐจ์ด
  • Claude, GPT ๊ฐ™์€ ์ƒ์šฉ ๋ชจ๋ธ์€ ์ด ๋ ˆ์ด์–ด๊ฐ€ API ์„œ๋ฒ„์— ๋ชจ๋‘ ๋‚ด์žฅ๋˜์–ด ์žˆ์ง€๋งŒ, ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ(Llama, Gemma ๋“ฑ)์€ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋งŒ ์กด์žฌํ•˜๊ณ  ๋‚˜๋จธ์ง€ ๋ ˆ์ด์–ด๊ฐ€ ์—†๋Š” ๊ตฌ์กฐ
  • ์ด ๊ฒฉ์ฐจ๋ฅผ ์™ธ๋ถ€์—์„œ ๋ฉ”์šฐ๋Š” ๊ฒƒ์ด Middleware์ด๋ฉฐ, harness ์„ค๊ณ„๊ฐ€ ๋ชจ๋ธ ์„ฑ๋Šฅ์— ๊ฒฐ์ •์  ์˜ํ–ฅ์„ ๋ฏธ์นœ๋‹ค๋Š” ๊ฒƒ์ด ์ตœ๊ทผ ์—ฐ๊ตฌ์˜ ํ•ต์‹ฌ ๋ฐœ๊ฒฌ

ํ•ด๋‹น ๊ฐœ๋…์ด ํ•„์š”ํ•œ ์ด์œ 

  • 5ํŽธ๊นŒ์ง€ tool calling ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋ชจ๋“  ๋ ˆ์ด์–ด๋ฅผ ํ™•์ธํ–ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋ ˆ์ด์–ด๋“ค์ด ํ•ญ์ƒ ์กด์žฌํ•˜๋Š” ๊ฒƒ์€ ์•„๋‹ˆ๋‹ค
  • ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ์„ ๋กœ์ปฌ์—์„œ ์‹คํ–‰ํ•  ๋•Œ โ€œtool calling์ด ์•ˆ ๋œ๋‹คโ€๋Š” ๊ฒƒ์ด ๊ตฌ์ฒด์ ์œผ๋กœ ์–ด๋–ค ๋ ˆ์ด์–ด๊ฐ€ ์—†๋‹ค๋Š” ๋œป์ธ์ง€ ์ดํ•ดํ•ด์•ผ, 7ํŽธ์—์„œ middleware๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค
  • ๋˜ํ•œ, harness(tool ํฌ๋งท) ์„ค๊ณ„๋งŒ ๋ฐ”๊ฟ”๋„ ๋ชจ๋ธ ์„ฑ๋Šฅ์ด 10๋ฐฐ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค๋Š” ์‚ฌ์‹ค์€ ๊ฐœ๋ฐœ์ž์—๊ฒŒ ์‹ค์งˆ์  ๊ฐ€์น˜๊ฐ€ ์žˆ๋‹ค

AS-IS

sequenceDiagram
    autonumber
    box ๊ฐœ๋ฐœ์ž ์˜์—ญ
        participant Dev as ๊ฐœ๋ฐœ์ž
    end
    box AI ์„œ๋น„์Šค ์˜์—ญ
        participant API as API ์„œ๋ฒ„
        participant CT as Chat Template
        participant TK as Tokenizer
        participant LLM as ๋ชจ๋ธ
        participant CD as Constrained Decoding
        participant P as ์ถœ๋ ฅ ํŒŒ์„œ
    end

    Dev->>API: tools + ์งˆ๋ฌธ
    API->>CT: JSON โ†’ ํ…์ŠคํŠธ
    CT->>TK: ํ…์ŠคํŠธ โ†’ ํ† ํฐ
    TK->>LLM: ์ถ”๋ก 
    LLM->>CD: valid JSON ๊ฐ•์ œ
    CD->>P: ํŒŒ์‹ฑ
    P->>API: structured JSON
    API-->>Dev: tool_use ์‘๋‹ต
    Note over API,P: ๋ชจ๋“  ๋ ˆ์ด์–ด๊ฐ€ ๋‚ด์žฅ๋˜์–ด ์žˆ๋‹ค

TO-BE

sequenceDiagram
    autonumber
    box ๊ฐœ๋ฐœ์ž ์˜์—ญ
        participant Dev as ๊ฐœ๋ฐœ์ž
    end
    box ???
        participant Q as ๋ˆ„๊ฐ€ ์ฒ˜๋ฆฌํ•˜์ง€?
    end
    box ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ
        participant LLM as ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋งŒ ์กด์žฌ
    end

    Dev->>Q: tools + ์งˆ๋ฌธ
    Note over Q: Chat Template?<br/>Tokenizer?<br/>Constrained Decoding?<br/>์ถœ๋ ฅ ํŒŒ์‹ฑ?
    Q->>LLM: ???
    LLM-->>Q: ???
    Q-->>Dev: ???

Native Tool Calling ๋ชจ๋ธ์˜ ๊ตฌ์กฐ

Claude, GPT ๊ฐ™์€ ์ƒ์šฉ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ๋•Œ, ๊ฐœ๋ฐœ์ž๋Š” API์— JSON์„ ๋ณด๋‚ด๋ฉด ๋œ๋‹ค. ๋‚˜๋จธ์ง€๋Š” ๋ชจ๋‘ API ์„œ๋ฒ„๊ฐ€ ์ฒ˜๋ฆฌํ•œ๋‹ค:

flowchart LR
    subgraph API์„œ๋ฒ„["API ์„œ๋ฒ„ (์ž…๋ ฅ ์ฒ˜๋ฆฌ)"]
        CT[Chat Template]
        TK[Tokenizer]
    end
    subgraph ๋ชจ๋ธ
        LLM["๋ชจ๋ธ ์ถ”๋ก <br/>(tool call fine-tuning ์™„๋ฃŒ)"]
    end
    subgraph API์„œ๋ฒ„2["API ์„œ๋ฒ„ (์ถœ๋ ฅ ์ฒ˜๋ฆฌ)"]
        CD[Constrained Decoding]
        P[์ถœ๋ ฅ ํŒŒ์„œ]
    end

    CT --> TK --> LLM --> CD --> P

๊ฐ ๋ ˆ์ด์–ด์˜ ์—ญํ•  (2~5ํŽธ ์š”์•ฝ):

  • Chat Template: tool JSON โ†’ ์‹œ์Šคํ…œ ํ”„๋กฌํ”„ํŠธ ํ…์ŠคํŠธ ๋ณ€ํ™˜
  • Tokenizer: ํ…์ŠคํŠธ โ†’ ํ† ํฐ ID + control token ์ฒ˜๋ฆฌ
  • ๋ชจ๋ธ ์ถ”๋ก : fine-tuning์œผ๋กœ ํ•™์Šต๋œ tool call ํŒจํ„ด ๊ธฐ๋ฐ˜ ํŒ๋‹จ
  • Constrained Decoding: JSON schema์— ๋งž๋Š” ํ† ํฐ๋งŒ ํ—ˆ์šฉ
  • ์ถœ๋ ฅ ํŒŒ์„œ: ์ƒ์„ฑ๋œ ํ† ํฐ โ†’ structured JSON ๋ณ€ํ™˜

Non-native ๋ชจ๋ธ์— ์—†๋Š” ๊ฒƒ

์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ(Llama, Gemma, Mistral ๋กœ์ปฌ ๋“ฑ)์„ HuggingFace์—์„œ ๋‹ค์šด๋กœ๋“œํ•˜๋ฉด, ๋ฐ›๋Š” ๊ฒƒ์€ ๋ชจ๋ธ ๊ฐ€์ค‘์น˜ ํŒŒ์ผ์ด๋‹ค. ์œ„ ๋ ˆ์ด์–ด๋“ค์€ ํฌํ•จ๋˜์–ด ์žˆ์ง€ ์•Š๋‹ค:

๋ ˆ์ด์–ดNative (Claude, GPT)Non-native (Llama ๋กœ์ปฌ)
Chat TemplateAPI ์„œ๋ฒ„์— ๋‚ด์žฅ์—†์Œ. ๊ฐœ๋ฐœ์ž๊ฐ€ ์ง์ ‘ ํ”„๋กฌํ”„ํŠธ ์กฐ๋ฆฝ
Control Tokenvocabulary์— ํฌํ•จ๋ชจ๋ธ์— ๋”ฐ๋ผ ๋‹ค๋ฆ„. ์—†์„ ์ˆ˜ ์žˆ์Œ
Tool call fine-tuning์™„๋ฃŒ๋จ์—†๊ฑฐ๋‚˜ ๋ถˆ์™„์ „. ๋ฒ”์šฉ ๋ชจ๋ธ์€ tool call ํŒจํ„ด ๋ฏธํ•™์Šต
Constrained DecodingAPI ์„œ๋ฒ„์— ๋‚ด์žฅ์—†์Œ. ๋ชจ๋ธ์ด ์•„๋ฌด ํ† ํฐ์ด๋‚˜ ์ƒ์„ฑ ๊ฐ€๋Šฅ
์ถœ๋ ฅ ํŒŒ์„œAPI ์„œ๋ฒ„์— ๋‚ด์žฅ์—†์Œ. ๋ชจ๋ธ ์ถœ๋ ฅ์—์„œ tool call์„ ์ง์ ‘ ์ถ”์ถœํ•ด์•ผ ํ•จ
Tool ์‹คํ–‰ ๋ฃจํ”„SDK์— ๋‚ด์žฅ (toolRunner ๋“ฑ)์—†์Œ. ์ง์ ‘ ๊ตฌํ˜„

ํ•ต์‹ฌ: Non-native ๋ชจ๋ธ์—์„œ tool calling์„ ํ•˜๋ ค๋ฉด, ์ด ๋ชจ๋“  ๋ ˆ์ด์–ด๋ฅผ ์™ธ๋ถ€์—์„œ ๊ตฌํ˜„ํ•ด์•ผ ํ•œ๋‹ค. ์ด๊ฒƒ์„ ๋‹ด๋‹นํ•˜๋Š” ๊ฒƒ์ด Middleware๋‹ค.

Middleware - ์—†๋Š” ๋ ˆ์ด์–ด๋ฅผ ์™ธ๋ถ€์—์„œ ๋ฉ”์šด๋‹ค

sequenceDiagram
    autonumber
    box ๊ฐœ๋ฐœ์ž ์˜์—ญ
        participant Dev as ๊ฐœ๋ฐœ์ž
    end
    box Middleware (์™ธ๋ถ€ ๊ตฌํ˜„)
        participant CT as ํ”„๋กฌํ”„ํŠธ ์กฐ๋ฆฝ
        participant P as ์ถœ๋ ฅ ํŒŒ์‹ฑ
    end
    box ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ
        participant LLM as ๋ชจ๋ธ ๊ฐ€์ค‘์น˜
    end

    Dev->>CT: tools ์ •์˜ + "์„œ์šธ ๋‚ ์”จ ์•Œ๋ ค์ค˜"
    CT->>LLM: tool ์ •์˜๋ฅผ ํฌํ•จํ•œ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ
    LLM-->>P: ํ…์ŠคํŠธ ์ถœ๋ ฅ (XML ๋“ฑ ํŒจํ„ด ํฌํ•จ)
    P->>P: tool call ํŒจํ„ด ๊ฐ์ง€ ๋ฐ JSON ์ถ”์ถœ
    P-->>Dev: tool_use: get_weather("Seoul")
    Note over Dev: tool ์‹คํ–‰ ํ›„ ๊ฒฐ๊ณผ ๋ฐ˜ํ™˜
    Dev->>CT: tool_result: "15ยฐC, ๋ง‘์Œ"
    CT->>LLM: ๊ฒฐ๊ณผ ํฌํ•จ ํ”„๋กฌํ”„ํŠธ
    LLM-->>P: ์ตœ์ข… ํ…์ŠคํŠธ ์‘๋‹ต
    P-->>Dev: "์„œ์šธ์˜ ํ˜„์žฌ ๋‚ ์”จ๋Š” 15ยฐC์ž…๋‹ˆ๋‹ค"

Middleware๊ฐ€ ๋‹ด๋‹นํ•˜๋Š” 3๊ฐ€์ง€ ์—ญํ• :

  1. ํ”„๋กฌํ”„ํŠธ ์กฐ๋ฆฝ: tool ์ •์˜๋ฅผ ๋ชจ๋ธ์ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ํ…์ŠคํŠธ๋กœ ๋ณ€ํ™˜ (Chat Template ๋Œ€์ฒด)
  2. ์ถœ๋ ฅ ํŒŒ์‹ฑ: ๋ชจ๋ธ์ด ์ƒ์„ฑํ•œ ํ…์ŠคํŠธ์—์„œ tool call ํŒจํ„ด์„ ๊ฐ์ง€ํ•˜๊ณ  JSON์œผ๋กœ ์ถ”์ถœ
  3. ์‹คํ–‰ ๋ฃจํ”„: tool ์‹คํ–‰ โ†’ ๊ฒฐ๊ณผ๋ฅผ ๋ชจ๋ธ์— ํ”ผ๋“œ๋ฐฑ โ†’ ์ถ”๊ฐ€ tool call ํ™•์ธ โ†’ ๋ฐ˜๋ณต

์‹ค์ œ ๊ตฌํ˜„์ฒด: ai-sdk-tool-call-middleware

Vercel AI SDK์˜ middleware ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜๋ฉด, native tool calling์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š” ๋ชจ๋ธ์—๋„ tool calling์„ ๋ถ™์ผ ์ˆ˜ ์žˆ๋‹ค:

import { wrapLanguageModel } from "ai";
import { toolCallMiddleware } from "@ai-sdk-tool/parser";
 
// ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ์„ middleware๋กœ ๊ฐ์‹ธ๊ธฐ
const wrappedModel = wrapLanguageModel({
  model: ollamaModel,  // tool calling ๋ฏธ์ง€์› ๋ชจ๋ธ
  middleware: toolCallMiddleware({
    protocol: "xml",  // XML ํ˜•์‹์œผ๋กœ tool call ์ธ์ฝ”๋”ฉ
  }),
});
 
// ์ด์ œ native tool calling์ฒ˜๋Ÿผ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
const result = await generateText({
  model: wrappedModel,
  tools: { /* tool ์ •์˜ */ },
  prompt: "์„œ์šธ ๋‚ ์”จ ์•Œ๋ ค์ค˜",
});

middleware๊ฐ€ ๋‚ด๋ถ€์ ์œผ๋กœ ํ•˜๋Š” ์ผ:

  1. tools JSON์„ XML ํ˜•์‹์˜ ์‹œ์Šคํ…œ ํ”„๋กฌํ”„ํŠธ๋กœ ๋ณ€ํ™˜ (Chat Template ๋Œ€์ฒด)
  2. ๋ชจ๋ธ ์ถœ๋ ฅ์—์„œ <tool_call> ํŒจํ„ด์„ ๊ฐ์ง€ํ•˜์—ฌ ํŒŒ์‹ฑ (์ถœ๋ ฅ ํŒŒ์„œ ๋Œ€์ฒด)
  3. ํŒŒ์‹ฑ๋œ ๊ฒฐ๊ณผ๋ฅผ AI SDK์˜ tool_use ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜

๋ณด์•ˆ: control token์ด ์—†๋‹ค๋Š” ๊ฒƒ์˜ ์˜๋ฏธ

3ํŽธ์—์„œ Native ๋ชจ๋ธ์€ control token์œผ๋กœ tool ์ •์˜ ์˜์—ญ์„ ๋ณดํ˜ธํ•œ๋‹ค๊ณ  ์„ค๋ช…ํ–ˆ๋‹ค. Middleware ๋ฐฉ์‹์—์„œ๋Š” control token์ด ์—†๋‹ค. tool ์ •์˜๊ฐ€ ์ผ๋ฐ˜ ํ…์ŠคํŠธ๋กœ ํ”„๋กฌํ”„ํŠธ์— ์‚ฝ์ž…๋˜๋ฏ€๋กœ, ์ด๋ก ์ ์œผ๋กœ ์‚ฌ์šฉ์ž๊ฐ€ ํ”„๋กฌํ”„ํŠธ์— <tool_call> ๊ฐ™์€ ํŒจํ„ด์„ ์‚ฝ์ž…ํ•˜์—ฌ injectionํ•  ์œ„ํ—˜์ด Native๋ณด๋‹ค ๋†’๋‹ค.

๋‹ค๋งŒ ์™„์ „ํžˆ ๋ฌด๋ฐฉ๋น„๋Š” ์•„๋‹ˆ๋‹ค. middleware๋Š” ์ถœ๋ ฅ ์ธก์—์„œ ์ •ํ•ด์ง„ XML ํŒจํ„ด๋งŒ ๊ฐ์ง€ํ•˜๊ณ , ์ž…๋ ฅ์˜ ์‚ฌ์šฉ์ž ๋ฉ”์‹œ์ง€ ์˜์—ญ๊ณผ ์‹œ์Šคํ…œ ํ”„๋กฌํ”„ํŠธ ์˜์—ญ์„ ๋ถ„๋ฆฌํ•˜์—ฌ ์ฒ˜๋ฆฌํ•œ๋‹ค. ํ•˜์ง€๋งŒ Native ๋ชจ๋ธ์˜ control token ์ˆ˜์ค€์˜ ๋ณด์•ˆ์€ ์ œ๊ณตํ•˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ๋‹ค.

๋ณด์•ˆ์ด ํŠนํžˆ ์ค‘์š”ํ•œ B2B ํ™˜๊ฒฝ์ด๋ผ๋ฉด, ์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ์„ ์˜จํ”„๋ ˆ๋ฏธ์Šค๋กœ ๋ฐฐํฌํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ ์™ธ๋ถ€ ๋ฒค๋” ์„œ๋ฒ„๋กœ ์ „์†ก๋˜์ง€ ์•Š์œผ๋ฏ€๋กœ, injection ์œ„ํ—˜๊ณผ ๋ณ„๊ฐœ๋กœ ๋ฐ์ดํ„ฐ ์œ ์ถœ ์ž์ฒด๋ฅผ ์›์ฒœ ์ฐจ๋‹จํ•  ์ˆ˜ ์žˆ๋‹ค.

Vercel AI SDK์˜ middleware๋„ ์ด ๊ตฌ์„ฑ์—์„œ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค. middleware๋Š” npm ํŒจํ‚ค์ง€(ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ)์ผ ๋ฟ์ด๋ฏ€๋กœ, ์–ด๋””์„œ ์‹คํ–‰๋˜๋А๋ƒ๋Š” ๊ฐœ๋ฐœ์ž๊ฐ€ ๊ฒฐ์ •ํ•œ๋‹ค. ๊ณ ๊ฐ์‚ฌ ์„œ๋ฒ„์—์„œ ์‹คํ–‰ํ•˜๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ Vercel ์„œ๋ฒ„๋กœ ์ „์†ก๋˜์ง€ ์•Š๋Š”๋‹ค.

๊ฐ ๊ตฌ์„ฑ์š”์†Œ์˜ ์‹คํ–‰ ์œ„์น˜

ํ˜ผ๋™ํ•˜๊ธฐ ์‰ฌ์šด ์ : โ€œVercel AI SDKโ€๋ผ๋Š” ์ด๋ฆ„ ๋•Œ๋ฌธ์— Vercel ํด๋ผ์šฐ๋“œ๋ฅผ ๊ฒฝ์œ ํ•œ๋‹ค๊ณ  ์˜คํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ SDK๋„, middleware๋„, vLLM๋„ ๋ชจ๋‘ ์ž์ฒด ์„œ๋ฒ„์—์„œ ์‹คํ–‰๋˜๋Š” ์†Œํ”„ํŠธ์›จ์–ด๋‹ค.

๊ตฌ์„ฑ์š”์†Œ์ •์ฒด์„ค์น˜ ๋ฐฉ๋ฒ•๋ฐ์ดํ„ฐ๊ฐ€ ์™ธ๋ถ€๋กœ ๋‚˜๊ฐ€๋Š”๊ฐ€
vLLM๋กœ์ปฌ ์„œ๋น™ ํ”„๋ ˆ์ž„์›Œํฌpip install vllm โ†’ ์ž์ฒด ์„œ๋ฒ„์—์„œ ์‹คํ–‰๋‚˜๊ฐ€์ง€ ์•Š์Œ. localhost์—์„œ API ์ œ๊ณต
Vercel AI SDKnpm ํŒจํ‚ค์ง€ (ํด๋ผ์ด์–ธํŠธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ)npm install ai โ†’ ์ž์ฒด ์ฝ”๋“œ์—์„œ import๋‚˜๊ฐ€์ง€ ์•Š์Œ. Vercel ํด๋ผ์šฐ๋“œ์™€ ๋ฌด๊ด€
@ai-sdk-tool/parsernpm ํŒจํ‚ค์ง€ (middleware)npm install @ai-sdk-tool/parser๋‚˜๊ฐ€์ง€ ์•Š์Œ. ๋กœ์ปฌ์—์„œ ํ”„๋กฌํ”„ํŠธ ๋ณ€ํ™˜/ํŒŒ์‹ฑ

๋ฐ์ดํ„ฐ๊ฐ€ ์™ธ๋ถ€๋กœ ๋‚˜๊ฐ€๋А๋ƒ๋Š” SDK ์ž์ฒด๊ฐ€ ์•„๋‹ˆ๋ผ, SDK์— ์–ด๋–ค ๋ชจ๋ธ์„ ์—ฐ๊ฒฐํ•˜๋А๋ƒ์— ๋”ฐ๋ผ ๊ฒฐ์ •๋œ๋‹ค:

anthropic("claude-opus-4-6")                                   โ†’ Anthropic ์„œ๋ฒ„๋กœ ์š”์ฒญ (์™ธ๋ถ€ ๋‚˜๊ฐ)
ollama("qwen3:8b")                                             โ†’ localhost Ollama๋กœ ์š”์ฒญ (์™ธ๋ถ€ ์•ˆ ๋‚˜๊ฐ)
createOpenAICompatible({ baseURL: "http://๋‚ด์„œ๋ฒ„:8000/v1" })     โ†’ ์ž์ฒด vLLM์œผ๋กœ ์š”์ฒญ (์™ธ๋ถ€ ์•ˆ ๋‚˜๊ฐ)

์˜จํ”„๋ ˆ๋ฏธ์Šค ๊ตฌ์„ฑ ์˜ˆ์‹œ

์ž์ฒด ์„œ๋ฒ„ (๋‚ด๋ถ€๋ง)
โ”œโ”€โ”€ vLLM (๋ชจ๋ธ ์„œ๋น™)             โ† pip install, localhost:8000
โ”œโ”€โ”€ Vercel AI SDK (npm ํŒจํ‚ค์ง€)   โ† npm install, ๊ฐ™์€ ์„œ๋ฒ„์—์„œ ์‹คํ–‰
โ”œโ”€โ”€ @ai-sdk-tool/parser (npm)   โ† npm install, ๊ฐ™์€ ์„œ๋ฒ„์—์„œ ์‹คํ–‰
โ””โ”€โ”€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ์ฝ”๋“œ              โ† tool ์ •์˜ + ์‹คํ–‰ ํ•จ์ˆ˜

๋ชจ๋“  ๋ฐ์ดํ„ฐ๊ฐ€ ๊ณ ๊ฐ์‚ฌ ๋‚ด๋ถ€๋ง์—์„œ ์ฒ˜๋ฆฌ, ์™ธ๋ถ€ ํ†ต์‹  ์—†์Œ.

Harness ์„ค๊ณ„๊ฐ€ ์„ฑ๋Šฅ์„ ๊ฒฐ์ •ํ•œ๋‹ค

AI harness๋Š” โ€œ๋ชจ๋ธ์„ โ€˜๋งํ•˜๋Š” ๋‡Œโ€™์—์„œ โ€˜์ผํ•˜๋Š” ์‹œ์Šคํ…œโ€™์œผ๋กœ ๋งŒ๋“œ๋Š” ์ „์ฒด ์‹คํ–‰ ํ”„๋ ˆ์ž„(์Šค์บํด๋”ฉ)โ€œ์ด๋‹ค. Orchestration, Memory, Tools, Verification, Artifacts๋ฅผ ๋ชจ๋‘ ํฌํ•จํ•œ๋‹ค (Anthropic, The Harness Problem ๋ชจ๋‘ ๋™์ผํ•œ ๋„“์€ ์˜๋ฏธ๋กœ ์ •์˜).

The Harness Problem ๋ธ”๋กœ๊ทธ๋Š” ์ด harness์˜ ๊ตฌ์„ฑ์š”์†Œ ์ค‘ Tools ๋ ˆ์ด์–ด์˜ ํ˜•์‹ ์„ค๊ณ„๋ฅผ ์‹คํ—˜์œผ๋กœ ๊ฒ€์ฆํ–ˆ๋‹ค. โ€œ์ฝ”๋“œ ํŽธ์ง‘โ€์ด๋ผ๋Š” ๊ณผ์ œ์—์„œ, tool์ด ์ˆ˜์ • ์‚ฌํ•ญ์„ ์–ด๋–ค ํ˜•์‹์œผ๋กœ ํ‘œํ˜„ํ•˜๋А๋ƒ์— ๋”ฐ๋ผ ์„ฑ๋Šฅ์ด ๊ทน์ ์œผ๋กœ ๋‹ฌ๋ผ์ง„๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค.

harness (์ „์ฒด ์‹คํ–‰ ํ”„๋ ˆ์ž„)
โ”œโ”€โ”€ Orchestration (๊ณ„ํš โ†’ ์‹คํ–‰ โ†’ ๊ด€์ฐฐ ๋ฃจํ”„)
โ”œโ”€โ”€ Memory (์ƒํƒœ ๋ณด์กด, ์„ธ์…˜ ๋ธŒ๋ฆฌ์ง•)
โ”œโ”€โ”€ Tools โ† The Harness Problem์ด ์‹คํ—˜ํ•œ ์˜์—ญ
โ”‚   โ””โ”€โ”€ ์ฝ”๋“œ ํŽธ์ง‘ tool์˜ ํ˜•์‹
โ”‚       โ”œโ”€โ”€ patch
โ”‚       โ”œโ”€โ”€ str_replace
โ”‚       โ””โ”€โ”€ hashline
โ”œโ”€โ”€ Verification (ํ…Œ์ŠคํŠธ, ๊ฒ€์ฆ)
โ””โ”€โ”€ Artifacts (๋กœ๊ทธ, git history)

16๊ฐœ ๋ชจ๋ธ์„ ๋Œ€์ƒ์œผ๋กœ ์ฝ”๋“œ ํŽธ์ง‘ tool์˜ 3๊ฐ€์ง€ ํ˜•์‹์„ ๋น„๊ตํ•œ ๊ฒฐ๊ณผ:

tool ํ˜•์‹๋™์ž‘ ๋ฐฉ์‹๋Œ€ํ‘œ์  ๋ฌธ์ œ
patchdiff ํ˜•์‹์œผ๋กœ ์ˆ˜์ • ํ‘œํ˜„ (- ์‚ญ์ œ, + ์ถ”๊ฐ€)Grok 4: 50.7% ์‹คํŒจ. diff ๋ฌธ๋ฒ• ์ƒ์„ฑ ์ž์ฒด์—์„œ ์˜ค๋ฅ˜ ๋ฐœ์ƒ
str_replace์›๋ณธ ํ…์ŠคํŠธ๋ฅผ ์ •ํ™•ํžˆ ๋งค์นญํ•˜์—ฌ ๊ต์ฒด๊ณต๋ฐฑ/๋“ค์—ฌ์“ฐ๊ธฐ ๋ถˆ์ผ์น˜๋กœ โ€œString to replace not foundโ€ ๋นˆ๋ฒˆ
hashline๋ผ์ธ๋ณ„ content hash๋กœ ์œ„์น˜๋ฅผ ์ฐธ์กฐ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์—์„œ str_replace์™€ ๋™๋“ฑํ•˜๊ฑฐ๋‚˜ ์šฐ์„ธ

ํ•ต์‹ฌ ์ธ์‚ฌ์ดํŠธ:

โ€œ๋ชจ๋ธ์ด ๊ณผ์ œ๋ฅผ ์ดํ•ดํ•˜์ง€ ๋ชปํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋‹ค. harness(์˜ tool ํ˜•์‹ ์„ค๊ณ„)์—์„œ ์‹คํŒจํ•˜๋Š” ๊ฒƒ์ด๋‹ค.โ€

Grok Code Fast ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ, tool ํ˜•์‹๋งŒ ๋ณ€๊ฒฝํ•˜์—ฌ 6.7% โ†’ 68.3% (10๋ฐฐ ํ–ฅ์ƒ)๋ฅผ ๋‹ฌ์„ฑํ–ˆ๋‹ค. ์ด๊ฒƒ์€ ๋ชจ๋ธ ์ž์ฒด๋ฅผ ์—…๊ทธ๋ ˆ์ด๋“œํ•œ ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, harness์˜ tool ํ˜•์‹ ์„ค๊ณ„๋งŒ ๋ฐ”๊พผ ๊ฒฐ๊ณผ๋‹ค.

hashline์˜ ๋™์ž‘ ๋ฐฉ์‹

hashline์€ ์œ„ ํ‘œ์—์„œ ๋ณด๋“ฏ patch, str_replace์™€ ๊ฐ™์€ ๋ ˆ๋ฒจ์˜ ์ฝ”๋“œ ํŽธ์ง‘ tool ํ˜•์‹ ์ค‘ ํ•˜๋‚˜๋‹ค. ํŒŒ์ผ์„ ์ฝ์„ ๋•Œ ๊ฐ ๋ผ์ธ์— content hash๋ฅผ ๋ถ€์—ฌ:

1:a3|function hello() {
2:f1|  return "world";
3:0e|}

๋ชจ๋ธ์€ ์ •ํ™•ํ•œ ํ…์ŠคํŠธ๋ฅผ ์žฌํ˜„ํ•  ํ•„์š” ์—†์ด, hash๋กœ ๋ผ์ธ์„ ์ฐธ์กฐ:

  • โ€œ๋ผ์ธ 2:f1์„ ๊ต์ฒดํ•ด์ค˜โ€
  • โ€œ๋ผ์ธ 1:a3๋ถ€ํ„ฐ 3:0e๊นŒ์ง€ ๋ฒ”์œ„๋ฅผ ๊ต์ฒดํ•ด์ค˜โ€
  • hash๊ฐ€ ๋ถˆ์ผ์น˜ํ•˜๋ฉด ํŒŒ์ผ์ด ๋ณ€๊ฒฝ๋œ ๊ฒƒ์ด๋ฏ€๋กœ ์ž๋™์œผ๋กœ ๊ฐ์ง€

vLLM - ์„œ๋น™ ํ”„๋ ˆ์ž„์›Œํฌ์˜ ์—ญํ• 

์˜คํ”ˆ์†Œ์Šค ๋ชจ๋ธ์„ ๋‹ค์šด๋กœ๋“œํ•˜๋ฉด ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋งŒ ์กด์žฌํ•œ๋‹ค. ์ด๊ฒƒ์„ OpenAI API ํ˜ธํ™˜ HTTP ์„œ๋ฒ„๋กœ ๊ฐ์‹ธ์„œ, Claude/GPT API์ฒ˜๋Ÿผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” ๊ฒƒ์ด vLLM ๊ฐ™์€ ์„œ๋น™ ํ”„๋ ˆ์ž„์›Œํฌ๋‹ค.

vLLM์ด ์ œ๊ณตํ•˜๋Š” ๊ฒƒ:

  • tool parser: ๋ชจ๋ธ๋ณ„ tool call ์ถœ๋ ฅ ํ˜•์‹์„ ๊ฐ์ง€ํ•˜์—ฌ OpenAI API ํ˜ธํ™˜ ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ (Hermes, Mistral, Llama ๋“ฑ ์ „์šฉ ํŒŒ์„œ)
  • guided decoding: constrained decoding์„ ์„œ๋น™ ๋ ˆ๋ฒจ์—์„œ ์ง€์› (4ํŽธ์—์„œ ๋‹ค๋ฃฌ JSON schema ๊ฐ•์ œ)
  • chat template: ๋ชจ๋ธ์˜ tokenizer_config.json์—์„œ chat template์„ ์ž๋™ ๋กœ๋“œ (2ํŽธ์—์„œ ๋‹ค๋ฃฌ ๋ณ€ํ™˜ ์ž๋™ํ™”)

๋‹ค์Œ ํŽธ: ๊ทธ๋Ÿฌ๋ฉด ์ง์ ‘ ๋งŒ๋“ค์–ด๋ณด์ž

์ด ๊ธ€์—์„œ Native ๋ชจ๋ธ๊ณผ Non-native ๋ชจ๋ธ์˜ ๊ตฌ์กฐ์  ์ฐจ์ด๋ฅผ ํ™•์ธํ–ˆ๋‹ค. Non-native ๋ชจ๋ธ์—๋Š” 5๊ฐœ ๋ ˆ์ด์–ด ์ค‘ ๋Œ€๋ถ€๋ถ„์ด ์—†๊ณ , ์ด๋ฅผ ์™ธ๋ถ€์—์„œ ๋ฉ”์šฐ๋Š” ๊ฒƒ์ด Middleware๋‹ค.

๋‹ค์Œ ํŽธ์—์„œ๋Š” ์ด Middleware๋ฅผ ์ง์ ‘ ๋งŒ๋“ค์–ด๋ณธ๋‹ค. ํ”„๋กฌํ”„ํŠธ ์กฐ๋ฆฝ, ์ถœ๋ ฅ ํŒŒ์‹ฑ, ์‹คํ–‰ ๋ฃจํ”„์˜ 3๊ฐ€์ง€ ์—ญํ• ์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•œ๋‹ค.

์ฐธ๊ณ  ๋ฌธ์„œ