LLM Tool Calling - Build Your Own Agent

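Series: LLM Tool Calling, from Internals to Building Your Own Agent

This series dissects, step by step, the internal process by which a single line of natural language turns into a tool execution, and ends with building your own agent from an open-source model and a custom middleware.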

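Part	Topic	Key point
Part 1	Overview	Confirms the five layers between natural language and tool execution
Part 2	Chat Template	JSON is not fed to the model directly
Part 3	Tokenization	The model cannot read text: token IDs and control tokens
Part 4	Model inference	Deciding whether to use a tool, and constrained decoding
Part 5	Tool execution	The execution loop of the client that receives tool_use
Part 6	Native vs Non-native	Same capability, different structure → Middleware
Part 7	Building the middleware	Prompt assembly + output parsing + execution loop, implemented by hand
Part 8	Running an open-source model locally	Serving a local LLM with Ollama/vLLM
Part 9 (this post)	Build your own agent	Model + Middleware = a complete agent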

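The agent built here assembles the Qwen3 + Ollama + Vercel AI SDK combination recommended in Part 8 into a complete system with working tool calling.
This post is an installation guide that connects the components covered separately in Parts 1-8 (engine, interpreter, facilitator, worker, brain) into one.
It also covers connecting an MCP server, and the result is a local agent that runs behind the same interface as commercial models (Claude/GPT): switching between them only requires swapping the provider (or the baseURL, for OpenAI-compatible endpoints).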

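Why this matters

Through Part 8, each component was understood in isolation, but nothing has been assembled yet.
Building “your own agent” means connecting these components into a single working system.
Following this guide, you can have a local agent with working tool calling running in about 20 minutes.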

AS-IS

sequenceDiagram
    autonumber
    participant User as User
    participant Parts as Individual components

    User->>Parts: What was covered in Parts 1-8
    Note over Parts: Engine (Ollama) ✓<br/>Interpreter (Middleware) ✓<br/>Facilitator (SDK) ✓<br/>Worker (Tool) ✓<br/>Brain (Qwen3) ✓
    Parts-->>User: Each understood,<br/>but not yet assembled

TO-BE

sequenceDiagram
    autonumber
    participant User as User
    box Application
        participant SDK as Vercel AI SDK<br/>(facilitator)
        participant Tool as Tool function<br/>(worker)
    end
    box Ollama (engine + interpreter)
        participant API as Ollama API
        participant LLM as Qwen3<br/>(brain)
    end

    User->>SDK: "Compare the weather in Seoul and Tokyo"
    SDK->>API: tools + question
    API->>LLM: inference
    API-->>SDK: tool_calls: get_weather("Seoul")
    SDK->>Tool: run get_weather("Seoul")
    Tool-->>SDK: "15°C, sunny"
    SDK->>API: tool result + new request
    API->>LLM: inference
    API-->>SDK: tool_calls: get_weather("Tokyo")
    SDK->>Tool: run get_weather("Tokyo")
    Tool-->>SDK: "22°C, cloudy"
    SDK->>API: tool result + new request
    API->>LLM: inference
    API-->>SDK: "Seoul is 15°C and Tokyo is 22°C. Tokyo is 7 degrees warmer."
    SDK-->>User: final response

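Step 1: Environment setup

Install Ollama and download the model

# macOS
brew install ollama

# Start the server if it is not already running
# (the desktop app starts it automatically; the Homebrew formula does not)
brew services start ollama

# Download the model (~5GB)
ollama pull qwen3:8b

# Verify that it works
ollama run qwen3:8b "Hello"

Once the server is running, the API is available at localhost:11434.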
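Before moving on, it can help to confirm from code that the server is reachable and the model is present. The sketch below assumes Ollama's `GET /api/tags` endpoint, which lists locally available models. `hasModel` and `checkOllama` are hypothetical helper names, and the response type models only the one field that is read here.

```typescript
// Minimal shape of the /api/tags response; only the field we read is modeled.
interface TagsResponse {
  models: { name: string }[];
}

// Pure check: is the given model tag present in the list?
function hasModel(tags: TagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model);
}

// Hypothetical helper: queries a running Ollama server (default port assumed).
async function checkOllama(
  model: string,
  baseURL = "http://localhost:11434",
): Promise<boolean> {
  const res = await fetch(`${baseURL}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return hasModel((await res.json()) as TagsResponse, model);
}
```

Calling `checkOllama("qwen3:8b")` resolves to `false` when the server is up but the model has not been pulled yet, and rejects when the server cannot be reached.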

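Create a Node.js project

mkdir my-agent && cd my-agent
npm init -y
# ES modules are required for the top-level await used below
npm pkg set type=module
npm install ai ollama-ai-provider-v2 zod
# tsc and tsx are not installed by default
npm install -D typescript tsx @types/node
npx tsc --init

Set the following in tsconfig.json: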

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "strict": true
  }
}

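Step 2: Basic agent - conversation without tools

First, verify only the model connection with a plain conversation and no tools:

// agent.ts
import { generateText } from "ai";
import { ollama } from "ollama-ai-provider-v2";

const result = await generateText({
  model: ollama("qwen3:8b"),
  prompt: "Name three famous tourist attractions in Seoul",
});

console.log(result.text);

npx tsx agent.ts

If the script connects to the Ollama API and prints a response, it worked. The flow described in Part 8 is now running:

ollama("qwen3:8b") → sends an HTTP request to the Ollama server on localhost:11434 (by default the provider targets Ollama's native /api endpoint; the OpenAI-compatible API is served under /v1)
Ollama runs inference with the Qwen3 model → returns the response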

Step 3: Adding a tool - weather lookup

Now add a tool. The tool execution loop covered in Part 5 runs here:

// agent-with-tool.ts
import { generateText, tool, stepCountIs } from "ai";
import { ollama } from "ollama-ai-provider-v2";
import { z } from "zod";

const result = await generateText({
  model: ollama("qwen3:8b"),
  tools: {
    get_weather: tool({
      description: "Get the current weather in a given location",
      inputSchema: z.object({
        location: z.string().describe("City name"),
      }),
      execute: async ({ location }) => {
        // Worker: would call a real weather API (mocked here)
        return {
          location,
          temperature: Math.round(Math.random() * 30),
          condition: "sunny",
        };
      },
    }),
  },
  stopWhen: stepCountIs(5), // Facilitator: stop the automatic loop after at most 5 steps
  prompt: "What's the weather in Seoul?",
});

console.log(result.text);
console.log("Steps:", result.steps.length);

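The role of each component in this code (using the Part 8 analogy):

Engine + interpreter (Ollama): runs the model, assembles the prompt, parses the output
Facilitator (Vercel AI SDK): detects tool_calls, executes the tool, and feeds the result back in a loop, bounded by stopWhen
Worker (execute function): actually runs get_weather
Brain (Qwen3): predicts the next token → produces the tool call pattern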
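To make the facilitator's job concrete, the loop that the SDK automates can be sketched by hand. This is not the SDK's actual implementation: `ModelTurn`, `FakeModel`, and `runLoop` are hypothetical names, and the model is replaced by a scripted function so that the control flow can be read in isolation. The `maxSteps` argument plays the role that `stopWhen: stepCountIs(5)` plays above.

```typescript
// One model turn: either a final answer or a request to call a tool.
type ModelTurn =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; input: Record<string, unknown> };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolFn = (input: Record<string, unknown>) => Promise<unknown>;
type FakeModel = (history: Message[]) => Promise<ModelTurn>;

// Hypothetical loop: call the model, run any requested tool, feed the result
// back, and stop on a text answer or when maxSteps is reached.
async function runLoop(
  model: FakeModel,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(history);
    if (turn.type === "text") return { text: turn.text, steps: step };

    const toolFn = tools[turn.toolName];
    // Unknown tools are reported back to the model instead of crashing.
    const output = toolFn
      ? await toolFn(turn.input)
      : { error: `Unknown tool: ${turn.toolName}` };
    history.push({ role: "assistant", content: JSON.stringify(turn) });
    history.push({ role: "tool", content: JSON.stringify(output) });
  }
  return { text: "", steps: maxSteps }; // step budget exhausted
}
```

The step cap matters because a model can keep requesting tools indefinitely; without it, a confused model would loop forever.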

Step 4: Multiple tools + multi-step execution

With two or three tools, the model picks the appropriate tool for the question. This is the same principle as Claude Code choosing among Read, Bash, and Grep in Part 1:

// agent-multi-tool.ts
import { generateText, tool, stepCountIs } from "ai";
import { ollama } from "ollama-ai-provider-v2";
import { z } from "zod";
import { readFileSync } from "fs";

const result = await generateText({
  model: ollama("qwen3:8b"),
  tools: {
    get_weather: tool({
      description: "Get the current weather in a given location",
      inputSchema: z.object({
        location: z.string().describe("City name"),
      }),
      execute: async ({ location }) => ({
        location,
        temperature: Math.round(Math.random() * 30),
        condition: "sunny",
      }),
    }),
    read_file: tool({
      description: "Read the contents of a file",
      inputSchema: z.object({
        path: z.string().describe("File path to read"),
      }),
      execute: async ({ path }) => {
        try {
          return { content: readFileSync(path, "utf-8") };
        } catch {
          return { error: `File not found: ${path}` };
        }
      },
    }),
    calculate: tool({
      description: "Perform a mathematical calculation",
      inputSchema: z.object({
        expression: z.string().describe("Math expression to evaluate"),
      }),
      execute: async ({ expression }) => {
        try {
          // Warning: Function() runs arbitrary JavaScript produced by the model.
          // Acceptable for a local demo only; do not use on untrusted input.
          return { result: Function(`return ${expression}`)() };
        } catch {
          return { error: `Invalid expression: ${expression}` };
        }
      },
    }),
  },
  stopWhen: stepCountIs(5),
  prompt: "Compare the weather in Seoul and Tokyo, and calculate the temperature difference",
});

console.log(result.text);

// Check which tools were called (AI SDK 5 exposes the arguments as `input`)
for (const step of result.steps) {
  for (const toolResult of step.toolResults) {
    console.log(`Tool: ${toolResult.toolName}, Input:`, toolResult.input);
  }
}

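The model then, on its own (the exact order and arguments depend on the model and on the mocked temperatures):

calls get_weather("Seoul")
calls get_weather("Tokyo")
calls calculate with the two temperatures, for example "22 - 15"
combines the results into a natural-language response

generateText repeats this whole cycle automatically, and stopWhen: stepCountIs(5) caps it at five steps.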
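The calculate tool above hands model-generated text to `Function`, which executes arbitrary JavaScript. One safer option is a restricted evaluator that accepts only arithmetic. The following is a minimal sketch of a recursive-descent parser, assuming that numbers, `+ - * /`, parentheses, and unary minus are all the tool needs. `evaluate` is a hypothetical name, not part of any library.

```typescript
// Hypothetical replacement for Function(`return ${expression}`)():
// accepts only numbers, + - * /, parentheses, and unary minus.
function evaluate(expression: string): number {
  // Any other non-space character becomes its own token and is rejected later.
  const tokens = expression.match(/\d+(?:\.\d+)?|[-+*/()]|\S/g) ?? [];
  let pos = 0;

  // expr := term (("+" | "-") term)*
  function expr(): number {
    let value = term();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const rhs = term();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function term(): number {
    let value = factor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const rhs = factor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := number | "(" expr ")" | "-" factor
  function factor(): number {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("Unexpected end of expression");
    if (token === "-") return -factor();
    if (token === "(") {
      const value = expr();
      if (tokens[pos++] !== ")") throw new Error("Expected ')'");
      return value;
    }
    if (/^\d/.test(token)) return Number(token);
    throw new Error(`Unexpected token: ${token}`);
  }

  const result = expr();
  if (pos < tokens.length) throw new Error(`Unexpected token: ${tokens[pos]}`);
  return result;
}
```

Inside the tool, `evaluate(expression)` would take the place of the `Function` call; the surrounding try/catch still turns a rejected expression into an `{ error }` result that the model can read.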

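Step 5: Connecting an MCP server

So far the tools were implemented directly in application code. With an MCP server, tool definition and execution can be moved out to an external server. This is the core value of MCP covered in Part 5.

Install the MCP client

npm install @modelcontextprotocol/sdk

Connect the MCP server's tools to the SDK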

// agent-mcp.ts
import { generateText, jsonSchema, stepCountIs } from "ai";
import { ollama } from "ollama-ai-provider-v2";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Connect to the MCP server
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
});
const mcpClient = new Client({ name: "my-agent", version: "1.0.0" });
await mcpClient.connect(transport);

// Fetch the tool list from the MCP server
const mcpTools = await mcpClient.listTools();

// Convert MCP tools to the AI SDK tool format
const tools: Record<string, any> = {};
for (const mcpTool of mcpTools.tools) {
  tools[mcpTool.name] = {
    description: mcpTool.description ?? "",
    // MCP exposes a raw JSON Schema, so wrap it with jsonSchema()
    inputSchema: jsonSchema(mcpTool.inputSchema as any),
    execute: async (args: any) => {
      // Delegate tool execution to the MCP server
      const result = await mcpClient.callTool({
        name: mcpTool.name,
        arguments: args,
      });
      return result.content;
    },
  };
}

const result = await generateText({
  model: ollama("qwen3:8b"),
  tools,
  stopWhen: stepCountIs(5),
  prompt: "Show me the list of files in the /tmp directory",
});

console.log(result.text);
await mcpClient.close();

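The flow of this code (the same as the MCP diagram in Part 5):

Fetch the tool list from the MCP server (listTools)
Each tool's execute function delegates execution to the MCP server (callTool)
The Vercel AI SDK handles the execution loop automatically (generateText, bounded by stopWhen)

Instead of the developer implementing the tool functions by hand, the MCP server takes over the worker role.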
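The `content` field returned by `callTool` is an array of typed content parts rather than a plain string. Flattening it to text before returning it from `execute` may make the tool result easier for a small local model to read. The sketch below assumes a simplified part shape, and `mcpContentToText` is a hypothetical helper name.

```typescript
// Simplified view of an MCP content part (assumption: only `type` and the
// optional `text` field are modeled; real parts carry more fields).
type ContentPart = { type: string; text?: string };

// Hypothetical helper: joins text parts and leaves a placeholder for others.
function mcpContentToText(content: ContentPart[]): string {
  return content
    .map((part) =>
      part.type === "text" && part.text !== undefined
        ? part.text
        : `[${part.type} content omitted]`,
    )
    .join("\n");
}
```

In the conversion loop, `return mcpContentToText(result.content as ContentPart[]);` would take the place of `return result.content;`.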

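Step 6: Error handling

As covered in Part 7, tool calling can fail. The main error cases and how to handle them:

import { generateText, tool, stepCountIs } from "ai";
import { ollama } from "ollama-ai-provider-v2";
import { z } from "zod";

const result = await generateText({
  model: ollama("qwen3:8b"),
  tools: {
    get_weather: tool({
      description: "Get weather",
      inputSchema: z.object({ location: z.string() }),
      execute: async ({ location }) => {
        try {
          // Illustrative URL; substitute the endpoint of a real weather API
          const res = await fetch(`https://api.weather.com?city=${encodeURIComponent(location)}`);
          if (!res.ok) throw new Error(`API error: ${res.status}`);
          return await res.json();
        } catch (e) {
          // Feed the error back to the model → it can try another approach or inform the user
          const message = e instanceof Error ? e.message : String(e);
          return { error: `Could not fetch weather information: ${message}` };
        }
      },
    }),
  },
  stopWhen: stepCountIs(5),
  prompt: "What's the weather in Seoul?",
  // Timeout for the entire run
  abortSignal: AbortSignal.timeout(30_000),
});

If the tool returns an error message instead of throwing, the model receives it and can respond naturally, for example “I could not retrieve the weather information.” This follows the same principle as the Vercel AI SDK's tool-error handling.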
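Repeating the same try/catch in every tool gets tedious once there are more than a few. One possible approach is a wrapper that converts any thrown error, or a per-tool timeout, into an `{ error }` result. This is a sketch under assumptions: `withErrorFeedback` is a hypothetical name, not an AI SDK API, and the timeout only stops waiting; it does not cancel the underlying work.

```typescript
type ToolResult<T> = T | { error: string };

// Hypothetical wrapper: converts thrown errors and timeouts into an
// { error } value that can be returned to the model as a tool result.
function withErrorFeedback<I, O>(
  execute: (input: I) => Promise<O>,
  timeoutMs: number,
): (input: I) => Promise<ToolResult<O>> {
  return async (input) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
        timeoutMs,
      );
    });
    try {
      return await Promise.race([execute(input), timeout]);
    } catch (e) {
      // `e` is `unknown` under strict mode, so narrow before reading .message.
      return { error: e instanceof Error ? e.message : String(e) };
    } finally {
      clearTimeout(timer);
    }
  };
}
```

A tool would then be declared with `execute: withErrorFeedback(fetchWeather, 5_000)`, where `fetchWeather` stands for whatever function does the real work.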

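Comparison with commercial models - swap only the provider

The AI SDK's provider abstraction pays off here: the same code can connect to a commercial model. With an OpenAI-compatible provider, as described in Part 8, changing the baseURL achieves the same thing.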

import { generateText, tool, stepCountIs } from "ai";
import { ollama } from "ollama-ai-provider-v2";
import { anthropic } from "@ai-sdk/anthropic"; // npm install @ai-sdk/anthropic
import { z } from "zod";

// Swap only the model; the rest of the code stays the same
const model = process.env.USE_CLOUD
  ? anthropic("claude-sonnet-4-5")  // commercial model (reads ANTHROPIC_API_KEY)
  : ollama("qwen3:8b");             // local model (free)

const result = await generateText({
  model,
  tools: {
    get_weather: tool({
      description: "Get the current weather",
      inputSchema: z.object({ location: z.string() }),
      execute: async ({ location }) => ({
        location,
        temperature: 15,
        condition: "sunny",
      }),
    }),
  },
  stopWhen: stepCountIs(5),
  prompt: "What's the weather in Seoul?",
});

Local development can iterate quickly on Ollama (free), and production can switch to the Claude API or vLLM. Only the provider needs to change.
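The backend choice can also be pulled out of the application code into configuration. The sketch below resolves a backend from environment-style variables. `resolveModelConfig`, `LLM_BASE_URL`, and `LLM_MODEL` are hypothetical names introduced for illustration; `USE_CLOUD` follows the example above, and the default baseURL is Ollama's OpenAI-compatible endpoint.

```typescript
type ModelConfig =
  | { kind: "openai-compatible"; baseURL: string; modelId: string }
  | { kind: "anthropic"; modelId: string };

// Hypothetical resolver: picks a backend from environment-style variables.
function resolveModelConfig(
  env: Record<string, string | undefined>,
): ModelConfig {
  if (env.USE_CLOUD) {
    if (!env.ANTHROPIC_API_KEY) {
      throw new Error("ANTHROPIC_API_KEY is required when USE_CLOUD is set");
    }
    return { kind: "anthropic", modelId: "claude-sonnet-4-5" };
  }
  // Ollama and vLLM both serve an OpenAI-compatible API under /v1,
  // so moving between them only changes baseURL and the model name.
  return {
    kind: "openai-compatible",
    baseURL: env.LLM_BASE_URL ?? "http://localhost:11434/v1",
    modelId: env.LLM_MODEL ?? "qwen3:8b",
  };
}
```

With this in place, pointing the agent at a vLLM server on its default port would be a matter of setting `LLM_BASE_URL=http://localhost:8000/v1` and the matching model name, with no code change.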

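Real-world example: Manus AI

An agent valued at $2-3B was built on the same structure as the one assembled in this series.

The problem Manus AI set out to solve

“An AI that actually does the work, not a chatbot.” Conventional LLMs only generate text and do not carry out actual tasks. When a user says “make a report from this data,” Manus runs code, searches the web, and creates files itself to finish the job as an autonomous agent.

Key technical decision: in-context learning, not fine-tuning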

The key choice described on the official Manus blog (paraphrased):

“We chose in-context learning over fine-tuning. Ship improvements in hours instead of weeks.”

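In the fine-tuning vs in-context learning comparison covered in Part 4, Manus chose in-context learning. Rather than training its own LLM, it built the agent on top of an existing model (Claude) through system prompt and context design, because this allows fast iteration.

Architecture - the same structure as this series

Manus did not build its own LLM either. It layers tools + orchestration on top of existing models: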

Component	Manus AI	Built in this series	Part 8 analogy
Brain (model)	Claude 3.5/3.7 Sonnet + fine-tuned Qwen	Qwen3 8B	Brain
Engine (serving)	Anthropic API + self-hosted vLLM	Ollama	Engine
Facilitator (execution loop)	Custom orchestration (analyze → plan → execute → observe)	Vercel AI SDK	Facilitator
Worker (Tool)	29+ tools (browser, code execution, files, APIs)	get_weather, read_file, etc.	Worker
Interpreter (Middleware)	Not needed (native models)	Not needed separately (built into Ollama)	Interpreter

Tech stack

Area	Manus AI
Model	Claude 3.5/3.7 Sonnet (primary), fine-tuned Qwen (auxiliary)
Serving	Anthropic API + self-hosted vLLM (prefix caching enabled)
Sandbox	Isolated Ubuntu Linux VM per user (Zero Trust)
Browser	Headless browser (Playwright family)
Code execution	Python 3.10, Node.js 20, Shell (with sudo privileges)
Tool format	Hermes format (three modes: Auto/Required/Specified)
Memory	File-based (`todo.md` updated in real time) + vector DB (RAG)

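Tool categories

Category	Tools
Web	web_search, browser_navigate, browser_click, browser_scroll
Code execution	Python scripts, Node.js, Shell commands
Files	Read/write files, create directories
API	Pre-approved data sources such as weather and finance
User communication	notify (status updates), ask (wait for an answer)

Tool names share consistent prefixes (browser_, shell_). This lets the decoder restrict selection to one group of tools through logit masking while the tool definitions in the prompt stay unchanged, which keeps the KV-cache valid.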
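The effect of prefix-based masking can be illustrated at the level of tool names. The sketch below is conceptual only: real logit masking operates on token logits inside the decoder, whereas here a per-tool score array stands in for the logits. `allowedTools`, `maskScores`, and `pick` are hypothetical names, and `shell_exec` in the usage note is an invented tool name.

```typescript
// Conceptual sketch: the full tool list stays in the prompt unchanged;
// only the set of names the decoder may select is narrowed.
function allowedTools(toolNames: string[], prefix: string): boolean[] {
  return toolNames.map((name) => name.startsWith(prefix));
}

// Applies the mask to per-tool scores (a stand-in for token logits):
// disallowed entries become -Infinity so they can never be selected.
function maskScores(scores: number[], mask: boolean[]): number[] {
  return scores.map((score, i) => (mask[i] ? score : -Infinity));
}

// Greedy selection: returns the tool with the highest score.
function pick(toolNames: string[], scores: number[]): string {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return toolNames[best];
}
```

Given the tools `browser_navigate`, `browser_click`, `shell_exec`, and `web_search`, masking with the prefix `browser_` leaves only the first two selectable, even if the unmasked scores would have favored `shell_exec`. Because the tool list itself never changes, the cached prompt prefix stays reusable.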

Context engineering - Manus's core optimization

The most important metric, according to the official Manus blog:

“KV-cache hit rate is the single most important metric for a production-stage AI agent.”

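Manus has an input-to-output token ratio of about 100:1. Cached tokens are about 10 times cheaper, so the KV-cache hit rate determines both cost and latency:

Optimization	Description
Stable prompt prefix	A change in even one token invalidates the cache from that point on → keep the prefix stable
Append-only context	No insertions or edits, always append at the end → maximizes cache reuse
File-based extended context	Update `todo.md` at every step so the goal is restated at the end of the context (mitigates lost-in-the-middle)
Keep errors in context	Leave failed attempts in the context → the model sees them and is less likely to repeat the same mistake
Tool logit masking	Restrict tool selection with logit masking instead of removing tools dynamically → preserves the cache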
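Why append-only contexts matter can be seen with a little arithmetic. The sketch below measures how many leading tokens two consecutive requests share, which is the most a prefix cache can reuse, and estimates the input cost for a given number of cached tokens. The function names are hypothetical, and the prices used in the note (3 and 0.3 USD per million tokens) are illustrative values chosen to match the 10x ratio mentioned above.

```typescript
// Number of leading tokens two contexts share; a prefix cache can reuse
// at most this many tokens from the previous request.
function sharedPrefix(prev: number[], next: number[]): number {
  let i = 0;
  while (i < prev.length && i < next.length && prev[i] === next[i]) i++;
  return i;
}

// Input cost in USD, given prices per million tokens for uncached
// and cached input (illustrative numbers, not a real price list).
function inputCost(
  totalTokens: number,
  cachedTokens: number,
  pricePerMTok: number,
  cachedPricePerMTok: number,
): number {
  const uncached = totalTokens - cachedTokens;
  return (uncached * pricePerMTok + cachedTokens * cachedPricePerMTok) / 1_000_000;
}
```

Appending two tokens to `[1, 2, 3, 4]` keeps all four earlier tokens reusable, while editing the second token leaves only one. For a 100,000-token input at the illustrative prices, a 90% cache hit costs about 0.057 USD compared with 0.30 USD uncached, roughly a fivefold difference.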

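Reproducing it with open source

The open-source replacement stack suggested in the Manus technical analysis:

Manus component	Open-source alternative
LLM	CodeActAgent (Mistral 7B), Claude API
Orchestration	LangChain, CrewAI
Sandbox	Docker containers
Browser	Playwright
Vector DB (RAG)	FAISS
Inference server	vLLM, FastChat

Key insight: the core of an agent is not building a model yourself but layering tools + orchestration on top of an existing model. The structure assembled in this series follows the same architecture as Manus AI.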

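Series wrap-up

Part 1 said that “the process by which one line of natural language turns into a tool execution is a black box.” Over nine parts, that black box has been opened:

Part	What was opened
Part 1	The black box contains five layers
Part 2	JSON is converted to text (Chat Template)
Part 3	Text is converted to numbers (Tokenization)
Part 4	The next token is predicted from those numbers (model inference)
Part 5	tool_use is received, executed, and the result returned (Tool execution)
Part 6	Some models lack this layer (Native vs Non-native)
Part 7	The missing layer was built by hand (Middleware)
Part 8	The model was served locally (Ollama/vLLM)
Part 9	Everything was assembled into a complete agent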

References

Ollama official documentation - installation, model download
Ollama AI Provider (Vercel AI SDK) - ollama-ai-provider-v2 setup
Vercel AI SDK - Tool Calling - generateText, tool, stopWhen
MCP - Build an MCP Client - MCP client implementation
MCP Filesystem Server - filesystem MCP server
Context Engineering for AI Agents: Lessons from Building Manus - official Manus blog; KV-cache optimization, choice of in-context learning
Manus AI Technical Architecture Analysis - Manus technical analysis; 29 tools, sandbox, open-source replacement stack
Manus AI - agent built from Claude + 29 tools + sandbox
