Core Concepts
Capstan serves two roles: (1) an AI agent framework for building intelligent agents with durable execution, self-evolution, and production robustness, and (2) a full-stack web framework for building typed HTTP/MCP/A2A/OpenAPI applications. Both share the same Bun-native runtime. This document covers the agent framework first (Part 1) because it is the primary use case, then the web framework (Part 2).
Part 1: Smart Agent
createSmartAgent
createSmartAgent is the central API. It takes a configuration object and returns a SmartAgent with two methods: run(goal) and resume(checkpoint, message).
Here is a production agent in 30 lines:
import { createSmartAgent } from "@zauso-ai/capstan-ai";
// Assumption: openaiProvider is exported alongside anthropicProvider.
import { anthropicProvider, openaiProvider } from "@zauso-ai/capstan-agent";

const agent = createSmartAgent({
  llm: anthropicProvider({
    apiKey: process.env.ANTHROPIC_API_KEY!,
    model: "claude-sonnet-4-20250514",
  }),
  tools: [readFile, writeFile, runCommand, searchCode],
  skills: [debuggingSkill, refactoringSkill],
  evolution: {
    store: myEvolutionStore,
    capture: "every-run",
    distillation: "post-run",
  },
  maxIterations: 200,
  contextWindowSize: 200_000,
  fallbackLlm: openaiProvider({ apiKey: process.env.OPENAI_API_KEY!, model: "gpt-4o" }),
  llmTimeout: { chatTimeoutMs: 120_000, streamIdleTimeoutMs: 90_000 },
});

const result = await agent.run("Fix the failing test in src/parser.test.ts");
console.log(result.status); // "completed" | "max_iterations" | "fatal" | ...
console.log(result.iterations); // how many loop iterations it took
console.log(result.toolCalls); // full tool call trace

SmartAgentConfig Reference
| Property | Type | Required | Description |
|---|---|---|---|
| llm | LLMProvider | Yes | Primary language model |
| tools | AgentTool[] | Yes | Operations the agent can invoke |
| skills | AgentSkill[] | No | Strategic guidance the agent can activate |
| evolution | EvolutionConfig | No | Self-evolution: experience capture, distillation |
| memory | SmartAgentMemoryConfig | No | Scoped memory with pluggable backend |
| maxIterations | number | No | Max loop iterations (default: 10) |
| contextWindowSize | number | No | Context window size for compression decisions |
| fallbackLlm | LLMProvider | No | Backup model when primary fails |
| tokenBudget | number \| TokenBudgetConfig | No | Output token budget with nudge + force-complete |
| toolResultBudget | ToolResultBudgetConfig | No | Per-result and aggregate truncation limits |
| llmTimeout | LLMTimeoutConfig | No | Watchdog timeouts for chat and streaming |
| hooks | SmartAgentHooks | No | Lifecycle hooks (before/after tool calls, etc.) |
| compaction | Partial<CompactionConfig> | No | Compression tuning (snip, microcompact, autocompact) |
| stopHooks | StopHook[] | No | Quality gates on final responses |
The Agent Loop
runSmartLoop is the engine inside createSmartAgent. It runs in three phases: initialization, a main loop that executes eight steps per iteration, and post-loop handling.
Phase 1: Initialization
Before the first iteration:
- Create engine state from config (tools, messages, counters)
- Build tool catalog (inline all tools, or defer large sets behind a discover_tool meta-tool)
- Inject activate_skill synthetic tool if skills are configured
- Retrieve relevant memories from memory store
- Compose system prompt (base prompt + tool descriptions + skill catalog + memories + strategies)
- Set initial messages: [system prompt, user goal]
Phase 2: Main Loop
Each iteration runs these steps:
- Compression Check -- if token usage exceeds 60% of the context window, run snip + microcompact; if it exceeds 85%, run autocompact (LLM-driven summarization)
- Control Check -- operator can return "pause" or "cancel" via getControlState hook
- Model + Tool Execution -- call LLM, parse tool calls, validate arguments (JSON Schema + custom validate), execute tools with timeout and concurrency
- Error Handling -- on context limit error: autocompact recovery then reactive compact then fatal; on other error: try fallbackLlm
- Token Budget -- at nudge threshold inject wrap-up message; at 100% force-complete
- Result Processing -- error withholding (retry once), tool result budgeting, memory event hook
- Dynamic Context Enrichment -- every 5 iterations, query memory for fresh relevant context
- Continuation Decision -- run stop hooks; if rejected, inject feedback and continue (max 3 rejections); otherwise complete
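The compression check in the first step can be pictured as a simple threshold function. The sketch below is illustrative only: it assumes the percentages are measured against contextWindowSize, and the names CompressionAction and chooseCompression are hypothetical, not part of Capstan's API.

```typescript
// Hypothetical sketch of the compression check; not Capstan's actual implementation.
type CompressionAction = "none" | "snip+microcompact" | "autocompact";

// Assumption: both thresholds are fractions of the configured context window.
const SOFT_THRESHOLD = 0.6; // above this, run snip + microcompact
const HARD_THRESHOLD = 0.85; // above this, run LLM-driven autocompact

function chooseCompression(usedTokens: number, contextWindowSize: number): CompressionAction {
  if (contextWindowSize <= 0) {
    throw new RangeError("contextWindowSize must be positive");
  }
  const usage = usedTokens / contextWindowSize;
  if (usage > HARD_THRESHOLD) return "autocompact";
  if (usage > SOFT_THRESHOLD) return "snip+microcompact";
  return "none";
}

console.log(chooseCompression(100_000, 200_000)); // none (50% used)
console.log(chooseCompression(130_000, 200_000)); // snip+microcompact (65% used)
console.log(chooseCompression(180_000, 200_000)); // autocompact (90% used)
```

Checking the higher threshold first keeps the cheaper snip + microcompact pass from masking the case where full summarization is already needed.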
Phase 3: Post-Loop
If the loop exits due to maxIterations, the last assistant message becomes the result with status "max_iterations".
Tools
Tools are operations with defined inputs and outputs -- reading files, running commands, calling APIs. They are validated in two phases before execution: JSON Schema validation (parameters) and custom validation (validate).
import type { AgentTool } from "@zauso-ai/capstan-ai";

const readFile: AgentTool = {
  name: "read_file",
  description: "Read the contents of a file at the given path",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Absolute file path" },
      offset: { type: "integer", description: "Line to start reading from" },
      limit: { type: "integer", description: "Max lines to read" },
    },
    required: ["path"],
  },
  validate(args) {
    if ((args.path as string).includes(".."))
      return { valid: false, error: "Path traversal not allowed" };
    return { valid: true };
  },
  timeout: 10_000,
  isConcurrencySafe: true,
  failureMode: "soft",
  async execute(args) {
    const content = await Bun.file(args.path as string).text();
    return { content, lines: content.split("\n").length };
  },
};

Tool result budgeting: large results are truncated and optionally persisted to disk. The agent gets a read_persisted_result tool automatically to retrieve the full data.
Concurrent execution: tools marked isConcurrencySafe: true can execute in parallel. Configure max parallelism with streaming: { maxConcurrency: 4 }.
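To make tool result budgeting concrete, here is a minimal sketch of per-result truncation. It assumes the limit is counted in characters and invents the BudgetedResult shape and the applyResultBudget name for illustration; the real ToolResultBudgetConfig options may differ, and persistence to disk is left out.

```typescript
// Hypothetical sketch of per-result truncation; limits and field names are assumptions.
interface BudgetedResult {
  text: string; // what the model sees
  truncated: boolean; // whether anything was cut
  omittedChars: number; // how many characters were left out
}

function applyResultBudget(raw: string, maxChars: number): BudgetedResult {
  if (raw.length <= maxChars) {
    return { text: raw, truncated: false, omittedChars: 0 };
  }
  const omittedChars = raw.length - maxChars;
  // The notice tells the model that more data exists beyond the cut.
  const notice = `\n[truncated: ${omittedChars} more characters]`;
  return { text: raw.slice(0, maxChars) + notice, truncated: true, omittedChars };
}

const smallResult = applyResultBudget("ok", 10);
const largeResult = applyResultBudget("x".repeat(50), 10);
console.log(smallResult.truncated, largeResult.truncated, largeResult.omittedChars); // false true 40
```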
Skills
Skills are strategies, not operations. They provide high-level guidance for how to approach a class of problems. When activated, a skill's prompt is injected into the conversation.
| Aspect | Tool | Skill |
|---|---|---|
| What it is | An operation with I/O | A strategy with guidance text |
| Invocation | Model calls it with arguments | Model activates it by name |
| Result | Concrete data (file contents, etc.) | Injected guidance prompt |
| Side effects | Yes (reads/writes/network) | No (read-only prompt injection) |
| Source | Developer-defined | Developer-defined or auto-evolved |
import { defineSkill } from "@zauso-ai/capstan-ai";

const debuggingSkill = defineSkill({
  name: "debugging",
  description: "Systematic debugging methodology",
  trigger: "When encountering bugs, test failures, or unexpected behavior",
  prompt: `## Debugging Strategy
1. REPRODUCE: Confirm the failure by running the exact failing test.
2. ISOLATE: Narrow down to the smallest reproducing case.
3. HYPOTHESIZE: Form a specific hypothesis about the root cause.
4. VERIFY: Test the hypothesis with targeted reads/searches.
5. FIX: Apply the minimal fix that addresses the root cause.
6. CONFIRM: Re-run the original failing test to verify.`,
  tools: ["read_file", "run_command", "search_code"],
});

At runtime: skills are listed in the system prompt, the runtime injects a synthetic activate_skill tool, and the agent calls it when needed.
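The runtime behavior can be sketched roughly as follows. This is not Capstan's implementation: SkillSketch, renderSkillCatalog, and makeActivateSkill are hypothetical names, and the argument and result shapes of the synthetic tool are assumptions.

```typescript
// Hypothetical sketch of the skill catalog and a synthetic activate_skill tool.
interface SkillSketch {
  name: string;
  description: string;
  prompt: string;
}

// One catalog line per skill, suitable for inclusion in the system prompt.
function renderSkillCatalog(skills: SkillSketch[]): string {
  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
}

// Builds the synthetic tool; calling it returns the guidance text to inject.
function makeActivateSkill(skills: SkillSketch[]) {
  const byName = new Map<string, SkillSketch>();
  for (const skill of skills) byName.set(skill.name, skill);
  return {
    name: "activate_skill",
    execute(args: { name: string }): { ok: boolean; guidance: string } {
      const skill = byName.get(args.name);
      if (skill === undefined) {
        return { ok: false, guidance: `Unknown skill: ${args.name}` };
      }
      return { ok: true, guidance: skill.prompt };
    },
  };
}

const demoSkills: SkillSketch[] = [
  { name: "debugging", description: "Systematic debugging methodology", prompt: "1. REPRODUCE ..." },
];
const activateSkill = makeActivateSkill(demoSkills);
console.log(renderSkillCatalog(demoSkills)); // - debugging: Systematic debugging methodology
console.log(activateSkill.execute({ name: "debugging" }).ok); // true
```

Because activation only returns text, it has no side effects, which matches the Tool vs. Skill comparison above.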
Memory
The memory system provides scoped, searchable memory that persists across agent runs.
const agent = createSmartAgent({
  // ...
  memory: {
    store: new BuiltinMemoryBackend(),
    scope: { type: "project", id: "my-app" },
    readScopes: [{ type: "global", id: "shared" }],
    maxMemoryTokens: 4000,
    saveSessionSummary: true,
  },
});

Features: initial retrieval before the first iteration, staleness annotations (age-based freshness notes), dynamic enrichment every 5 iterations, and session summary auto-save.
Backends: BuiltinMemoryBackend (in-memory), SqliteMemoryBackend (persistent), or custom.
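As an illustration of what an age-based staleness annotation might look like, the sketch below appends a freshness note to a memory before it is shown to the model. The entry shape, the 30-day cutoff, and the note wording are all assumptions made for this example.

```typescript
// Hypothetical sketch of staleness annotations; thresholds and wording are assumptions.
interface MemoryEntrySketch {
  content: string;
  createdAt: number; // epoch milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

function annotateStaleness(entry: MemoryEntrySketch, now: number): string {
  const ageDays = Math.floor((now - entry.createdAt) / DAY_MS);
  if (ageDays < 1) return `${entry.content} (recorded today)`;
  if (ageDays < 30) return `${entry.content} (${ageDays} day(s) old)`;
  return `${entry.content} (${ageDays} days old; verify before relying on it)`;
}

// Fixed timestamps keep the example deterministic.
const nowMs = 100 * DAY_MS;
console.log(annotateStaleness({ content: "Tests run with bun test", createdAt: 97 * DAY_MS }, nowMs));
// Tests run with bun test (3 day(s) old)
```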
Self-Evolution
Self-evolution enables agents to learn from their runs and improve over time:
Experience (run trajectory) --> Strategy (distilled pattern) --> Skill (promoted guidance)
- Run 1-5: Raw experience capture. Each run records goal, outcome, tool call trajectory, iterations, duration.
- Run 3+: Strategy distillation. The LLM-driven distiller analyzes trajectories and extracts generalizable strategies.
- Run 10+: Strategy refinement. Consolidator merges overlapping strategies, resolves contradictions. Utility scores: +0.1 success, -0.05 failure.
- Run 50+: Skill promotion. Strategies reaching utility >= 0.7 after >= 5 applications are auto-promoted to reusable skills.
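The scoring and promotion rules in this list can be expressed as a small sketch. The +0.1 / -0.05 deltas and the promotion thresholds come from the text above; the starting utility of 0.5, the clamping to [0, 1], the rounding, and all names are assumptions.

```typescript
// Hypothetical sketch of utility scoring and skill promotion.
interface StrategySketch {
  utility: number; // assumed to stay within [0, 1]
  applications: number; // number of runs that applied this strategy
}

const PROMOTION = { minUtility: 0.7, minApplications: 5 };

// +0.1 on success, -0.05 on failure. Clamping and rounding are assumptions
// added here to keep floating-point drift out of threshold comparisons.
function recordOutcome(s: StrategySketch, success: boolean): StrategySketch {
  const delta = success ? 0.1 : -0.05;
  const clamped = Math.min(1, Math.max(0, s.utility + delta));
  return { utility: Math.round(clamped * 100) / 100, applications: s.applications + 1 };
}

function shouldPromote(s: StrategySketch): boolean {
  return s.utility >= PROMOTION.minUtility && s.applications >= PROMOTION.minApplications;
}

let strategy: StrategySketch = { utility: 0.5, applications: 0 }; // starting utility is assumed
for (const succeeded of [true, true, false, true, true]) {
  strategy = recordOutcome(strategy, succeeded);
}
console.log(strategy.utility, strategy.applications, shouldPromote(strategy)); // 0.85 5 true
```

Requiring both a utility floor and a minimum number of applications keeps a strategy from being promoted after a couple of lucky runs.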
evolution: {
  store: new SqliteEvolutionStore("./agent-evolution.db"),
  capture: "every-run", // "every-run" | "on-failure" | "on-success" | custom
  distillation: "post-run", // "post-run" | "manual"
  pruning: { maxStrategies: 50, minUtility: 0.2 },
  skillPromotion: { minUtility: 0.7, minApplications: 5 },
}

Production Robustness
The agent loop includes nine robustness mechanisms:
| Mechanism | Description |
|---|---|
| Model fallback | When primary LLM fails, strip thinking blocks and retry with fallbackLlm |
| Reactive compression | 3-phase: autocompact -> reactive compact -> fatal |
| Token budget | Nudge at 80% + force-complete at 100% |
| LLM watchdog | Chat timeout (120s), stream idle timeout (90s), stall warning (30s) |
| Tool timeout | Per-tool configurable timeout via Promise.race |
| Error withholding | Retry failed tools once before exposing error to LLM |
| Message normalization | Merge adjacent same-role messages, filter empties |
| Input validation | Two-layer: JSON Schema + custom validate hook |
| Abort handling | Blocked tool calls produce synthetic results so the LLM can adjust |
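As one example from the table, message normalization can be sketched in a few lines. The message shape, the blank-line separator used when merging, and the function name are assumptions, not Capstan's actual code.

```typescript
// Hypothetical sketch of message normalization; the merge separator is an assumption.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

function normalizeMessages(messages: ChatMessage[]): ChatMessage[] {
  const out: ChatMessage[] = [];
  for (const msg of messages) {
    if (msg.content.trim() === "") continue; // filter empties
    const last = out[out.length - 1];
    if (last !== undefined && last.role === msg.role) {
      last.content += "\n\n" + msg.content; // merge adjacent same-role messages
    } else {
      out.push({ ...msg }); // copy so the caller's objects are not mutated
    }
  }
  return out;
}

const normalized = normalizeMessages([
  { role: "user", content: "a" },
  { role: "user", content: "   " },
  { role: "user", content: "b" },
  { role: "assistant", content: "c" },
]);
console.log(normalized.length); // 2
```

Empty messages are dropped before the adjacency check, so the two user messages on either side of the blank one end up merged.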
Lifecycle Hooks
Hooks provide fine-grained control over agent execution. All hooks are optional and non-fatal.
| Hook | When | Purpose |
|---|---|---|
| beforeToolCall | Before each tool execution | Gate: return { allowed: false } to block |
| afterToolCall | After each tool execution | Observe results. Status is "success" or "error" |
| onCheckpoint | At initialization, tool_result, completion | Save or modify checkpoints |
| getControlState | Before LLM, before/after tools | Operator control: pause or cancel |
| onRunComplete | Once at end of run | Final notification/logging |
| afterIteration | After each iteration | Progress monitoring |
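A beforeToolCall gate might look like the sketch below. Only the { allowed: false } return value comes from the table; the call shape, the optional reason field, the synchronous signature, and the runGated helper are assumptions for illustration.

```typescript
// Hypothetical sketch of a beforeToolCall gate; shapes beyond { allowed } are assumptions.
interface ToolCallSketch {
  tool: string;
  args: Record<string, unknown>;
}

interface GateDecision {
  allowed: boolean;
  reason?: string;
}

type BeforeToolCall = (call: ToolCallSketch) => GateDecision;

// Example gate: block shell commands that look destructive.
const denyDestructive: BeforeToolCall = (call) => {
  const command = String(call.args["command"] ?? "");
  if (call.tool === "run_command" && /\brm\s+-rf\b/.test(command)) {
    return { allowed: false, reason: "Destructive command blocked" };
  }
  return { allowed: true };
};

// A blocked call yields a synthetic result instead of running the tool,
// so the model can read the reason and adjust.
function runGated(
  call: ToolCallSketch,
  gate: BeforeToolCall,
  execute: (c: ToolCallSketch) => string,
): string {
  const decision = gate(call);
  if (!decision.allowed) {
    return `Tool call blocked: ${decision.reason ?? "no reason given"}`;
  }
  return execute(call);
}

const blockedOutput = runGated({ tool: "run_command", args: { command: "rm -rf /tmp/x" } }, denyDestructive, () => "ran");
const passedOutput = runGated({ tool: "run_command", args: { command: "ls" } }, denyDestructive, () => "ran");
console.log(blockedOutput); // Tool call blocked: Destructive command blocked
console.log(passedOutput); // ran
```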
Checkpoints and Resume
Every agent run produces checkpoints that can be used to resume interrupted runs:
const result = await agent.run("Deploy the new feature");

if (result.status === "paused") {
  const resumed = await agent.resume(result.checkpoint!, "Approved. Continue deployment.");
}

if (result.status === "approval_required") {
  const { tool, args, reason } = result.pendingApproval!;
  // After human approval:
  const resumed = await agent.resume(result.checkpoint!, "Approved. Proceed.");
}

Stop Hooks (Guardrails)
Stop hooks are quality gates evaluated when the model produces a final response. If any hook fails, the response is rejected with feedback and the agent continues. After 3 consecutive rejections, the agent is force-completed.
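The rejection loop can be sketched as follows. The StopHook interface is not shown in this document, so the verdict shape, the draft callback (standing in for one more agent iteration), and the function name are all assumptions.

```typescript
// Hypothetical sketch of stop-hook evaluation; types and names are assumptions.
interface StopVerdict {
  pass: boolean;
  feedback?: string;
}

type StopHookSketch = (finalResponse: string) => StopVerdict;

const MAX_REJECTIONS = 3;

// draft(feedback) stands in for the agent producing another final response.
function completeWithStopHooks(
  draft: (feedback: string | undefined) => string,
  hooks: StopHookSketch[],
): { response: string; rejections: number; forced: boolean } {
  let feedback: string | undefined;
  let rejections = 0;
  for (;;) {
    const response = draft(feedback);
    const failed = hooks.map((hook) => hook(response)).find((verdict) => !verdict.pass);
    if (failed === undefined) return { response, rejections, forced: false };
    rejections += 1;
    // After the third consecutive rejection the run is force-completed.
    if (rejections >= MAX_REJECTIONS) return { response, rejections, forced: true };
    feedback = failed.feedback ?? "Response rejected by a stop hook.";
  }
}

const mentionsTests: StopHookSketch = (response) =>
  response.includes("tests pass")
    ? { pass: true }
    : { pass: false, feedback: "State whether the tests pass." };

const revisedRun = completeWithStopHooks(
  (fb) => (fb === undefined ? "Done." : "Done; tests pass."),
  [mentionsTests],
);
const stuckRun = completeWithStopHooks(() => "Done.", [mentionsTests]);
console.log(revisedRun.rejections, revisedRun.forced); // 1 false
console.log(stuckRun.rejections, stuckRun.forced); // 3 true
```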
Part 2: Full-Stack Web
defineAPI()
defineAPI() is the central building block for web endpoints. A single call defines a typed handler projected to HTTP, MCP, A2A, and OpenAPI.
import { defineAPI } from "@zauso-ai/capstan-core";
import { z } from "zod";

export const POST = defineAPI({
  input: z.object({
    title: z.string().min(1).max(200),
    priority: z.enum(["low", "medium", "high"]).default("medium"),
  }),
  output: z.object({
    id: z.string(),
    title: z.string(),
    status: z.string(),
  }),
  description: "Create a new ticket",
  capability: "write",
  resource: "ticket",
  policy: "requireAuth",
  async handler({ input, ctx }) {
    return { id: crypto.randomUUID(), title: input.title, status: "open" };
  },
});

Multi-Protocol Projection
One defineAPI() call simultaneously creates endpoints across four protocols:
defineAPI() --> CapabilityRegistry
                |-- HTTP JSON API (Hono)
                |-- MCP Tools (@modelcontextprotocol/sdk)
                |-- A2A Skills (Google Agent-to-Agent)
                +-- OpenAPI 3.1 Spec
- HTTP -- Input from query params (GET) or JSON body (POST/PUT/PATCH/DELETE)
- MCP -- Each route becomes a tool. GET /tickets becomes get_tickets
- A2A -- Each route becomes a skill via JSON-RPC tasks/send
tasks/send - OpenAPI -- Each route becomes an operation with full schema generation
Auto-Generated Endpoints
| Endpoint | Protocol | Description |
|---|---|---|
| GET /.well-known/capstan.json | Capstan | Agent manifest with all capabilities |
| GET /.well-known/agent.json | A2A | Agent card with skills list |
| POST /.well-known/a2a | A2A | JSON-RPC task handler |
| POST /.well-known/mcp | MCP | Streamable HTTP MCP endpoint |
| GET /openapi.json | OpenAPI | OpenAPI 3.1 specification |
| GET /capstan/approvals | Capstan | Approval workflow management |
File-Based Routing
Routes live in app/routes/. The router scans the directory tree and maps files to URL patterns.
| File Pattern | Route Type | Description |
|---|---|---|
| *.api.ts | API | API handler (exports HTTP methods) |
| *.page.tsx | Page | React page component (SSR) |
| _layout.tsx | Layout | Wraps nested routes via <Outlet> |
| _middleware.ts | Middleware | Runs before handlers in scope |
| _loading.tsx | Loading | Suspense fallback for pages |
| _error.tsx | Error | Error boundary for pages |
Dynamic segments use [param], catch-all uses [...param], route groups use (name) (transparent in URL).
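These mapping rules can be sketched as a small function. The output syntax (":param" for dynamic segments, "*" for catch-all) is a Hono-style assumption, as is the treatment of index files; fileToRoutePattern is a hypothetical helper, not the router's real API.

```typescript
// Hypothetical sketch of file-to-URL mapping; output syntax and index handling are assumptions.
function fileToRoutePattern(file: string): string {
  const relative = file
    .replace(/^app\/routes\//, "")
    .replace(/\.(api\.ts|page\.tsx)$/, "");
  const segments: string[] = [];
  for (const part of relative.split("/")) {
    if (part === "" || part === "index") continue; // assumption: index maps to its directory
    if (/^\(.+\)$/.test(part)) continue; // route group: transparent in the URL
    if (/^\[\.\.\..+\]$/.test(part)) {
      segments.push("*"); // catch-all segment
    } else if (/^\[.+\]$/.test(part)) {
      segments.push(":" + part.slice(1, -1)); // dynamic segment
    } else {
      segments.push(part);
    }
  }
  return "/" + segments.join("/");
}

console.log(fileToRoutePattern("app/routes/tickets/[id].api.ts")); // /tickets/:id
console.log(fileToRoutePattern("app/routes/(admin)/users/index.page.tsx")); // /users
console.log(fileToRoutePattern("app/routes/docs/[...slug].page.tsx")); // /docs/*
```

The catch-all test runs before the dynamic-segment test because [...param] also matches the simpler [param] pattern.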
definePolicy()
Policies define permission rules evaluated before route handlers.
import { definePolicy } from "@zauso-ai/capstan-core";

export const requireAuth = definePolicy({
  key: "requireAuth",
  title: "Require Authentication",
  effect: "deny",
  async check({ ctx }) {
    if (!ctx.auth.isAuthenticated) {
      return { effect: "deny", reason: "Authentication required" };
    }
    return { effect: "allow" };
  },
});

Policy Effects
| Effect | Behavior |
|---|---|
| allow | Request proceeds normally |
| deny | Request is rejected with 403 Forbidden |
| approve | Request is held for human approval (returns 202 with approval ID) |
| redact | Request proceeds but response data may be filtered |
When multiple policies apply, all are evaluated and the most restrictive effect wins: allow < redact < approve < deny.
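The "most restrictive effect wins" rule amounts to taking a maximum over an ordering. A minimal sketch, with hypothetical names and the assumption that a request with no applicable policies is allowed:

```typescript
// Hypothetical sketch of effect resolution; function and type names are assumptions.
type PolicyEffect = "allow" | "redact" | "approve" | "deny";

// Ordered from least to most restrictive: allow < redact < approve < deny.
const RESTRICTIVENESS: Record<PolicyEffect, number> = {
  allow: 0,
  redact: 1,
  approve: 2,
  deny: 3,
};

function combineEffects(effects: PolicyEffect[]): PolicyEffect {
  // Assumption: with no applicable policies the request is allowed.
  return effects.reduce<PolicyEffect>(
    (worst, effect) => (RESTRICTIVENESS[effect] > RESTRICTIVENESS[worst] ? effect : worst),
    "allow",
  );
}

console.log(combineEffects(["allow", "redact"])); // redact
console.log(combineEffects(["approve", "allow", "redact"])); // approve
console.log(combineEffects(["redact", "deny", "approve"])); // deny
```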
defineModel (Database)
Capstan uses Drizzle ORM for data modeling. defineModel() creates typed table definitions with auto-generated CRUD route helpers.
import { defineModel } from "@zauso-ai/capstan-db";
import { text, integer } from "drizzle-orm/sqlite-core";

export const ticket = defineModel("ticket", {
  title: text("title").notNull(),
  priority: text("priority").default("medium"),
  status: text("status").default("open"),
});

Features: migrations, vector search, and generated CRUD endpoints that integrate with defineAPI() and the multi-protocol registry.
Verification Loop
capstan verify --json runs an 8-step cascade against your application:
| Step | Checks |
|---|---|
| structure | Required files exist |
| config | Config file loads and has a valid export |
| routes | API files export handlers, write endpoints have policies |
| models | Model definitions valid |
| typecheck | tsc --noEmit passes |
| contracts | Models match routes, policy references valid |
| manifest | Agent manifest matches live routes |
| protocols | HTTP, MCP, A2A, OpenAPI schema consistency |
Output includes repairChecklist with fixCategory and autoFixable flags, enabling an AI self-repair loop.
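A self-repair loop would typically start by splitting the checklist into items it may fix on its own and items that need a human. The sketch below assumes a report shape: only the repairChecklist, fixCategory, and autoFixable names come from this document, while the message field, the sample values, and planRepairs are invented for illustration.

```typescript
// Hypothetical sketch: the report shape is assumed apart from the
// repairChecklist, fixCategory, and autoFixable names.
interface RepairItemSketch {
  fixCategory: string;
  autoFixable: boolean;
  message: string; // assumed field
}

interface VerifyReportSketch {
  repairChecklist: RepairItemSketch[];
}

// Split the checklist into items an agent may fix itself and items for a human.
function planRepairs(report: VerifyReportSketch) {
  const auto = report.repairChecklist.filter((item) => item.autoFixable);
  const manual = report.repairChecklist.filter((item) => !item.autoFixable);
  return { auto, manual };
}

// Made-up sample data standing in for the output of `capstan verify --json`.
const sampleReport = JSON.parse(`{
  "repairChecklist": [
    { "fixCategory": "missing-policy", "autoFixable": true, "message": "example item" },
    { "fixCategory": "type-error", "autoFixable": false, "message": "example item" }
  ]
}`) as VerifyReportSketch;

const repairPlan = planRepairs(sampleReport);
console.log(repairPlan.auto.length, repairPlan.manual.length); // 1 1
```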
AI in Web Handlers
The AI toolkit integrates with web handlers via the request context:
export const POST = defineAPI({
  // ...
  async handler({ input, ctx }) {
    const analysis = await ctx.think(input.message, {
      schema: z.object({ intent: z.string(), confidence: z.number() }),
    });
    await ctx.remember(`User asked about: ${analysis.intent}`);
    const history = await ctx.recall(input.message);
    return { analysis, relatedHistory: history };
  },
});

think() returns structured data via Zod schema parsing. generate() returns raw text. Both have streaming variants.