Agent Tool Interfaces Part 1: How Agents Connect to Tools — The Complete Interface Landscape


Every architectural decision in an agent system starts with one question: how does it touch the outside world?

Agent Tool Interfaces: From Landscape to Orchestration

This is Part 1 of a 2-part series on Agent Tool Interfaces.


TL;DR: MCP and CLI get all the attention, but agents actually connect to the world through at least six distinct interfaces — each with different cost, reliability, and security profiles. The biggest shift in 2026 isn’t MCP vs CLI. It’s the rise of Code-as-Tool-Use, which bypasses MCP’s schema bloat entirely by letting agents write and execute code against typed SDKs. This post maps the complete landscape, benchmarks the Big Three, and provides a decision framework for choosing the right interface for each system your agent touches.


The Question Nobody Asked Until It Broke

Your agent needs to list open pull requests. Two paths:

Path A — CLI:

gh pr list --repo owner/repo --state open --json title,number

Path B — MCP:

Connect to GitHub MCP server → OAuth handshake → Load 43 tool schemas
→ Call list_pull_requests({owner: "owner", repo: "repo", state: "open"})
→ Parse structured JSON response

Both return the same data. But Path A costs 1,365 tokens. Path B costs 44,026 tokens — 32x more — before a single result comes back. Path A succeeds 100% of the time. Path B fails 28% of the time due to TCP timeouts.
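The multiples quoted above fall straight out of the token counts:

```python
cli_tokens, mcp_tokens = 1_365, 44_026

# MCP's per-call overhead relative to CLI for the same operation
ratio = mcp_tokens / cli_tokens  # roughly 32x
```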

So why does MCP exist? And why is it winning adoption anyway?

Because tool connection is an architectural decision, not a feature toggle. And in 2026, the choice is no longer binary. There are at least six distinct approaches competing for your agent’s attention — and the most disruptive one isn’t MCP or CLI at all.


The Tool Interface Landscape (2026)

Before diving into any single interface, let’s see the full map.

┌──────────────────────────────────────────────────────────────────┐
│               Tool Interface Landscape (2026)                    │
│                                                                  │
│  Structured ◄──────────────────────────────────► Unstructured    │
│                                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐ │
│  │  Native     │  │    MCP      │  │         CLI              │ │
│  │  Function   │  │  Protocol   │  │    Shell / Subprocess    │ │
│  │  Calling    │  │  (JSON-RPC) │  │    (stdin/stdout)        │ │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬─────────────┘ │
│         │                │                       │               │
│  ┌──────┴──────┐  ┌──────┴──────┐  ┌────────────┴─────────────┐ │
│  │   Code      │  │  Browser /  │  │     API-Direct           │ │
│  │  Execution  │  │  Computer   │  │   (HTTP generation)      │ │
│  │  (sandbox)  │  │    Use      │  │                          │ │
│  └─────────────┘  └─────────────┘  └──────────────────────────┘ │
│                                                                  │
│  ──── Supporting Infrastructure ────                             │
│  Message Queues │ Framework Abstractions │ WebAssembly Sandbox   │
└──────────────────────────────────────────────────────────────────┘

Maturity Matrix

| Approach | Maturity | 2026 Trend | Best For |
| --- | --- | --- | --- |
| CLI | Mature | CLI renaissance (SKILL.md) | Local dev tools, git, testing |
| MCP | Maturing | 97M+ downloads/month | OAuth services, audit, discovery |
| Code Execution | Mature + accelerating | Most important shift | Multi-step workflows, data transforms |
| Native Function Calling | Mature | Converging across providers | Tight app control, single-turn |
| Browser/Computer Use | Early-mid | Web approaching production | No-API legacy systems |
| API-Direct | Mature (mediated) | Consolidating via platforms | OpenAPI-spec’d services |
| Message Queue | Infra mature | Agent standardization needed | Async enterprise pipelines |
| WebAssembly | Emerging | MS/NVIDIA momentum | Maximum sandbox security |

The rest of this post focuses on the Big Three — the interfaces that handle 90%+ of production tool calls — then surveys the remaining five.


The Big Three

1. CLI: The Unix Philosophy, Reborn

The CLI tool interface is dead simple. The agent spawns a subprocess, passes arguments, and reads the output.

Agent                    Shell                    System
  │                        │                        │
  │  "git diff HEAD~1"     │                        │
  │───────────────────────▶│                        │
  │                        │  fork + exec           │
  │                        │───────────────────────▶│
  │                        │                        │
  │                        │◀───────────────────────│
  │   stdout (diff text)   │   exit code 0          │
  │◀───────────────────────│                        │
  │                        │                        │
  │  (LLM interprets diff) │                        │

No schema. No handshake. No OAuth. Just a command, a result, and a model that knows what to do with both.
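At the harness level, that entire loop is a few lines of code. A minimal Python sketch of the subprocess pattern, with `echo` standing in for a real tool like `git`:

```python
import subprocess

def run_cli_tool(cmd: list[str]) -> tuple[str, int]:
    """Spawn a subprocess, capture stdout as text, return it with the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout, result.returncode

# The agent feeds stdout (and stderr on failure) back to the LLM to interpret.
out, code = run_cli_tool(["echo", "hello"])
```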

Why LLMs Are Unreasonably Good at CLI

LLMs didn’t learn tool use from protocol specifications. They learned it from billions of real terminal sessions — Stack Overflow answers, GitHub Actions logs, tutorial transcripts, man pages.

When an LLM generates git log --oneline -10 | grep "fix", it’s not following a schema. It’s pattern-matching against millions of similar commands it saw during training.

  • Self-correction: Agent runs --help when uncertain, reads error messages, retries with different flags
  • Pipe composition: Models naturally chain curl | jq | grep without being taught
  • Novel combinations: LLMs improvise pipe chains they’ve never seen, because they understand the grammar of shell composition

MCP has zero training data. Every MCP tool call is a cold-start inference from schema descriptions alone.

Strengths and Limitations

| Strength | Detail |
| --- | --- |
| Token efficiency | ~1,365 tokens for a GitHub PR list vs ~44,026 for MCP |
| 100% reliability | 25/25 in benchmarks. No TCP timeouts. |
| Unix composability | Pipes, redirects, subshells — small tools into complex workflows |
| Self-documenting | --help, man, error messages are natural language LLMs already understand |
| Ubiquitous | git, docker, kubectl, curl, jq, python — every tool has a CLI |

| Limitation | Detail |
| --- | --- |
| Unstructured output | Plain text requires LLM interpretation; no guaranteed schema |
| Security surface | Broad shell permissions; prompt injection → malicious commands |
| No discovery | No standard way to enumerate tools or parameters at runtime |
| Stateless | Environment resets between subprocess calls |

Who Uses CLI as Primary Interface

  • Claude Code — 8 built-in tools; Bash is the universal adapter; ~135K GitHub commits/day (Feb 2026) [Anthropic Engineering]
  • Devin — Cloud sandbox with Bash + VS Code + Chrome [Devin Agents 101]
  • Open Interpreter — Natural language → Python/JS/shell via exec() [GitHub]
  • Codex CLI — Adopted SKILL.md from Claude Code [OpenAI]
  • Aider — Git-aware CLI coding assistant [GitHub]

The CLI Output Problem — and RTK’s Fix

CLI’s biggest weakness — unstructured output flooding the context window — has spawned its own optimization layer. RTK (Rust Token Killer) is a CLI proxy written in Rust that intercepts command output and compresses it before it reaches the agent [GitHub, 18.6k stars].

| Command | Raw tokens | After RTK | Reduction |
| --- | --- | --- | --- |
| cargo test | ~4,823 | ~11 | 99% |
| git diff HEAD~1 | ~21,500 | ~1,259 | 94% |
| pytest -v | ~756 | ~24 | 96% |

A typical 30-minute Claude Code session drops from ~150,000 tokens to ~45,000 (70% savings). RTK processes commands through a six-phase pipeline (parse → route → execute → filter → print → track) with 12 filtering strategies — stats extraction, error-only filtering, pattern grouping, deduplication, and more.

The integration is invisible: a PreToolUse hook rewrites shell commands to rtk equivalents. The agent never knows the compression happened. This is harness-level optimization — no model changes, no prompt changes, just better infrastructure between the agent and the shell.
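As a rough illustration of one of those filtering strategies (error-only filtering) — explicitly not RTK's actual code — the idea fits in a few lines of Python; the keyword list and summary format here are made up:

```python
def filter_errors(raw_output: str, max_lines: int = 20) -> str:
    """Keep only error-looking lines, then append a count of what was dropped."""
    keywords = ("error", "Error", "FAILED", "warning")
    lines = raw_output.splitlines()
    kept = [ln for ln in lines if any(k in ln for k in keywords)]
    summary = f"[{len(lines) - len(kept)} lines filtered]"
    return "\n".join(kept[:max_lines] + [summary])

compressed = filter_errors("ok\nerror: borrow of moved value\nok\nFAILED tests::io\n")
```

The agent still sees every failure, but the passing noise never enters its context window.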

RTK is part of a broader agent infrastructure ecosystem:

  • ICM — Persistent memory for agents via MCP-native knowledge graphs with typed relationships (depends_on, contradicts, refines) [GitHub]
  • Grit — Git for parallel agents with AST-level locking to prevent merge conflicts across 50+ concurrent agents [GitHub]

2. MCP: The USB-C of AI

MCP (Model Context Protocol) was created by Anthropic in November 2024, inspired by the Language Server Protocol (LSP). It uses JSON-RPC 2.0 as the wire format.

┌─────────────────────────────────────────────────────────────┐
│                        Host                                 │
│                  (Claude Desktop, Cursor, VS Code)          │
│                                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│  │ MCP      │  │ MCP      │  │ MCP      │                 │
│  │ Client 1 │  │ Client 2 │  │ Client 3 │  ← 1:1 mapping  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘                 │
└───────┼──────────────┼──────────────┼───────────────────────┘
        │              │              │
        ▼              ▼              ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │ GitHub  │   │  Slack  │   │Database │   ← MCP Servers
   │ Server  │   │ Server  │   │ Server  │
   │(43 tools)│  │(8 tools)│  │(5 tools)│
   └─────────┘   └─────────┘   └─────────┘

Three primitives per server:

  • Tools: Actions the agent can invoke (e.g., create_issue, send_message)
  • Resources: Data the agent can read (e.g., file contents, database rows)
  • Prompts: Reusable prompt templates for common operations

Two transports: stdio (local, low latency) and Streamable HTTP (remote, SSE)
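On the wire, every tool invocation is a JSON-RPC 2.0 request using the `tools/call` method. A minimal Python sketch of the message an MCP client would send (the tool name and arguments are illustrative):

```python
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = make_tool_call(1, "create_issue", {"title": "Bug: timeout on login"})
```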

The 2025-11-25 Spec (1-Year Anniversary)

Major upgrades in the anniversary release [MCP Blog]:

  • Tasks Primitive: Long-running operation tracking (working → completed → failed)
  • Simplified Auth: OAuth 2.1 with URL-based client registration
  • Extensions Framework: Composable additions outside the core spec
  • Sampling with Tools: Servers execute agentic loops using client-provided LLM tokens

Strengths and Limitations

| Strength | Detail |
| --- | --- |
| Standardization | One protocol for all integrations — no custom adapters |
| Self-description | Servers declare capabilities; agents discover tools at runtime |
| Enterprise security | OAuth 2.1, per-server scoping, audit logging |
| Bidirectional | Server-initiated notifications, progress updates |
| Ecosystem | 97M+ monthly SDK downloads, 10K+ public servers |

| Limitation | Detail |
| --- | --- |
| Schema bloat | GitHub MCP = 43 tools = ~44,000 tokens before any work |
| Reliability gap | 72% success vs CLI’s 100%. TCP timeouts. |
| Security immaturity | 30+ CVEs early 2026; 82% vulnerable to path traversal |
| Zero training data | LLMs never saw MCP patterns during pre-training |

Ecosystem Scale

  • 97M+ monthly SDK downloads
  • 10,000+ public MCP servers; ~2,000 registry entries (407% growth since Sep 2025)
  • Governance: Donated to AAIF (Linux Foundation) Dec 2025 — co-founded by Anthropic, OpenAI, Google, Microsoft, AWS, Block [Anthropic]
  • Spec: modelcontextprotocol.io/specification/2025-11-25

3. Code Execution: The Third Way

This is the part most people miss. The biggest shift in 2026 isn’t MCP vs CLI. It’s Code-as-Tool-Use.

Instead of calling tools one at a time (sequential tool calling), the agent writes code that orchestrates multiple tools in a single execution. The code runs in a sandboxed environment and returns only the final result.

┌────────────────────────────────────────────────────────────┐
│           Sequential Tool Calling vs Code Execution        │
│                                                            │
│   Traditional (N round-trips)      Code-as-Tool-Use        │
│   ──────────────────────────       ──────────────────       │
│   call tool_1(args)                sandbox.run("""          │
│   ← result_1                         r1 = tool_1(args)     │
│   call tool_2(result_1)              r2 = tool_2(r1)       │
│   ← result_2                         r3 = tool_3(r2)       │
│   call tool_3(result_2)              return r3              │
│   ← result_3                       """)                    │
│                                    ← result_3              │
│                                                            │
│   Round-trips: 3                   Round-trips: 1          │
│   Schema overhead: 3x             Schema overhead: 1x      │
│   Token cost: HIGH                Token cost: LOW           │
└────────────────────────────────────────────────────────────┘
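A Python sketch of the contrast, with two hypothetical tools; `exec` stands in for the sandbox, and only `result` would ever reach the model:

```python
# Two hypothetical tools the agent can call.
def filter_open(rows):
    return [r for r in rows if r["state"] == "open"]

def count(rows):
    return len(rows)

rows = [{"state": "open"}, {"state": "closed"}, {"state": "open"}]

# Sequential tool calling: every intermediate result round-trips through the LLM.
step1 = filter_open(rows)   # round-trip 1
step2 = count(step1)        # round-trip 2

# Code-as-tool-use: the agent emits one script; the sandbox runs the whole chain.
script = "result = count(filter_open(rows))"
sandbox = {"filter_open": filter_open, "count": count, "rows": rows}
exec(script, sandbox)       # one round-trip, same answer
```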

Key Implementations

Cloudflare Code Mode [Blog]

The most radical implementation. Instead of 2,500 individual tool schemas, expose just 2 meta-tools: search() and execute(). The agent writes TypeScript against a typed SDK, executed in a V8 isolate (Dynamic Worker).

Before: 2,500 API endpoints as individual tools
  Total schema cost: ~1,170,000 tokens

After: 2 tools (search + execute)
  Total schema cost: ~1,000 tokens

  Token reduction: 99.9%

Dynamic Workers boot in milliseconds, use megabytes of memory — claimed 100x faster and 100x more memory-efficient than containers [Blog].

Anthropic Programmatic Tool Calling (PTC) [Docs]

Claude writes a Python script that orchestrates multiple tool calls in a single sandboxed execution. The script processes intermediate results internally — Claude only sees the final output.

  • 37% reduction in round-trips
  • Up to 37% token savings on multi-step workflows
  • Combined with Anthropic’s code execution tool [Engineering]

E2B Sandboxes [e2b.dev]

Open-source, Firecracker microVM-based sandboxes purpose-built for AI agents:

  • Boot in <200ms, 3-5MB memory overhead per instance
  • ~15 million sessions/month (March 2025)
  • ~50% of Fortune 500 using it
  • SDKs for Python and JavaScript

OpenAI Code Interpreter [Docs]

Runs Python in sandboxed containers via the Responses API. Handles data analysis, file transforms, chart generation, iterative debugging. Internet access disabled during execution.

Why Code Execution Changes Everything

The traditional tool-calling model — LLM decides action, system executes, LLM reads result, repeat — has a fundamental O(n) round-trip problem. Each tool call requires a full LLM inference pass to decide the next action.

Code execution collapses this to O(1). The LLM plans the entire workflow upfront, writes it as code, and executes it in a single pass. The sandbox handles the sequential logic that would otherwise burn N inference calls.

This matters most when:

  • Multi-step data transformations (filter → aggregate → format → send)
  • Large API surfaces (Cloudflare’s 2,500 endpoints, GitHub’s 43 tools)
  • Cost-sensitive operations (code is 99.9% cheaper than N sequential MCP calls)

The Benchmark: Three-Way Comparison

Benchmarks from Scalekit [Blog], Cloudflare [Blog], and Anthropic [Docs]:

Token Cost

Task: List pull requests for a repository

CLI (gh pr list --json)         ██  1,365 tokens
Code Mode (search + execute)    █  ~600 tokens
MCP (GitHub MCP Server)         ████████████████████████████████████████████  44,026 tokens

                                CLI is 32x cheaper than MCP
                                Code Mode is 73x cheaper than MCP

Monthly Cost at Scale (10,000 operations)

| Interface | Tokens/op | Monthly cost | Reliability |
| --- | --- | --- | --- |
| Code Mode | ~600 | ~$1.50 | ~98% |
| CLI | 1,365 | $3.20 | 100% |
| MCP | 44,026 | $55.20 | 72% |

Why These Numbers Are Partially Misleading

The benchmarks compare CLI-accessible operations. MCP’s value isn’t replacing CLIs — it’s connecting to services that have no CLI: Figma, Notion, Salesforce, internal APIs.

And Code Mode’s value isn’t replacing simple CLI calls — it’s replacing multi-step MCP workflows where N sequential tool calls become 1 code execution.

The right question: what’s the right interface for each system your agent touches?

Perplexity publicly removed MCP support, citing token cost and reliability. [jannikreinhard.com]


The Remaining Five

These interfaces handle specific niches where the Big Three don’t reach.

4. Native Function Calling — The Engine Under MCP

This is the model API’s own structured tool-use mechanism — the layer MCP sits on top of.

  • OpenAI: Function calling via Responses API; strict: true for schema-guaranteed arguments [Docs]
  • Anthropic: tool_use blocks in Messages API; server-side tools (web_search, code_execution) [Docs]
  • Google: Gemini function calling; multi-tool combination in Gemini 3 [Docs]

Relationship to MCP: Function calling is vendor-specific tight control. MCP is standardized portability. OpenAI treats them as complementary [OpenAI Agents SDK].

Use when: You need maximum control over a small, fixed set of tools within a single provider’s API.
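For concreteness, a tool definition in this style is a JSON Schema the provider validates against; the sketch below follows the OpenAI function-calling shape described above (the tool itself is hypothetical):

```python
# A hypothetical tool declared in the OpenAI function-calling style.
# With "strict": True the provider guarantees arguments match this schema,
# which requires every property to be listed under "required".
list_pull_requests = {
    "type": "function",
    "name": "list_pull_requests",
    "description": "List pull requests for a GitHub repository.",
    "parameters": {
        "type": "object",
        "properties": {
            "owner": {"type": "string"},
            "repo": {"type": "string"},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["owner", "repo", "state"],
        "additionalProperties": False,
    },
    "strict": True,
}
```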

5. Browser/Computer Use — The Last Resort

When there’s no API, no CLI, no MCP server — the agent uses its eyes and hands.

Agent                    Screen                   Application
  │                        │                        │
  │  take_screenshot()     │                        │
  │───────────────────────▶│                        │
  │   ◀── image bytes ─────│                        │
  │                        │                        │
  │  (LLM: "I see a       │                        │
  │   login form...")      │                        │
  │                        │                        │
  │  click(x=340, y=220)  │                        │
  │───────────────────────▶│───────────────────────▶│
  │                        │                        │
  • Anthropic Computer Use: Screenshots → mouse/keyboard actions. Beta since Oct 2024. [Docs]
  • OpenAI CUA/Operator: GPT-4o vision + RL-trained GUI interaction. 87% on WebVoyager, 38% on OSWorld. [Blog]
  • Stagehand v3 (Browserbase): act(), extract(), observe(). Talks directly to Chrome DevTools Protocol, 44% faster. [stagehand.dev]
  • Browserbase: Cloud browser infra. $40M Series B, 50M sessions in 2025. [browserbase.com]

Use when: The target has a visual interface but no API. Web automation is approaching production; desktop remains experimental.

6. API-Direct — LLM-Generated HTTP

The agent generates raw HTTP requests from OpenAPI specs.

  • Composio: 1,000+ pre-built connectors, managed OAuth, type-safe interfaces [composio.dev]
  • GPT Actions: OpenAPI-spec-defined API calls in Custom GPTs — being replaced by MCP Apps
  • Cloudflare Code Mode: The typed SDK approach is essentially API-Direct via generated code

Use when: An OpenAPI spec exists and you want maximum flexibility without building a tool wrapper. Typically mediated through a framework rather than raw generation.
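The core move is mechanical: expand an OpenAPI path template into a concrete request. A toy sketch — the `build_request` helper and the spec fragment are hypothetical, and no network call is made:

```python
def build_request(operation: dict, base_url: str, path_params: dict) -> dict:
    """Expand an OpenAPI path template into a concrete HTTP request description."""
    path = operation["path"]
    for name, value in path_params.items():
        path = path.replace("{" + name + "}", str(value))
    return {"method": operation["method"].upper(), "url": base_url + path}

op = {"method": "get", "path": "/repos/{owner}/{repo}/pulls"}
req = build_request(op, "https://api.github.com",
                    {"owner": "octocat", "repo": "hello-world"})
```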

7. Message Queue / Event-Driven

Agents communicate with tools via Kafka, Redis Streams, or Pulsar.

  • Tool invocations published as events; consumers execute and publish results
  • Inherently async: perfect for long-running agent tasks
  • Durable: messages survive crashes; exactly-once semantics
  • Scalable: millions of events/second
  • Patterns documented by Red Hat [Blog]

Use when: Enterprise multi-agent systems where durability and async processing matter more than latency. Overkill for single-agent synchronous workflows.
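The shape of the pattern, with Python's in-process `queue.Queue` standing in for a Kafka or Redis topic (the event format is hypothetical):

```python
import json
import queue

broker = queue.Queue()  # stand-in for a durable topic like Kafka

# The agent publishes a tool invocation as an event and moves on.
broker.put(json.dumps({"tool": "generate_report", "args": {"report_id": 7}}))

# A worker consumes the event asynchronously, executes, and publishes the result.
event = json.loads(broker.get())
broker.put(json.dumps({"tool": event["tool"], "status": "done"}))

result = json.loads(broker.get())
```

In a real deployment the publish and consume sides run in separate processes, and the broker's durability is what lets a crashed worker resume the task.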

8. WebAssembly Sandboxing

AI-generated code compiled to Wasm modules and executed in maximally isolated runtimes.

  • Microsoft Wassette: WebAssembly Components via MCP; modules fetched from OCI registries [Microsoft]
  • NVIDIA: Wasm for sandboxing agentic AI workflows [NVIDIA]
  • Wasm modules are inert by default — zero host access unless explicitly granted
  • WebAssembly 3.0: 64-bit memory, garbage collection, exception handling

Use when: Maximum sandbox security is non-negotiable. The safest execution environment available, but still early-stage for agent use cases.


The Protocol Ecosystem Map (2026)

These interfaces don’t exist in isolation. A layered protocol stack is emerging:

┌───────────────────────────────────────────────────────────────┐
│                   Protocol Stack (2026)                       │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Layer 4: Agent Commerce                                │  │
│  │  ACP / UCP — Payment, procurement, transactions         │  │
│  │  Status: Early stage                                    │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Layer 3: Agent-to-Web                                  │  │
│  │  WebMCP — Standardized web interaction (replaces scrape)│  │
│  │  Status: Emerging                                       │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Layer 2: Agent-to-Agent                                │  │
│  │  A2A (Google) — Discovery, delegation, coordination     │  │
│  │  Status: v1.0 (gRPC + JSON-RPC, signed Agent Cards)     │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Layer 1: Agent-to-Tool                                 │  │
│  │  MCP (Anthropic → AAIF) — Tool/data/API access          │  │
│  │  Status: Dominant (97M+ downloads/month)                │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Layer 0: Local Execution                               │  │
│  │  CLI + Code Execution — Direct OS/sandbox access        │  │
│  │  Status: Universal                                      │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  Substrate: Execution Environments                      │  │
│  │  E2B (microVM) │ Dynamic Workers (V8) │ Wasm (isolate)  │  │
│  │  Status: Mature (E2B) / Emerging (Wasm)                 │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘

Key relationships:

  • MCP + A2A: MCP connects agents to tools; A2A connects agents to each other. IBM’s ACP merged into A2A Aug 2025 [Google]
  • AAIF governance: All protocols under the Linux Foundation [AAIF]
  • Function Calling + MCP: OpenAI treats them as complementary — function calling for tight control, MCP for portability

Security: Three Models Compared

Each interface has a fundamentally different security profile:

┌────────────────────────────────────────────────────────────────┐
│                  Security Model Comparison                     │
│                                                                │
│  CLI                  MCP                   Code Execution     │
│  ────                 ────                  ──────────────      │
│  Threat:              Threat:               Threat:             │
│  Prompt injection     Tool poisoning,       Sandbox escape,     │
│  → shell command      path traversal,       arbitrary code      │
│                       token theft                               │
│                                                                │
│  Mitigation:          Mitigation:           Mitigation:         │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ Seatbelt/     │    │ OAuth 2.1     │    │ Firecracker   │   │
│  │ bubblewrap    │    │ per-server    │    │ microVM       │   │
│  │ (OS sandbox)  │    │ scoping       │    │ (hardware     │   │
│  │               │    │               │    │  isolation)   │   │
│  │ Command       │    │ Audit logging │    │               │   │
│  │ allowlist     │    │               │    │ No network    │   │
│  │               │    │ AAIF working  │    │ by default    │   │
│  │ Human-in-     │    │ groups        │    │               │   │
│  │ the-loop      │    │               │    │ V8 isolate    │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│                                                                │
│  Maturity:            Maturity:             Maturity:           │
│  Decades of Unix      30+ CVEs in 2026     E2B: production     │
│  security practice    OWASP MCP Top 10     Wasm: emerging      │
│                       82% path traversal                       │
└────────────────────────────────────────────────────────────────┘

CLI is a known risk with known mitigations. MCP’s attack surface is novel and poorly understood. Code execution sandboxes (E2B, Wasm) offer the strongest isolation but are newest.



The Decision Framework

Use this flowchart when choosing a tool interface:

                    ┌─────────────────────────────┐
                    │  Does a CLI tool exist for   │
                    │  this operation?              │
                    └──────┬──────────────┬────────┘
                           │ Yes          │ No
                           ▼              ▼
                    ┌──────────┐   ┌─────────────────────────┐
                    │ Is it a  │   │ Is it a multi-step       │
                    │ single   │   │ workflow against an API?  │
                    │ command? │   └──────┬──────────┬────────┘
                    └──┬───┬──┘          │ Yes      │ No
                       │   │             ▼          ▼
                       │   │      ┌──────────┐  ┌────────────────┐
                       │Yes│No    │Use Code  │  │ Does it need   │
                       ▼   ▼      │Execution │  │ OAuth / audit? │
                 ┌──────┐ ┌────┐  │(sandbox) │  └──┬──────────┬──┘
                 │ Use  │ │Use │  └──────────┘     │ Yes      │ No
                 │ CLI  │ │Code│                    ▼          ▼
                 │      │ │Exec│             ┌──────────┐ ┌──────────┐
                 └──────┘ │    │             │ Use MCP  │ │ Visual-  │
                          └────┘             │          │ │ only UI? │
                                             └──────────┘ └──┬────┬──┘
                                                             │Yes │ No
                                                             ▼    ▼
                                                       ┌───────┐┌──────┐
                                                       │Browser││ API- │
                                                       │  Use  ││Direct│
                                                       └───────┘└──────┘

Rules of Thumb

  1. Start with CLI. Cheapest, most reliable, LLM already knows it.
  2. Use Code Execution for multi-step workflows. Collapses N round-trips to 1.
  3. Add MCP when you need auth, audit, or API-only services. Not for things with good CLIs.
  4. Browser/Computer Use is the last resort. Only when no API exists at all.
  5. Lazy-load everything. Don’t pay 44,000 tokens for 43 schemas when you’ll use 2.
  6. Monitor token costs. MCP’s 32x overhead is invisible until you check your bill.
  7. Compress CLI output. Tools like RTK cut CLI token cost by 60-90% — the cheapest interface gets even cheaper.

Token Optimization: The War on Schema Bloat

MCP’s schema bloat problem has spawned an entire subfield:

| Strategy | Token Reduction | Complexity | Reference |
| --- | --- | --- | --- |
| Lazy Loading (Claude Code v2.1.7) | 108K → 5K (95%) | Low | [Claude Code Docs] |
| Code Mode (Cloudflare) | 1.17M → 1K (99.9%) | Medium | [Cloudflare Blog] |
| Dynamic Toolsets | 44K → 3K (93%) | Medium | [Speakeasy] |
| MCP Gateway | 90% (schema filtering) | Medium | [StackOne] |
| SKILL.md Pattern | 33% fewer tool calls | Low | [Claude Code → Codex CLI → Gemini CLI] |
| RTK (CLI output compression) | 60-90% CLI tokens | Low | [GitHub] |
| RAG-MCP | >50% + 3x accuracy | High | [arxiv:2505.03275] |

The last entry — RAG-MCP — is a bridge to Part 2, where we explore how GraphRAG enables intelligent tool discovery that makes schema bloat a non-issue.


What’s Next: From Landscape to Orchestration

Knowing the interfaces is step one. The harder question is how to combine them into a well-designed harness:

  • How does an agent decide which interface to use for each step?
  • How is state maintained when switching between CLI (stateless), MCP (session), and code execution (sandbox)?
  • When one interface fails (MCP timeout), how do you fall back to another (CLI)?
  • With 50+ tools available, how does the agent find the right one without loading all schemas?

These questions are the subject of Part 2: Orchestrating Tool Interfaces — From Harness Design to GraphRAG.


References

Specifications and Official Docs

  1. MCP Specification (2025-11-25) — modelcontextprotocol.io/specification/2025-11-25
  2. One Year of MCP: Anniversary Spec Release — blog.modelcontextprotocol.io
  3. Anthropic: Donating MCP to AAIF — anthropic.com
  4. Google A2A Protocol — developers.googleblog.com
  5. OpenAI Function Calling — platform.openai.com
  6. Claude Tool Use — platform.claude.com
  7. Gemini Function Calling — ai.google.dev

Benchmarks and Analysis

  1. MCP vs CLI: Benchmarking Cost & Reliability — scalekit.com
  2. Why CLI Tools Are Beating MCP — jannikreinhard.com
  3. MCP vs CLI for AI-Native Development — circleci.com
  4. CLI vs MCP, or CLI + MCP — shubhdeepchhabra.in
  5. AI Agent Protocol Ecosystem Map 2026 — digitalapplied.com

Code Execution and Sandboxing

  1. Cloudflare Code Mode — blog.cloudflare.com
  2. Cloudflare Dynamic Workers — blog.cloudflare.com
  3. Anthropic Programmatic Tool Calling — platform.claude.com
  4. Anthropic Advanced Tool Use — anthropic.com/engineering
  5. E2B Sandboxes — e2b.dev
  6. OpenAI Code Interpreter — developers.openai.com
  7. Microsoft Wassette (WebAssembly) — opensource.microsoft.com
  8. NVIDIA Wasm for Agentic AI — developer.nvidia.com

Browser and Computer Use

  1. Anthropic Computer Use — platform.claude.com
  2. OpenAI CUA / Operator — openai.com
  3. Stagehand v3 — stagehand.dev

Security

  1. OWASP MCP Top 10 — owasp.org
  2. MCP Security: 30 CVEs in 60 Days — heyuan110.com
  3. Claude Code Sandboxing — anthropic.com/engineering
  4. Simon Willison: MCP Prompt Injection — simonwillison.net

Architecture and Patterns

  1. Inside Claude Code Architecture — penligent.ai
  2. AI Agent CLI + MCP Hybrid Architecture — stackone.com
  3. MCP Token Optimization: 4 Approaches — stackone.com
  4. Reducing MCP Token Usage by 100x — speakeasy.com
  5. RAG-MCP: Mitigating Prompt Bloat — arxiv:2505.03275
  6. The Protocol Wars — theregister.com

CLI Optimization

  1. RTK: Rust Token Killer (18.6k stars) — github.com/rtk-ai/rtk
  2. RTK Architecture — github.com/rtk-ai/rtk/ARCHITECTURE.md
  3. ICM: Persistent Memory for Agents — github.com/rtk-ai/icm
  4. Grit: Git for Parallel Agents — github.com/rtk-ai/grit
This post is licensed under CC BY 4.0 by the author.