Random Thoughts

When you give an AI assistant the ability to act on the outside world, there are two dominant ways to wire it up: expose capabilities through a Model Context Protocol (MCP) server, or simply let the model invoke command-line tools (CLIs). They solve the same problem — letting a model do things it can't do with text alone — but they make opposite trade-offs in cost, structure, and safety.

What Is MCP?

The Model Context Protocol is an open standard that lets an LLM discover and call external tools through a uniform interface. Instead of hard-coding every integration, a host application connects to one or more MCP servers, each of which advertises a set of tools, resources, and prompts. The model can then query what's available and invoke those capabilities through structured, schema-defined calls.

What Is a CLI?

A command-line tool is a program the model runs as a subprocess — pytest, git, eslint, curl, and the thousands of other utilities a developer already uses. There's no schema to load and no server to connect to: the model writes a command, the shell runs it, and the output comes back as text on stdout.
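
To make the contrast concrete, here is a minimal Python sketch of the CLI path: the harness runs a command as a subprocess and hands its stdout back to the model as plain text. The helper name `run_cli` is hypothetical. The demo uses the current Python interpreter as a stand-in command so it runs anywhere. This sketch passes an argument list directly instead of going through a shell, which avoids quoting problems. Pipelines and redirection would still need a shell.

```python
import subprocess
import sys


def run_cli(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command without a shell and return its stdout as text.

    Hypothetical helper: a real agent harness would also report the exit
    code and stderr back to the model.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,      # don't let a hung tool stall the agent loop
    )
    return completed.stdout


# Stand-in for something like ["pytest", "-q"] or ["git", "status"].
output = run_cli([sys.executable, "-c", "print('3 passed')"])
print(output.strip())  # prints: 3 passed
```

There is no schema and no connection setup, only a process and its text output.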

How MCP Works

At its core, MCP lets a model ask external tools for help completing a task. Imagine you tell an assistant:

"Find the latest sales report in our database and email it to my manager."

The model can't touch a database or send mail on its own, so it works through MCP:

Discovery. The model uses the MCP client to look for relevant tools and finds two registered on MCP servers: a database_query tool and an email_sender tool.
First invocation. It generates a structured request to database_query with the report name. The client routes that request to the right MCP server.
External action. The server translates the request into a secure SQL query, retrieves the sales report, formats it, and returns it to the model.
Second invocation. Now holding the report, the model calls email_sender with the manager's address and the report contents. The server sends the email and confirms.
Final response. The model replies: "I found the latest sales report and emailed it to your manager."
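
The walkthrough above can be simulated in a few lines. This is a toy sketch, not an MCP implementation. `database_query` and `email_sender` are hypothetical in-process stand-ins for the two server-side tools. The "model" is replaced by two hard-coded structured calls, and a dictionary plays the role of the client's routing table.

```python
from typing import Any, Callable


# Hypothetical server-side tools; real MCP servers would run out of process.
def database_query(report_name: str) -> dict[str, Any]:
    # Stand-in for a parameterized SQL lookup.
    return {"report": report_name, "total_sales": 125_000}


def email_sender(to: str, body: str) -> dict[str, Any]:
    # Stand-in for a mail API call; it only reports what it would send.
    return {"status": "sent", "to": to, "chars": len(body)}


# The client's view: tool name -> the callable behind the server that advertised it.
ROUTES: dict[str, Callable[..., dict[str, Any]]] = {
    "database_query": database_query,
    "email_sender": email_sender,
}


def invoke(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Route one structured tool call to the tool that advertised it."""
    if name not in ROUTES:
        raise KeyError(f"unknown tool: {name}")
    return ROUTES[name](**arguments)


# First and second invocations, with the model's decisions hard-coded.
report = invoke("database_query", {"report_name": "latest_sales"})
receipt = invoke("email_sender", {"to": "manager@example.com", "body": str(report)})
print(receipt["status"])  # sent
```

The key property is that the model only ever emits a tool name plus JSON-shaped arguments. Everything that touches the database or mail system happens behind the server boundary.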

Security

Because MCP lets a model take real actions and touch real data, its security model matters as much as its convenience.

User consent and control. Users should clearly understand and approve every action and data access the model performs, ideally through simple authorization screens that let them control what is shared and what is run.
Data privacy. Hosts must get explicit permission before exposing user data to a server. Sensitive data needs proper access controls, encryption, and strict rules to prevent accidental leaks.
Tool safety. Tools can execute code. Tool descriptions shouldn't be trusted unless they come from a reliable server, and users should approve a tool — and understand what it does — before it runs.
Secure output handling. Outputs from MCP interactions must be sanitized before being shown to users to avoid injection attacks like XSS. Clean inputs, filter outputs, and keep sensitive data out of prompts.
Supply chain security. Servers and the external tools they reach are part of your supply chain. Every link should be vetted to prevent breaches, biased results, or failures.
Monitoring and auditing. Thorough logging of model activity and tool usage helps detect misuse and supports incident response by recording which tools ran, with what inputs, and where data went.
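
Several of these controls can sit in a thin wrapper around every tool call. The sketch below is an illustration under assumptions. `guarded_call`, the `approve` callback, and the in-memory `AUDIT_LOG` are hypothetical names. `html.escape` stands in for whatever output encoding your display surface actually needs. The wrapper asks for user consent before execution, records an audit entry for each attempt, and escapes tool output before it reaches an HTML view.

```python
import html
import time
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # stand-in for durable, append-only audit storage


def guarded_call(
    tool_name: str,
    tool: Callable[..., str],
    arguments: dict[str, Any],
    approve: Callable[[str, dict[str, Any]], bool],
) -> str:
    """Ask for consent, log the attempt, run the tool, and escape its output."""
    allowed = approve(tool_name, arguments)  # user consent and control
    AUDIT_LOG.append({"ts": time.time(), "tool": tool_name,
                      "args": arguments, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"user declined {tool_name}")
    raw = tool(**arguments)
    # Secure output handling: neutralize markup before rendering in a web UI.
    return html.escape(raw)


def fetch_note(note_id: int) -> str:
    # Hypothetical tool whose output happens to contain a script tag.
    return f"note {note_id}: <script>alert(1)</script>"


safe = guarded_call("fetch_note", fetch_note, {"note_id": 7},
                    approve=lambda name, args: True)
print(safe)  # note 7: &lt;script&gt;alert(1)&lt;/script&gt;
```

Logging the attempt before checking the decision means declined calls are audited too, which is usually what incident response needs.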

MCP Architecture

An MCP deployment has four moving parts:

Host application. The user-facing app that hosts the LLM, talks to users, and initiates connections to servers, such as Claude Desktop, an AI-enhanced IDE like Cursor, or a web chat interface.
MCP client. Embedded in the host, it manages connections to servers and translates between the host's needs and the protocol.
MCP server. Adds context and capabilities, exposing specific functions to the model. Each server usually focuses on one integration — GitHub for repositories, PostgreSQL for databases.
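Transport layer. The channel between client and server. MCP defines two standard transports: stdio for local integrations, where the client launches the server as a subprocess and exchanges messages over stdin and stdout, and an HTTP-based transport for remote connections. The original specification used HTTP + SSE (HTTP POST for requests, Server-Sent Events for streaming responses); later revisions replace it with Streamable HTTP, which can still stream responses over SSE.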

All communication uses JSON-RPC 2.0 as the message standard, giving requests, responses, and notifications a consistent structure.
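
For a sense of what goes over the wire, this sketch builds a JSON-RPC 2.0 request for the `tools/call` method and frames it for the stdio transport, where each message is one line of JSON terminated by a newline. The method name and the `name`/`arguments` parameter shape follow the published MCP specification, but check them against the revision you target. The `database_query` tool and its arguments are hypothetical.

```python
import json
from typing import Any


def make_request(req_id: int, method: str, params: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


def frame_for_stdio(message: dict[str, Any]) -> bytes:
    """Serialize one message as a single newline-terminated line of JSON.

    json.dumps escapes newlines inside string values, so the only raw
    newline in the output is the trailing delimiter.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


request = make_request(
    1,
    "tools/call",
    {"name": "database_query", "arguments": {"report_name": "latest_sales"}},
)
wire = frame_for_stdio(request)
print(wire.decode().strip())
```

A response carries the same `id` and either a `result` or an `error` member, which is how the client matches replies to requests.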

The Connection Flow

Initial connection. When a host application like Claude Desktop starts, its MCP client connects to the MCP servers configured on your device.
Capability discovery. The client asks each server what it offers, and each responds with its available tools, resources, and prompts.
Registration. The client registers those capabilities, making them available to the model during your conversation.
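
The three steps map onto a handful of JSON-RPC exchanges. In the MCP specification the client opens with an `initialize` request, sends a `notifications/initialized` notification, and then calls list methods such as `tools/list`. The sketch below fakes the server side with a hypothetical `FakeServer` class so it runs without any MCP SDK. It only covers tools, not resources or prompts. The protocol version string and the tool definitions are illustrative assumptions.

```python
from typing import Any, Optional


class FakeServer:
    """Hypothetical in-process stand-in for an MCP server (no real transport)."""

    def __init__(self, name: str, tools: list[dict[str, Any]]):
        self.name = name
        self.tools = tools

    def handle(self, message: dict[str, Any]) -> Optional[dict[str, Any]]:
        if "id" not in message:  # notifications never get a response
            return None
        if message["method"] == "initialize":
            result: dict[str, Any] = {
                "protocolVersion": message["params"]["protocolVersion"],
                "capabilities": {"tools": {}},
                "serverInfo": {"name": self.name, "version": "0.1"},
            }
        elif message["method"] == "tools/list":
            result = {"tools": self.tools}
        else:  # standard JSON-RPC "method not found" error
            return {"jsonrpc": "2.0", "id": message["id"],
                    "error": {"code": -32601, "message": "Method not found"}}
        return {"jsonrpc": "2.0", "id": message["id"], "result": result}


def connect(servers: list[FakeServer]) -> dict[str, FakeServer]:
    """Run the connect / discover / register sequence against each server."""
    registry: dict[str, FakeServer] = {}
    for server in servers:
        # 1. Initial connection: initialize request, then the initialized notification.
        server.handle({"jsonrpc": "2.0", "id": 1, "method": "initialize",
                       "params": {"protocolVersion": "2025-03-26",  # assumed revision
                                  "capabilities": {},
                                  "clientInfo": {"name": "demo-host", "version": "0.1"}}})
        server.handle({"jsonrpc": "2.0", "method": "notifications/initialized"})
        # 2. Capability discovery: ask what tools the server offers.
        listing = server.handle({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
        # 3. Registration: remember which server owns each tool name.
        for tool in listing["result"]["tools"]:
            registry[tool["name"]] = server
    return registry


db = FakeServer("db", [{"name": "database_query",
                        "description": "Fetch a named report",
                        "inputSchema": {"type": "object",
                                        "properties": {"report_name": {"type": "string"}},
                                        "required": ["report_name"]}}])
mail = FakeServer("mail", [{"name": "email_sender",
                            "description": "Send an email",
                            "inputSchema": {"type": "object",
                                            "properties": {"to": {"type": "string"},
                                                           "body": {"type": "string"}},
                                            "required": ["to", "body"]}}])
registry = connect([db, mail])
print(sorted(registry))  # ['database_query', 'email_sender']
```

The `inputSchema` entries are what the host later places in the model's context, which is where the token cost in the comparison below comes from.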

MCP vs CLI: At a Glance

Dimension	MCP	CLI
Context window cost	High (schema loaded on connect)	Near zero
Setup complexity	Medium to high	Low
Authentication handling	Centralized at server level	Manual or per-tool
Output format	Consistent JSON	Text / stdout (varies)
Cross-system coordination	Native	Complex (piping, scripting)
Audit logging	Built-in	None built-in
Model familiarity	Varies by server	High (from training data)
Best for	External system coordination	Fast local iteration

When MCP Wins

Authentication. External systems need credentials. An MCP server handles auth once, at the server level, instead of every developer configuring tokens and managing expiry for every CLI that touches a provider.
Structured responses. Build logs, test results, and deployment status are far easier for an assistant to act on as structured JSON than as raw log text. MCP servers return data the model can use directly.
Discovery. On connecting to a server, the assistant learns what operations exist and can decide what to call without explicit instruction — query failure logs, check pipeline status, surface flaky tests. The server describes what's possible.
Session state. CLIs are stateless; each command runs in isolation. MCP servers can carry state across a session, which matters for multi-step workflows — like correlating a failing test with the deployment that introduced it.
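
The structured-response point is easiest to see side by side. In this sketch the same test outcome arrives once as a raw log line and once as JSON. The log format and the JSON field names are hypothetical, not the output of any particular MCP server. With text, the harness has to scrape with a pattern that breaks when the format drifts. With JSON, it reads a field.

```python
import json
import re

raw_log = "FAILED tests/test_api.py::test_login - AssertionError: 401 != 200"
structured = json.dumps({
    "failed": [{"test": "tests/test_api.py::test_login",
                "error": "AssertionError: 401 != 200"}],
    "passed": 41,
})

# Text path: a regex tied to one tool's output format.
match = re.match(r"FAILED (\S+) - (.*)", raw_log)
from_text = match.group(1) if match else None

# Structured path: plain field access, no parsing heuristics.
from_json = json.loads(structured)["failed"][0]["test"]

print(from_text == from_json)  # True
```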

When CLI Wins

Token efficiency. Every token spent on MCP schema is one the model can't spend reasoning about the actual code. In tight iteration loops that budget matters, and CLIs run as subprocesses with zero schema overhead.
Training familiarity. Models have absorbed enormous amounts of shell usage from docs, Stack Overflow, GitHub, and tutorials. An assistant already knows pytest, cargo test, and eslint without discovering their schemas at runtime.
Composability. Piping output through standard Unix tools is natural. Capping noisy output with pytest --tb=short 2>&1 | head -50 is one line; the same through MCP needs server-side support or a chain of structured calls.
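
For comparison, here is roughly what the `2>&1 | head -50` part does, written out in Python: merge stderr into stdout and keep only the first N lines. The helper `run_capped` is a hypothetical name. The demo uses a Python one-liner instead of pytest so it runs without a test suite. Unlike a real `head`, this sketch waits for the command to finish before truncating.

```python
import subprocess
import sys


def run_capped(argv: list[str], max_lines: int = 50) -> str:
    """Run argv with stderr merged into stdout, returning at most max_lines lines."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the equivalent of 2>&1
        text=True,
    )
    lines = completed.stdout.splitlines()
    return "\n".join(lines[:max_lines])  # the equivalent of head -N


# 200 lines on stdout plus one on stderr, standing in for a noisy test run.
noisy = [sys.executable, "-c",
         "import sys\nfor i in range(200):\n    print('line', i)\nprint('oops', file=sys.stderr)"]
capped = run_capped(noisy, max_lines=50)
print(len(capped.splitlines()))  # 50
```

Even spelled out, this is a few lines of glue; in the shell it is one pipe, and the model has seen that idiom countless times.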

Reducing Redundant Token Usage

MCP's biggest cost is schema bloat: servers inject definitions for all their tools into every conversation, regardless of what the task needs. A server with 43 tools means the agent carries 43 schemas on every call even when it uses one or two.

The fix is schema filtering at a gateway layer. Instead of passing all 43 definitions, a gateway returns only the 2–3 tools relevant to the current request — cutting MCP token usage by roughly 90% and bringing it close to CLI efficiency while preserving MCP's authorization model.
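
A gateway's selection logic can be simple or sophisticated. The sketch below is a deliberately naive version under stated assumptions. It scores each tool by word overlap between the request and the tool's name and description, keeps the top k with a nonzero score, and reports the fraction of schemas dropped. The 43-tool catalog is made up. Production gateways would more plausibly use embeddings or explicit routing rules. If every schema costs about the same number of tokens, keeping 2 of 43 cuts schema tokens by 1 - 2/43, about 95%, and keeping 3 cuts them by about 93%.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase words; underscores and digits act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_tools(request: str, tools: list[dict[str, str]], k: int = 3) -> list[dict[str, str]]:
    """Return up to k tools whose name/description share the most words with the request."""
    words = tokenize(request)

    def score(tool: dict[str, str]) -> int:
        return len(words & tokenize(tool["name"] + " " + tool["description"]))

    ranked = sorted(tools, key=score, reverse=True)  # stable: ties keep catalog order
    return [t for t in ranked[:k] if score(t) > 0]


# Hypothetical catalog: 3 CI tools plus 40 unrelated admin tools = 43 schemas.
catalog = [
    {"name": "get_pipeline_status", "description": "Check CI pipeline status for a branch"},
    {"name": "get_failure_logs", "description": "Fetch failure logs for a CI run"},
    {"name": "list_flaky_tests", "description": "Surface flaky tests across recent runs"},
] + [{"name": f"admin_tool_{i}", "description": "Manage repository settings"} for i in range(40)]

selected = filter_tools("Why did the CI pipeline fail? Show the failure logs.", catalog)
reduction = 1 - len(selected) / len(catalog)
print([t["name"] for t in selected], f"{reduction:.0%}")
```

The authorization model is untouched: the gateway still forwards calls to the same server with the same credentials, it just stops shipping schemas the task will never use.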

Do You Need an MCP Gateway?

A gateway sits between your agents and upstream MCP servers and solves the three problems that make direct connections expensive and unreliable:

Schema filtering — returns only the tool definitions relevant to the task, cutting token overhead ~90%.
Connection pooling — keeps persistent connections to upstream servers so individual agent sessions don't absorb handshake and timeout costs.
Centralized auth — handles OAuth refresh, scope enforcement, and audit logging in one place rather than per-agent.
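
Of the three jobs, centralized auth is the one most often reimplemented per agent. As an illustration only, this sketch shows a gateway-side token cache that refreshes a credential shortly before expiry and enforces a per-tool scope allowlist. `TokenCache`, the `refresh` callback, and the scope names are hypothetical. A real deployment would delegate the refresh itself to an OAuth library.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenCache:
    """Hypothetical gateway-side credential cache shared by every agent session."""
    refresh: Callable[[], tuple[str, float]]  # returns (access_token, expires_at)
    skew: float = 60.0                        # refresh this many seconds before expiry
    token: str = ""
    expires_at: float = 0.0
    refresh_count: int = 0

    def get(self) -> str:
        if time.time() >= self.expires_at - self.skew:
            self.token, self.expires_at = self.refresh()
            self.refresh_count += 1
        return self.token


# Hypothetical mapping from tool name to the scope it requires.
TOOL_SCOPES = {"get_failure_logs": "ci:read", "restart_pipeline": "ci:write"}


def authorize(tool: str, granted: set[str]) -> None:
    """Scope enforcement: refuse tools whose required scope was not granted."""
    needed = TOOL_SCOPES.get(tool)
    if needed is None or needed not in granted:
        raise PermissionError(f"{tool} requires scope {needed!r}")


cache = TokenCache(refresh=lambda: ("tok-abc", time.time() + 3600))
for _ in range(5):  # five agent calls through the gateway
    authorize("get_failure_logs", {"ci:read"})
    token = cache.get()
print(cache.refresh_count)  # 1
```

Five calls share one refresh; without the gateway, each agent would hold and refresh its own token.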

If you're building a developer tool for your own use, you don't need a gateway. If you're building a multi-tenant product where agents act on behalf of customers, a gateway is what makes MCP's authorization model economically viable.

Without a gateway, every session pays the connection handshake and the server ships all 43 schemas. With a gateway, connections are pooled and the schema list is filtered down to the 2 tools that matter.
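
A back-of-the-envelope simulation makes the difference concrete. Everything here is an assumption for illustration. `Upstream` is a hypothetical stand-in for a 43-tool server that counts handshakes. `Gateway` keeps one pooled upstream connection and forwards only the tools a session needs. `simulate` tallies upstream handshakes and the total number of schemas placed in agent context.

```python
from typing import Optional


class Upstream:
    """Hypothetical upstream MCP server exposing 43 tools; counts handshakes."""

    def __init__(self, tool_count: int = 43):
        self.tools = [f"tool_{i}" for i in range(tool_count)]
        self.handshakes = 0

    def connect(self) -> "Upstream":
        self.handshakes += 1  # stands in for initialize + auth round trips
        return self


class Gateway:
    """Hypothetical gateway: one pooled upstream connection, filtered schemas."""

    def __init__(self, upstream: Upstream):
        self.upstream = upstream
        self._conn: Optional[Upstream] = None

    def tools_for(self, relevant: list[str]) -> list[str]:
        if self._conn is None:  # connect once, reuse for every later session
            self._conn = self.upstream.connect()
        return [t for t in self._conn.tools if t in relevant]


def simulate(sessions: int, use_gateway: bool) -> tuple[int, int]:
    """Return (upstream handshakes, total schemas placed in agent context)."""
    upstream = Upstream()
    gateway = Gateway(upstream)
    shipped = 0
    for _ in range(sessions):
        if use_gateway:
            shipped += len(gateway.tools_for(["tool_3", "tool_7"]))
        else:
            shipped += len(upstream.connect().tools)  # direct: handshake + all schemas
    return upstream.handshakes, shipped


print(simulate(100, use_gateway=False))  # (100, 4300)
print(simulate(100, use_gateway=True))   # (1, 200)
```

The counts are only as meaningful as the assumptions: real handshake cost depends on transport and auth, and real schemas vary in size.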

Takeaway

Reach for a CLI when the model is iterating fast on local tasks it already understands — running tests, linting, scripting — where token efficiency and shell familiarity dominate. Reach for MCP when you need to coordinate external systems with centralized auth, structured outputs, discovery, and audit logging. And once MCP is in play at scale, a gateway recovers most of the token cost that makes people reach for CLIs in the first place.