MCP's Achilles Heel

Posted on Jun 3, 2025

I've been getting into personalizing Claude Desktop for much more detailed, powerful responses.

Model Context Protocol (MCP) is a way to expose tools (APIs) to an AI agent so it can fetch custom data or take custom actions. The issue is that every request and response is fully written and read by the LLM, so for large datasets you're making the LLM your data transport layer as well as your AI.
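To make that concrete, here's roughly what an MCP server looks like using the official Python SDK's FastMCP helper. The server name and tool body are made up; a real server would call an actual weather API.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_weather(query: str) -> str:
    """Return a plain-text forecast for a location/date query."""
    # Stub: a real implementation would call a weather API here.
    return f"Forecast for {query}: partly cloudy, 20 °C, 0% rain"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default

The agent sees the tool's name, signature, and docstring and decides when to call it; every argument and every byte of the response flows through the model's context.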

Why MCP is good

Imagine you're getting married and want Claude to help out with the planning. You might set up an AI agent with MCP servers to

  • get the weather
  • search Zola (wedding planning)
  • send emails

You could give it the prompt "Given the weather for our wedding, look through the vendors who might be affected, and make me an email to send them asking what they're planning on." The agent can then make an MCP call to check the weather, make another call to fetch the vendors in Zola and their descriptions, decide which vendors the weather might affect, and draft the emails for you.
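Server-side, those three capabilities might look something like this. The tool names, signatures, and stub bodies are my invention for illustration:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("wedding-helper")

@mcp.tool()
def get_weather(query: str) -> str:
    """Forecast for a location and date."""
    return "Sat 07 Jun, NYC: cloudy, 21 °C, chance of rain"  # stub

@mcp.tool()
def list_vendors(limit: int = 100) -> list[str]:
    """Wedding vendors from Zola as 'name, category, email' rows."""
    return ["Terry's Tents, Tents, terry@terrystents.com"]  # stub

@mcp.tool()
def send_email(address: str, body: str) -> str:
    """Queue an email draft; the client confirms with the user before sending."""
    return f"Drafted email to {address}"  # stub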

MCP gives your AI a way to get custom data and do custom actions on your behalf, without the agent vendor needing direct native integrations.

What is this actually doing

Each of these actions is executed as a tool call via MCP. So you might have a chain like:

-----
User:
Given the weather for our wedding, look through the vendors who might be affected, and 
make me an email to send them asking what they're planning on.
-----
Agent:
The user wants me to compare wedding data to weather data. Let me first get the weather data

Tool - Weather
{
  `query`: `Weather in New York City on 8/28`
}

Response:
┌──────────────────────────────┬───────────────────────┤  Sat 07 Jun ├───────────────────────┬──────────────────────────────┐
│            Morning           │             Noon      └──────┬──────┘     Evening           │             Night            │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│               Fog            │               Fog            │               Cloudy         │    \  /       Partly Cloudy  │
│  _ - _ - _ -  18 °C          │  _ - _ - _ -  20 °C          │      .--.     21 °C          │  _ /"".-.     20 °C          │
│   _ - _ - _   ↘ 11-14 km/h   │   _ - _ - _   ↘ 11-13 km/h   │   .-(    ).   ↘ 15-20 km/h   │    \_(   ).   ↓ 5-11 km/h    │
│  _ - _ - _ -  10 km          │  _ - _ - _ -  10 km          │  (___.__)__)  10 km          │    /(___(__)  10 km          │
│               0.0 mm | 0%    │               0.0 mm | 0%    │               0.0 mm | 0%    │               0.0 mm | 0%    │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘

-----
Agent:
There's a chance of rain; now let me get the wedding vendors.

Tool - Zola
{
  `list`: `Vendors`,
  `limit`: 100
}

Response:
Johns Catering, Catering, johns@jeats.com
Tims Cake, Cake, null
Ruth's Flowers, Flowers, ruthflower@example.com
Terry's Tents, Tents, terry@terrystents.com

-----
Agent:
Given there's a chance of rain, let's make sure the catering and tent vendors are prepared.

Tool - Email
{
  `address`: `terry@terrystents.com`,
  `body`: `Just checking the rain plan is all set with the tents?`,
}

Tool - Email
{
  `address`: `johns@jeats.com`,
  `body`: `If it rains, are we all set with the food?`,
}

Then the user is prompted to confirm before those emails are sent.

Pipes Were Good

The problem here is that each tool request has to be written out in full by the LLM. Imagine that instead of a couple of vendors and one forecast, you wanted to email everyone at a sports game based on the weather. If you had three tools

  • Weather
  • Seat information (who's affected by rain)
  • Email blast

You would have to buffer tens of thousands of records through the LLM. Tokens per second become your data throughput, and the agent's decision-making capabilities get confounded with the data pipe. It becomes wildly more expensive and less accurate as the context grows.
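Some back-of-envelope arithmetic makes the point. The record count, tokens per record, and decode speed below are illustrative assumptions, not benchmarks:

# Piping a stadium's worth of attendee records through an LLM.
records = 50_000
tokens_per_record = 30            # name, seat, section, email...
decode_tokens_per_second = 100    # a generous output speed

total_tokens = records * tokens_per_record    # 1,500,000 tokens
hours = total_tokens / decode_tokens_per_second / 3600
print(f"{total_tokens:,} tokens ≈ {hours:.1f} hours just re-emitting data")
# 1,500,000 tokens ≈ 4.2 hours, before the model does any actual reasoning

And that's just the output side; the same data also has to be read back in as input, eating context the whole way.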

It's as if, when working in a POSIX shell, instead of being able to pipe (|) data from one process to another, every byte had to be printed to the terminal and then pasted into the next process.
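Here's the shell model in miniature: the orchestrator wires up the plumbing, and the bytes flow process-to-process without ever passing through it (this assumes a forecast.txt on disk):

import subprocess

# cat streams the file; grep filters it. The data moves through the OS
# pipe directly; Python never buffers or inspects the bytes itself.
producer = subprocess.Popen(["cat", "forecast.txt"], stdout=subprocess.PIPE)
consumer = subprocess.Popen(["grep", "rain"], stdin=producer.stdout,
                            stdout=subprocess.PIPE)
producer.stdout.close()  # let grep see EOF when cat finishes
matches, _ = consumer.communicate()
print(matches.decode())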

The solution

If MCPs are to actually be the "USB" of agents, they need a way for tools to read from and write to blob storage / artifacts. Instead of the data being processed by the LLM, only the metadata pertinent to decision making should be shared with the LLM directly. These blobs will need a standard interface for reads and writes, with the agent simply passing the tool the path and the authorization needed to access it.
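A sketch of what that could look like, with the caveat that nothing like blob_ref exists in the MCP spec today; the shape of the reference is entirely my assumption:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("blob-demo")

def export_attendees_to_blob() -> str:
    # Stub: a real server would write the full dataset to S3/GCS here.
    return "s3://example-bucket/attendees.csv"

@mcp.tool()
def list_attendees() -> dict:
    """Return decision-relevant metadata only; the bulk data stays in the blob."""
    return {
        "row_count": 48_213,
        "columns": ["name", "seat", "section", "email"],
        "blob_ref": {"uri": export_attendees_to_blob(), "content_type": "text/csv"},
    }

The LLM reasons over row_count and columns; a downstream tool gets handed blob_ref and reads the 48,213 rows itself.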

Maybe S3 paths with access tokens are enough? Do we want MCP servers to have to reimplement read/write each time, or does MCP need two bands of communication, low-baud instructions and high-baud raw data, with the agents handling the buffering?
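For the S3 version, presigned URLs already give you path-plus-token semantics. A sketch with boto3 (the bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")
# A presigned URL lets a second MCP server read the blob directly,
# time-limited and scoped, without the data entering the model's context.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "attendees.csv"},
    ExpiresIn=3600,
)
print(url)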

MCP is incredibly useful, but limited to low-data-throughput tasks.

The new A2A protocol has a concept of different data types; maybe that will solve this problem?

We’ll see!