Five tools, five different jobs: a map of agent tooling

The debugging session that finally clarified this for me started with a straightforward question: why is our agent running a twelve-step loop to do something a single skill invocation could handle in one? The agent was technically working. It was also burning tokens, adding latency, and producing a call stack that took twenty minutes to trace. We had reached for the wrong primitive.

Five categories of tooling now sit in this space: MCP servers, Claude Code plugins, Custom GPTs, Claude Skills, and autonomous agents. They get conflated constantly, which is why people end up with agents doing the work of skills, or MCP servers getting rebuilt as plugins, or Custom GPTs wired into pipelines they were never designed to serve. What follows is a map of where each one lives and what it is actually for.

MCP servers: capability that travels across agents

An MCP server exposes tools and resources to any MCP-compatible host over a structured protocol. When a model needs to read a file, query a database, or call an API, the host sends that request to the MCP server, which does the work and returns the result. The server runs as its own process, so any agent that connects to it can use the same capability: you write the tool once and it serves the whole pipeline.

The Filesystem MCP server is the cleanest illustration of this. It gives any connected agent scoped read/write access to the local disk without embedding file-handling logic in the agent itself. Nothing about the server cares which agent is on the other end. That’s the point.
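To make "write the tool once" concrete, here is a minimal sketch in plain Python. It does not use the real MCP SDK or the actual Filesystem server: `ToyFileServer`, `call_read`, and the `read_text` tool are hypothetical names, and the request shape is only loosely modeled on MCP's JSON-RPC `tools/list` and `tools/call` messages. What it tries to show is that the scoping and file logic live in the server, and the callers are interchangeable.

```python
import tempfile
from pathlib import Path
from typing import Any


class ToyFileServer:
    """Hypothetical stand-in for an MCP-style server; not the real SDK."""

    def __init__(self, root: str) -> None:
        # Scope: every requested path is resolved against this root.
        self.root = Path(root).resolve()
        self.tools = {"read_text": self.read_text}

    def read_text(self, path: str) -> str:
        """Read a file, refusing paths that escape the allowed root."""
        target = (self.root / path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{path} is outside the allowed root")
        return target.read_text(encoding="utf-8")

    def handle(self, request: dict[str, Any]) -> dict[str, Any]:
        """Answer a request shaped loosely like MCP's JSON-RPC messages."""
        reply: dict[str, Any] = {"jsonrpc": "2.0", "id": request.get("id")}
        method = request.get("method")
        if method == "tools/list":
            reply["result"] = {"tools": [{"name": name} for name in self.tools]}
        elif method == "tools/call":
            params = request["params"]
            try:
                text = self.tools[params["name"]](**params["arguments"])
                reply["result"] = {
                    "content": [{"type": "text", "text": text}],
                    "isError": False,
                }
            except Exception as exc:
                # Tool failures are reported in the result, not raised.
                reply["result"] = {
                    "content": [{"type": "text", "text": str(exc)}],
                    "isError": True,
                }
        else:
            reply["error"] = {"code": -32601, "message": "Method not found"}
        return reply


def call_read(server: ToyFileServer, agent_id: int, path: str) -> dict[str, Any]:
    """What any connected agent would send; the agent holds no file logic."""
    return server.handle({
        "jsonrpc": "2.0",
        "id": agent_id,
        "method": "tools/call",
        "params": {"name": "read_text", "arguments": {"path": path}},
    })


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "notes.txt").write_text("hello", encoding="utf-8")
        server = ToyFileServer(root)
        # Two different "agents" reuse the same tool unchanged.
        for agent_id in (1, 2):
            print(call_read(server, agent_id, "notes.txt"))
        # A path outside the root comes back as an error result.
        print(call_read(server, 3, "../outside.txt"))
```

A real server would sit behind a transport rather than a direct method call, but the division of labor is the same: the agents only know the tool's name and arguments.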

Reach for an MCP server when you need a capability to cross assistant boundaries, or when the same tool needs to serve multiple agents. If you find yourself copying tool logic from one agent to another, you probably want an MCP server.

Claude Code plugins: team workflow as installable unit

Claude Code plugins bundle commands, hooks, and MCP server configurations into a single installable unit for Claude Code specifically. The difference from a bare MCP server is intent: a plugin packages an opinionated workflow, while an MCP server is a general-purpose tool provider.

A plugin might wire up a linter, configure a commit-message hook, and add a slash command, all installed together with one line. The entire team gets the same environment from that point forward. Reach for a plugin when you want to encode how your team works with Claude Code, rather than leaving it documented in a README that people will inevitably miss.

Custom GPTs: products built for end users on ChatGPT

Custom GPTs are user-facing assistants configured on top of ChatGPT. They get a system prompt, optional Actions (HTTP-based tool calls), and a curated set of capabilities like code execution or image generation. The defining constraint is audience: Custom GPTs face end users, not developers wiring up pipelines.

If you are building a self-contained AI product for people who are not going to write code (a document analyzer, a research assistant, a customer-facing chatbot), a Custom GPT is the right primitive. The moment you need the capability inside an agent loop, you are outside the territory it was designed for.

Claude Skills: reusable processes inside a session

Claude Skills are invocable modules that extend what Claude Code can do within a session. A skill captures a repeatable process (converting a PDF to structured data, generating a frontend component, running a particular kind of code review) and makes it callable by name. Skills can orchestrate several MCP servers beneath them, and they run when invoked rather than autonomously.

The practical test for when to use one: if you find yourself rebuilding the same multi-step logic across projects, that’s a skill waiting to be written. One definition, consistent execution, no re-explaining the steps to Claude on every new project.

Autonomous agents: when the task requires its own judgment

Agents hold goals, plan steps, invoke tools, and loop until a task is complete. They decide what to do next; the other primitives do what they are told. That distinction matters more than it sounds.

Agent frameworks like LangGraph and CrewAI handle orchestration, memory, and coordination across multiple agents. Use one when the task cannot be fully specified upfront: when the next step depends on what the previous step returned, and that logic cannot be captured in a fixed sequence.
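The difference between a fixed sequence and an agent loop is easier to see in code. The sketch below uses no framework (it is not LangGraph or CrewAI code), and every name in it is hypothetical: `run_fixed` stands in for a skill-style pipeline, `run_agent` for an agent loop, and `toy_policy` for the decision a model would normally make. The `max_steps` budget is an assumption of this sketch, added so the loop cannot run forever.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]
Policy = Callable[[str, list[str]], Optional[str]]


def run_fixed(steps: list[Tool], task: str) -> str:
    """Fixed sequence: the order of steps is decided before anything runs."""
    result = task
    for step in steps:
        result = step(result)
    return result


def run_agent(decide: Policy, tools: dict[str, Tool], task: str,
              max_steps: int = 12) -> tuple[str, list[str]]:
    """Agent loop: the next tool is chosen from the latest observation."""
    observation = task
    trace: list[str] = []
    for _ in range(max_steps):
        choice = decide(observation, trace)
        if choice is None:  # the policy judges the task complete
            return observation, trace
        trace.append(choice)
        observation = tools[choice](observation)
    raise RuntimeError("step budget exhausted before the task completed")


def make_flaky_fetch(failures: int) -> Tool:
    """Hypothetical tool that times out `failures` times, then succeeds."""
    remaining = {"n": failures}

    def fetch(_observation: str) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return "error: timeout"
        return "data: 42"

    return fetch


def parse(observation: str) -> str:
    return "parsed: " + observation.removeprefix("data: ")


def toy_policy(observation: str, trace: list[str]) -> Optional[str]:
    """Stand-in for a model call: pick the next tool from what came back."""
    if observation.startswith("parsed:"):
        return None
    if observation.startswith("data:"):
        return "parse"
    return "fetch"  # first step, or retry after an error


if __name__ == "__main__":
    tools = {"fetch": make_flaky_fetch(2), "parse": parse}
    print(run_agent(toy_policy, tools, "get the number"))
    # The fixed pipeline has no way to react to the timeout.
    print(run_fixed([make_flaky_fetch(1), parse], "get the number"))
```

When the fetch never fails, both versions do the same work and the fixed sequence is the cheaper choice. The loop only earns its cost when the result of one step changes what the next step should be.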

Most production systems end up combining several of these. A coding agent might run inside a framework, invoke skills for repetitive sub-tasks, use MCP servers for file and repo access, and have plugins installed to keep the team’s conventions in place. They are layers, not alternatives.

The thing teams get wrong

The default move is reaching for a full agent when a skill or a plugin would do. Autonomy is expensive: in tokens, in latency, in the debugging time when something goes wrong inside a loop you did not fully control. An agent running twelve steps to accomplish what a skill does in one is not a more capable solution. It is a slower, more fragile one that will be harder to explain to whoever has to fix it at 2am.

The instinct to skip to agents is understandable. They feel like the full version of the idea. But the right question is not “what is the most powerful tool I could use here?” It is “what is the least autonomous tool that handles this task?” The answer is usually lower on the stack than your first guess.

That’s the error mode the map is for.
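As a rough aid, the tests from each section can be written down as a checklist. This is only a sketch of the rules of thumb in this post, not a formal decision procedure: the `Need` fields and the `suggest` function are hypothetical names, and each flag paraphrases one "reach for X when" test from above. It returns a list rather than a single answer because the primitives are layers, and the agent is added last and only when the task needs its own judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    """Hypothetical checklist; each flag paraphrases one test from this post."""
    capability_shared_across_agents: bool = False
    team_conventions_for_claude_code: bool = False
    product_for_non_coding_end_users: bool = False
    same_multi_step_process_across_projects: bool = False
    next_step_depends_on_previous_result: bool = False


def suggest(need: Need) -> list[str]:
    """Return the matching primitives, with the autonomous agent last."""
    picks: list[str] = []
    if need.capability_shared_across_agents:
        picks.append("MCP server")
    if need.team_conventions_for_claude_code:
        picks.append("Claude Code plugin")
    if need.product_for_non_coding_end_users:
        picks.append("Custom GPT")
    if need.same_multi_step_process_across_projects:
        picks.append("Claude Skill")
    # Autonomy is only suggested when the steps cannot be fixed upfront.
    if need.next_step_depends_on_previous_result:
        picks.append("autonomous agent")
    return picks


if __name__ == "__main__":
    # A repeated, fully specified process: a skill, with no agent suggested.
    print(suggest(Need(same_multi_step_process_across_projects=True)))
    # Shared file access plus open-ended steps: layers, not alternatives.
    print(suggest(Need(capability_shared_across_agents=True,
                       next_step_depends_on_previous_result=True)))
```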

