LLM Tool Architectures: Your Questions Answered
In the last post, we covered the three architectures for reusable LLM tools. That post prompted some great follow-up questions about the practical differences between these approaches, especially how using command-line (CLI) scripts as tools fits into the picture.
We'll cover:
Do LLM APIs know how a tool was defined?
What are the real-world differences in context token usage?
What's the difference between tool-calling via MCP and CLI tools?
Do LLM APIs "know" the difference between tool-calling approaches?
Not in any practical way.
At its core, tool-calling is a prompting strategy. You provide the model with a function's name, its arguments, and a description of what it does. The model then returns structured JSON that tells your application which function to run with which inputs.
The API doesn't know or care how you generated that prompt.
Direct SDK: You manually build the complex dictionary the API expects.
Higher-Level Library: The library builds that same dictionary for you from a simpler function definition (see the sketch after this list).
MCP Server: The MCP client fetches tool definitions from a server and formats them for the model.
CLI Script: You instruct the model to use a command from a Justfile or shell script. The model uses your description if you gave it enough information, or figures out the parameters from the script's contents if not.
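As a concrete sketch of that second pattern, the hand-rolled helper below (to_tool_schema is my own illustration, not any particular library's API) derives the dictionary from a plain function's signature and docstring:

```python
import inspect

def to_tool_schema(fn):
    """Build an API-style tool definition from a function's signature and docstring."""
    props = {
        name: {"type": "string"}  # naive sketch: treat every argument as a string
        for name in inspect.signature(fn).parameters
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": list(props)},
        },
    }

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}"

print(to_tool_schema(get_weather))  # the same dictionary you'd otherwise write by hand
```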
The final payload sent to the model API is just a structured description of the tool. The model's job is to generate a structured JSON response.
Whether you get that tool definition into the prompt via an MCP server, a library, or by telling the agent to use a CLI script, it's an implementation detail on your side.
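For example, with the OpenAI Python SDK the whole exchange looks roughly like this (get_weather is a hypothetical tool; any provider with tool-calling works the same way):

```python
from openai import OpenAI

client = OpenAI()

# The tool definition is just a structured description. Whether you wrote
# this dict by hand, had a library generate it, or fetched it from an MCP
# server is invisible to the API.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# The model's reply is structured JSON naming a function and its arguments.
# Actually running get_weather is your application's job, not the API's.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```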
Are there practical differences in context token usage between CLI script and MCP?
Theoretically, there doesn't need to be a difference. In practice, absolutely.
Both methods work by adding text to the model's context window. The difference comes down to what and how much text gets added.
MCP servers are designed to be general-purpose, so they often define dozens of tools you may not need for a specific task. When you connect to a server, the definitions for all of its available tools are typically added to your system prompt.
Geoffrey Huntley estimates that popular coding assistants already use ~24,000 tokens for their system prompts. Adding one popular GitHub MCP server injects definitions for 93 additional tools, consuming another ~55,000 tokens. That's roughly 80,000 tokens of overhead before your task even begins, leaving you with far less context for the actual work.
CLI tools and libraries give you more control. When you give an agent access to a local script or define a tool with a library, you are typically only providing the context for that one specific tool. This is far more token-efficient.
However, this approach requires discipline. If you lazily tell your agent, "Use my_script.py to do the thing," the agent's first step will be to read the entire script to understand what it does and how to use it. This adds turns and consumes tokens, though it's typically still better than a large general-purpose MCP server, since the script is at least scoped to your specific task.
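A more disciplined setup might look like the sketch below: documented Justfile recipes plus a short pointer in claude.md, so the agent gets names and usage up front without reading any script bodies (the recipes shown are hypothetical):

```make
# Justfile: single entry point for project tasks

# Run the full test suite quietly
test:
    pytest -q

# Rebuild the local dev database from fixtures
reset-db:
    python scripts/reset_db.py
```

```markdown
<!-- claude.md -->
Run project tasks through the Justfile. Use `just --list` to see
available recipes; prefer `just test` and `just reset-db` over
reading or calling anything in scripts/ directly.
```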
The takeaway: The most token-efficient method is the one that provides the model with only the necessary tools for the task at hand. Tailored CLI scripts and specific library functions almost always beat general-purpose MCP servers here.
What's the difference between MCP and CLI tools the LLM calls?
Fundamentally, they are variations of the same thing. They are all methods for telling an LLM about a function it can ask you to run. The real difference is not in the "what" but in the "why"—it's a choice about your workflow.
Tool-Calling (SDK/Libraries) is best when the tool is integral to your application logic. The tool definition lives with the code that uses it.
CLI Tools are best for your personal, local development workflow. I create a set of scripts for a project and use a Justfile as a single entry point. I then tell the agent in my claude.md file to use the Justfile for its tasks. This works great for most things. There are a few (but not many) cases where a local MCP server can be better, primarily in web development, where a lightweight Playwright server can drive your local app.
MCP Servers are best for decoupling and sharing. An MCP server that you can add to Claude Desktop is a fairly easy path to get a tool into the hands of non-coder domain experts early. And getting things into domain experts' hands early is REALLY, REALLY helpful and underrated.
While this post focuses on tool calling, MCP provides more than just tools. Servers can also expose data through Resources (like files or database records) and guide agents by providing reusable Prompts.
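Here's a minimal sketch using FastMCP from the official MCP Python SDK; the tool, Resource, and Prompt are hypothetical stand-ins:

```python
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-tools")

@mcp.tool()
def run_tests(pattern: str = "") -> str:
    """Run the test suite, optionally filtered to tests matching `pattern`."""
    cmd = ["pytest", "-q"] + (["-k", pattern] if pattern else [])
    return subprocess.run(cmd, capture_output=True, text=True).stdout

@mcp.resource("logs://latest")
def latest_logs() -> str:
    """Expose the most recent application log as a readable Resource."""
    with open("logs/app.log") as f:
        return f.read()

@mcp.prompt()
def triage_failures() -> str:
    """A reusable Prompt that guides an agent through failing tests."""
    return "Run the tests, then summarize and fix each failure one at a time."

if __name__ == "__main__":
    mcp.run()
```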
If you wanted a colleague or a client to use those tools, sending them a folder of scripts and a claude.md file, then walking them through setup, is complicated.
Walking them through adding an MCP server to Claude Desktop is simple. It makes your tools portable and plug-and-play for anyone, regardless of their local setup.
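Concretely, that usually amounts to a few lines in their claude_desktop_config.json (the server name and path below are placeholders):

```json
{
  "mcpServers": {
    "project-tools": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```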
The choice depends on the problem you're solving.
From Theory to Production
Choosing the right tool-integration strategy is more than a preference; it's an architectural decision.
If you're ready to move beyond basic tool-calling and learn how to build reliable, customized AI workflows that scale, check out our 3-week, cohort-based course, Elite AI Assisted Coding. We dive deep into designing and shipping these exact systems, with live instruction from AI engineering leaders at companies like Google and Microsoft. The next cohort runs from Oct 6–24, 2025.