Skip to content
Tool use · Function calling · MCP

Tool Use, Function Calling, MCP: The Production LLM Integration Stack (2026)

Tool use is how LLMs touch your databases, APIs, and filesystems. The 2026 stack has three layers: provider-specific tool APIs (OpenAI, Anthropic, Google), framework abstractions (LangChain, LlamaIndex), and the Model Context Protocol (MCP) for portable servers. Here's how each layer wins.

By Andy Gaber, Founder, Digital Dashboard HubUpdated

A pure text-in, text-out LLM is useful for content generation. A production LLM system — the kind that runs customer support, code generation, data analysis, or agentic workflows — needs to call APIs, query databases, and execute code. That bridge is tool use (also called function calling), and the 2026 stack has consolidated around a few dominant patterns.

Per OpenAI's function calling and tool use documentation at platform.openai.com, Anthropic's tool use guide at docs.anthropic.com, Google's Gemini function calling documentation at ai.google.dev, and the Model Context Protocol specification at modelcontextprotocol.io, the underlying primitive is: define a tool with a JSON schema, expose it to the model, parse the model's tool call, execute it, return the result, loop until done.

What differs across the stack: how the tool definition is declared, how the tool call is parsed, how errors propagate, and whether tool servers are portable across model providers. Sources here include OpenAI's function calling docs, Anthropic's tool use guide, Google's Gemini function calling, the MCP spec at modelcontextprotocol.io, LangChain's tools documentation at python.langchain.com, LlamaIndex agents documentation at docs.llamaindex.ai, and the Pydantic AI agents framework documentation at ai.pydantic.dev.

3-layer tool use stack — choose by need

Feature
Best for
Lock-in
Ecosystem
Provider-native (OpenAI/Anthropic/Google APIs)Perf-critical, simple casesHigh (provider-specific)Provider docs only
Framework (LangChain/LlamaIndex/Pydantic AI)Multi-provider orchestrationMedium (framework migration cost)100s of integrations
MCP servers (Model Context Protocol)Portable + reusable toolsLow (open spec)Growing — 100+ servers + multiple host clients

API references: [OpenAI function calling at platform.openai.com](https://platform.openai.com/docs/guides/function-calling), [Anthropic tool use at docs.anthropic.com](https://docs.anthropic.com/en/docs/build-with-claude/tool-use), [Google Gemini function calling at ai.google.dev](https://ai.google.dev/gemini-api/docs/function-calling). Framework docs: [LangChain](https://python.langchain.com/docs/concepts/tools/), [LlamaIndex](https://docs.llamaindex.ai/), [Pydantic AI](https://ai.pydantic.dev/). MCP spec at [modelcontextprotocol.io](https://modelcontextprotocol.io/); servers at [github.com/modelcontextprotocol/servers](https://github.com/modelcontextprotocol/servers).

Layer 1 — Provider-native tool APIs (OpenAI, Anthropic, Google, etc.)

**The pattern:** Declare tools as JSON-schema descriptors. Pass them with the model call. Parse `tool_calls` from the response. Execute. Return tool results in the next turn. Loop until the model returns a final assistant message without tool calls.

**OpenAI's flavor:** Per OpenAI's function calling docs at platform.openai.com, tools are declared in `tools=[{type:'function', function:{name, description, parameters}}]`. Response has `message.tool_calls[]` with `id`, `function.name`, `function.arguments` (JSON string). Tool results go back as messages with `role:'tool'` and matching `tool_call_id`.

**Anthropic's flavor:** Per Anthropic's tool use guide at docs.anthropic.com, tools are declared in `tools=[{name, description, input_schema}]`. Response content array contains `tool_use` blocks with `id`, `name`, `input` (already-parsed object). Tool results go back as `tool_result` content blocks with matching `tool_use_id`.

**Google's flavor:** Per Google's Gemini function calling docs at ai.google.dev, tools are declared as `tools=[{functionDeclarations:[...]}]`. Response has `parts[].functionCall` with `name`, `args`. Tool results go back as `parts[].functionResponse`.

**The reality:** Schemas differ. Argument shapes differ. Naming conventions differ. The mental model is identical; the wire-level code is provider-specific. This is where the next two layers (framework abstractions + MCP) become important.


Layer 2 — Framework abstractions (LangChain, LlamaIndex, Pydantic AI)

**Goal:** Write tool definitions once, run against any provider. Frameworks translate between the unified tool definition and each provider's wire format.

**LangChain:** Per LangChain's tools documentation at python.langchain.com, define tools with the `@tool` decorator or `StructuredTool.from_function`. Pydantic-schema-based input validation. Tools are then bindable to any LangChain-compatible model. Strength: ecosystem breadth (hundreds of pre-built integrations). Weakness: abstraction overhead and version churn.

**LlamaIndex:** Per LlamaIndex agents documentation at docs.llamaindex.ai, `FunctionTool.from_defaults` wraps any Python function as a tool with auto-generated schema. Integrates with LlamaIndex's agent framework. Strength: tighter integration with retrieval workflows. Weakness: smaller ecosystem than LangChain.

**Pydantic AI:** Per Pydantic AI documentation at ai.pydantic.dev, Pydantic-first agent framework. Tools are typed Pydantic functions. Type safety guaranteed by the runtime. Strength: type-safety + clean Pydantic ergonomics. Weakness: newer, smaller ecosystem.

**The choice:** LangChain for breadth (you'll find an integration), Pydantic AI for type safety (you'll catch errors at runtime cleanly), LlamaIndex if your stack is already RAG-heavy.


Layer 3 — Model Context Protocol (MCP): portable tool servers

**The problem MCP solves:** Tools are written once but bound to a specific framework + specific provider. A tool that queries your PostgreSQL database for the OpenAI API isn't reusable when you switch to Anthropic, switch to a different framework, or expose the same tool to a desktop AI app.

**The pattern:** Per the Model Context Protocol specification at modelcontextprotocol.io, MCP servers expose tools, resources, and prompts via a standardized JSON-RPC protocol. MCP clients (LLM hosts) discover what's available and consume it uniformly. The server can be a local stdio process or a remote HTTP-streaming endpoint.

**Why it's becoming dominant:** The MCP servers reference repository at github.com/modelcontextprotocol/servers already has 100+ first-party and community MCP servers (Postgres, Slack, GitHub, Google Drive, filesystem, etc.). Building tools as MCP servers means they work in Claude Desktop, Cursor, Cline, and any other MCP-aware host without porting.

**Production caveats:** MCP is still maturing. Authentication patterns are evolving. Multi-user authorization is less mature than single-user (desktop client). For production HTTP-streaming MCP, the MCP official documentation on transports is essential reading.

**The strategic move:** Build tools as MCP servers by default. Wrap them in framework code (LangChain/LlamaIndex/Pydantic AI) when you need orchestration. Talk to provider APIs directly only for performance-critical paths or features that frameworks don't yet abstract.


Failure modes the production checklist must address

**Failure 1 — Bad tool arguments.** The model produces tool arguments that don't match the schema (missing required field, wrong type, nonsense value). Mitigation: schema validation before execution + structured error message back to the model: 'Argument X must be Y. Retry.' Per Anthropic's tool use guide, surfacing schema errors back to the model lets it self-correct ~70-90% of the time.

**Failure 2 — Tool execution errors.** API returned 500. Database timeout. Filesystem permission denied. Mitigation: catch + format the error as a tool result, not a runtime exception. Let the model decide whether to retry, switch tools, or report failure to the user.

**Failure 3 — Infinite tool loops.** Model keeps calling the same tool because its result doesn't satisfy the model's plan. Mitigation: per-loop iteration cap (typically 10-25). Plus monitoring for repeated identical tool calls — usually a sign of a malformed prompt or hallucinated tool-result interpretation. Per LangChain's agent execution docs, iteration caps are essential production hygiene.

**Failure 4 — Side-effect amplification.** Tool that posts to Slack is called 5 times because the model retries. Tool that charges a credit card is called twice because the first call's response was ambiguous. Mitigation: idempotency keys on side-effecting tools + clear 'this operation has already been performed' tool results when retries happen.

**Failure 5 — Prompt injection via tool results.** A tool returns user-supplied content that contains 'ignore previous instructions...'. Mitigation: per OWASP's LLM Top 10 at owasp.org/www-project-top-10-for-large-language-model-applications, treat tool results as untrusted data. Wrap in clear `<tool_result>` delimiters. Train the prompt to ignore instructions inside tool results.

Single-provider, single-framework tool wiring: Fast initial build. Locked into one provider. Tool re-implementation every time you switch frameworks. Provider outage = system down. No portability to desktop AI clients.
MCP-first + framework abstraction: More upfront setup (~1-2 days extra). Portable across providers + clients. Future-proof against framework churn. Composable with the growing 100+ MCP server ecosystem.

Architect a production tool-use system (4 steps)

  1. 1

    Inventory the tool surface area your system needs

    List the side-effecting tools (DB writes, API calls, filesystem) vs. read-only tools (queries, retrievals). Read-only tools are safer; side-effecting tools need idempotency keys + retry guards. Per OpenAI's function calling docs and Anthropic's tool use guide, the schema design happens here.

  2. 2

    Implement tools as MCP servers where portability matters

    Per the Model Context Protocol spec at modelcontextprotocol.io and the MCP servers repo at github.com/modelcontextprotocol/servers, MCP-first design future-proofs against framework + provider churn. For internal-only tools where portability doesn't matter, framework-native (LangChain/LlamaIndex/Pydantic AI) is fine.

  3. 3

    Wire framework abstraction for orchestration

    Choose LangChain for breadth, Pydantic AI for type safety, LlamaIndex for RAG-heavy stacks. The framework handles loop control, error formatting, and provider translation.

    → Open the Code Prompt Builder
  4. 4

    Add the 5-failure-mode checklist

    Schema-validate tool args. Catch + format execution errors. Cap loop iterations (10-25). Idempotency keys on side-effecting tools. Treat tool results as untrusted per OWASP LLM Top 10 at owasp.org. Test each failure mode in staging before production.

Where to start the tool-use architecture

If you're shipping a new LLM-powered product in 2026: MCP-first design even for internal tools. Per the MCP spec at modelcontextprotocol.io, tool portability is becoming the default expectation. Future-proofs against provider + framework churn.

If you're already on a single-provider tool stack: Inventory which tools have portability value (DB queries, API integrations, filesystem) vs. which are internal-only (prompt utilities). Migrate the portable ones to MCP first. Reference: github.com/modelcontextprotocol/servers for existing implementations.

If you're choosing between LangChain, LlamaIndex, Pydantic AI: LangChain for ecosystem breadth, Pydantic AI for type-safe Pydantic-first stacks, LlamaIndex for RAG-heavy. The framework matters less than getting the failure-mode checklist right.

If you've hit a tool-execution failure in production: Audit against the 5 failure modes — bad args, execution errors, infinite loops, side-effect amplification, prompt injection via tool results. Per OWASP LLM Top 10 at owasp.org, prompt injection via tool results is the most-underestimated. The Code Prompt Builder helps design prompts robust to injection.

Frequently Asked Questions

What's the difference between tool use and function calling?

They're the same concept; different vendors use different names. Per OpenAI's function calling docs at platform.openai.com, OpenAI calls it 'function calling'. Per Anthropic's tool use guide at docs.anthropic.com, Anthropic calls it 'tool use'. Per Google's Gemini function calling docs at ai.google.dev, Google uses 'function calling'. Underlying primitive (JSON-schema-defined tool + model invokes + execute + return result + loop) is identical.

What is the Model Context Protocol (MCP)?

Per the MCP specification at modelcontextprotocol.io, MCP is an open standard for connecting AI models to data sources and tools via a standardized JSON-RPC protocol. Tools written as MCP servers work across any MCP-aware host (Claude Desktop, Cursor, Cline, etc.) without re-implementation. The reference servers repository at github.com/modelcontextprotocol/servers has 100+ implementations (Postgres, Slack, GitHub, Google Drive, etc.).

Should I use a framework like LangChain or just call the API directly?

Both have trade-offs. Direct API calls (per OpenAI / Anthropic docs) give maximum performance + zero framework risk. Frameworks (LangChain, LlamaIndex, Pydantic AI) give multi-provider portability + ecosystem of pre-built integrations + standard loop control. Most production systems benefit from framework abstractions for orchestration.

What are the most common production failure modes for tool use?

Five recurring ones: (1) bad tool arguments not matching schema, (2) tool execution errors (API down, timeout), (3) infinite tool loops, (4) side-effect amplification on retry, (5) prompt injection via tool results — see OWASP LLM Top 10 at owasp.org for the injection threat model. Each has a specific mitigation; together they form the production tool-use hygiene checklist.

How do I handle prompt injection via tool results?

Per OWASP's LLM Top 10 at owasp.org, tool results that include user-generated content or data from external systems should be treated as untrusted. Wrap tool results in clear `<tool_result>` delimiters. Use system prompts that explicitly instruct the model to ignore instructions inside tool results. Test with adversarial inputs ('ignore previous instructions and...') before production.

Do MCP servers replace LangChain/LlamaIndex/Pydantic AI?

No — they're complementary. MCP servers expose tools in a portable way. Frameworks (LangChain, LlamaIndex, Pydantic AI) handle agent orchestration, loop control, error handling, and multi-provider translation. Many production stacks use both: tools as MCP servers + framework for agent orchestration.

Architect production-grade LLM tool integration.

The Code Prompt Builder helps design tool descriptions, system prompts, and failure-mode-resilient prompt structure. Free, no signup. Part of 40+ free prompt tools.

Browse all prompt tools →