Designing MCP tools that LLMs actually use correctly

Most MCP tools we see are technically correct and practically useless. LLMs ignore them, misuse them, or call them in unhelpful ways. This article covers the principles for designing tools that LLMs adopt naturally, with examples of common failures and their fixes.


You've built an MCP server. The tools work. You hook up an LLM agent. Watching it operate, you notice: it ignores your tools, it calls them with weird parameters, it gets confused about which tool to use, it chains them in strange sequences.

This is the gap between "tools that exist" and "tools that LLMs use correctly." It's where most MCP server effort gets wasted. Companies build powerful capabilities, expose them as tools, and watch LLMs fail to use them effectively.

The reframe that helps: treat tool design as UX design where the LLM is your user. The tool description is the UI. The schema is the form. The error messages are the feedback. Get these right and LLMs work effectively. Get them wrong and your sophisticated backend is invisible to the agent.

This article covers the principles, with concrete examples of what works and what doesn't.

Principle 1: Tool names communicate intent

The tool name is the first thing the LLM sees. It should describe what the tool does, in action terms.

Bad:

customers (noun, no action)
process_customer (vague)
do_x (meaningless)

Better:

search_customers (clear action)
get_customer_by_id (specific operation)
update_customer_email (specific change)

Why it matters: LLMs scan tool lists looking for relevant ones. A descriptive name lets them find the right tool quickly. A vague name forces them to read the description carefully (which they sometimes don't).

A useful pattern: standard verb prefixes.

list_*, search_*, get_* for reads.
create_*, update_*, delete_* for writes.
analyze_*, summarize_* for computation.

Consistency across your server helps the LLM build mental models.
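
One way to keep prefixes consistent is a small check run over the tool list during review. The sketch below is illustrative only: `APPROVED_PREFIXES`, `hasVerbPrefix`, and `findUnclearToolNames` are hypothetical names, and the prefix list simply mirrors the one above.

```typescript
// Hypothetical lint helper: flags tool names that lack an approved verb prefix.
const APPROVED_PREFIXES = [
  "list_", "search_", "get_",        // reads
  "create_", "update_", "delete_",   // writes
  "analyze_", "summarize_",          // computation
] as const;

function hasVerbPrefix(toolName: string): boolean {
  return APPROVED_PREFIXES.some((prefix) => toolName.startsWith(prefix));
}

function findUnclearToolNames(toolNames: string[]): string[] {
  return toolNames.filter((toolName) => !hasVerbPrefix(toolName));
}

const flaggedNames = findUnclearToolNames([
  "customers", "process_customer", "do_x",
  "search_customers", "get_customer_by_id", "update_customer_email",
]);
// flaggedNames is ["customers", "process_customer", "do_x"]
```

A prefix check cannot tell whether a name is accurate, only whether it follows the convention, so it complements a human read of the tool list rather than replacing it.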

Principle 2: Descriptions are prompts

The tool's description is the most important text in your server. It's what the LLM uses to decide whether and how to use the tool.

Bad description:

search_customers: Search the customer database.

Better description:

search_customers: Find customers by name, email, or company. Returns up to 10 matching customers with their basic info. Use this when you need to identify a customer the user is referring to. For exact lookups by ID, use get_customer_by_id instead.

Notice what the better version does:

Describes inputs ("by name, email, or company").
Describes outputs ("up to 10 matching customers with their basic info").
Indicates when to use ("when you need to identify a customer the user is referring to").
Indicates when not to use ("For exact lookups by ID, use get_customer_by_id instead").

The "when not to use" part is critical. Without it, the LLM might call search_customers when get_customer_by_id would be more appropriate.
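
If you want every description to cover the same four points, a small builder can make the omission of one visible. This is a sketch under assumptions: `DescriptionParts` and `buildToolDescription` are hypothetical helpers, not part of any MCP SDK.

```typescript
// Hypothetical helper: assembles a description from the four parts named above.
interface DescriptionParts {
  inputs: string;      // what the tool searches or acts on
  outputs: string;     // what comes back
  useWhen: string;     // when to reach for this tool
  avoidWhen?: string;  // when another tool is the better choice
}

function buildToolDescription(parts: DescriptionParts): string {
  const sentences = [parts.inputs, parts.outputs, parts.useWhen];
  if (parts.avoidWhen !== undefined) sentences.push(parts.avoidWhen);
  return sentences.join(" ");
}

const searchCustomersDescription = buildToolDescription({
  inputs: "Find customers by name, email, or company.",
  outputs: "Returns up to 10 matching customers with their basic info.",
  useWhen: "Use this when you need to identify a customer the user is referring to.",
  avoidWhen: "For exact lookups by ID, use get_customer_by_id instead.",
});
```

Making `inputs`, `outputs`, and `useWhen` required means the type checker complains when a new tool skips one of them.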

Principle 3: Parameter descriptions matter

Every parameter needs a description. Don't rely on the parameter name alone.

Bad:

{
  customer_id: string,
  fields: string[]
}

Better:

{
  customer_id: string,  // "The customer's unique identifier. Get this from search_customers or from explicit user input."
  fields: string[]      // "Specific fields to return. Available: name, email, phone, tier, created_at, last_active. If not specified, returns name and email."
}

The descriptions:

Tell the LLM how to obtain the value.
Specify allowed values where applicable.
Indicate defaults.
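
MCP tools declare their parameters as JSON Schema, so the commented pseudocode above might translate to something like the following. The schema constant and the `findUndescribedParams` review check are hypothetical names introduced for illustration.

```typescript
const getCustomerInputSchema = {
  type: "object",
  properties: {
    customer_id: {
      type: "string",
      description:
        "The customer's unique identifier. Get this from search_customers or from explicit user input.",
    },
    fields: {
      type: "array",
      items: {
        type: "string",
        enum: ["name", "email", "phone", "tier", "created_at", "last_active"],
      },
      description: "Specific fields to return. If not specified, returns name and email.",
    },
  },
  required: ["customer_id"],
} as const;

// Hypothetical review check: list parameters that are missing a description.
function findUndescribedParams(
  schema: { properties: Record<string, { description?: string }> },
): string[] {
  return Object.keys(schema.properties).filter(
    (paramName) => !schema.properties[paramName]?.description,
  );
}
```

Moving the allowed field names into an `enum` keeps them in the schema itself, so the description only has to explain the default.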

Principle 4: Errors guide recovery

When a tool errors, the error message guides the LLM's next action. Vague errors lead to confused agents.

Bad error:

{ "error": "Invalid input" }

Better error:

{
  "error": "validation_error",
  "message": "The email '...' is not in valid format. It must be like 'name@example.com'.",
  "field": "email",
  "suggestion": "Ask the user for a valid email address."
}

The LLM now knows:

What went wrong (validation error on email field).
How to fix it (use a valid email format).
What to do next (ask the user).

Compare an agent's behavior with the two errors. Given the first, the agent might retry the same call (wasted), give up (bad UX), or hallucinate a valid input. Given the second, it asks the user for a corrected email and moves on.
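
A minimal sketch of producing that structured error on the server side follows. The `ToolValidationError` type and `validateEmailParam` function are hypothetical, and the regular expression is a deliberately simple format check rather than a complete email validator.

```typescript
interface ToolValidationError {
  error: "validation_error";
  message: string;
  field: string;
  suggestion: string;
}

// Deliberately simple format check for illustration; not a full email validator.
const SIMPLE_EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns null when the value is acceptable, otherwise an error the LLM can act on.
function validateEmailParam(email: string): ToolValidationError | null {
  if (SIMPLE_EMAIL_PATTERN.test(email)) return null;
  return {
    error: "validation_error",
    message: `The email '${email}' is not in valid format. It must be like 'name@example.com'.`,
    field: "email",
    suggestion: "Ask the user for a valid email address.",
  };
}
```

Echoing the rejected value back in `message` lets the LLM see exactly what it sent, which helps it avoid repeating the same call.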

Principle 5: Output shapes the next action

The tool's output determines what the LLM does next. Output design influences agent behavior.

Bad output for a search:

[
  {"id": "c1", "n": "John", "e": "john@..."},
  {"id": "c2", "n": "Jane", "e": "jane@..."}
]

Better output:

{
  "customers": [
    {"id": "c1", "name": "John Smith", "email": "john@example.com", "tier": "pro"},
    {"id": "c2", "name": "Jane Doe", "email": "jane@johnson-corp.example.com", "tier": "free"}
  ],
  "total_found": 2,
  "summary": "Found 2 customers matching 'john'. Note that one is named 'Jane Doe' but has 'john' in their email."
}

The better output:

Uses readable field names.
Includes meta-context (total_found).
Includes a natural-language summary that helps the LLM frame what to say next.

The summary field is powerful — it's like giving the LLM a "by the way" hint about how to interpret the result.
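
The envelope can be built by one small function so every search tool returns the same shape. This is a sketch, assuming a hypothetical `CustomerRow` type and `buildSearchResult` helper; a real summary might add more specific hints, as in the example above.

```typescript
interface CustomerRow { id: string; name: string; email: string; tier: string; }

interface CustomerSearchResult {
  customers: CustomerRow[];
  total_found: number;
  summary: string;
}

function buildSearchResult(query: string, rows: CustomerRow[]): CustomerSearchResult {
  const noun = rows.length === 1 ? "customer" : "customers";
  // An empty result gets its own summary so the LLM knows what to try next.
  const summary = rows.length === 0
    ? `No customers matched '${query}'. Try a different name, email, or company.`
    : `Found ${rows.length} ${noun} matching '${query}'.`;
  return { customers: rows, total_found: rows.length, summary };
}
```

The empty-result branch matters most: a bare `[]` gives the LLM nothing to work with, while a summary that suggests a different query points to a next step.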

Principle 6: One tool, one thing

Tools that do multiple things confuse LLMs. The LLM has to decide both whether to use the tool AND what mode to use.

Confusing:

manage_customer:
  - mode: "search" | "get" | "update" | "delete"
  - params: depends on mode

The LLM has to pick the mode, and it often picks wrong. Worse, the parameter schema is complex because it varies by mode.

Better: separate tools.

search_customers: search by name/email/company
get_customer: get details by ID
update_customer: update specific fields
delete_customer: archive a customer

Each tool is unambiguous. The LLM picks one based on intent. The schemas are simple.

This means more tools, but each is clearer. The LLM handles 10 clear tools better than 3 multi-mode tools.

Principle 7: Constrain inputs

Where possible, restrict input options. Enums and validation prevent LLM hallucinations.

Loose:

{
  status: string  // could be anything
}

Constrained:

{
  status: "active" | "trial" | "churned" | "suspended"
}

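The constraint is declared at the schema level. Clients that use constrained generation cannot produce an invalid value at all; for other clients, validate against the enum server-side and return an error that lists the allowed values.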

The same applies to enums of operations, severities, types — anything with a known set of valid values.

For dates, use ISO 8601 format and specify it in the description ("Date in ISO 8601 format, e.g., 2026-05-15"). Without this, LLMs produce dates in inconsistent formats.
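
Schema constraints are worth backing with server-side checks, since not every client enforces them. The sketch below assumes hypothetical helpers `isCustomerStatus` and `isIsoDate`; the date check accepts only calendar dates of the form YYYY-MM-DD.

```typescript
const CUSTOMER_STATUSES = ["active", "trial", "churned", "suspended"] as const;
type CustomerStatus = (typeof CUSTOMER_STATUSES)[number];

function isCustomerStatus(value: string): value is CustomerStatus {
  return CUSTOMER_STATUSES.some((allowed) => allowed === value);
}

// Accepts only calendar dates in the form YYYY-MM-DD that name a real day.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Round-trip comparison rejects dates that the runtime silently rolls over.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}
```

When either check fails, the error returned to the LLM should list the allowed statuses or show the expected date format, in line with Principle 4.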

Principle 8: Default values reduce hallucination

When parameters have sensible defaults, make them optional with the default applied server-side.

Bad:

{
  query: string,
  limit: number,  // LLM has to provide some value
  include_archived: boolean,
  sort_by: string
}

The LLM has to pick values for all of these. They might be wrong.

Better:

{
  query: string,
  limit: number = 10,            // sensible default
  include_archived: boolean = false,  // safe default
  sort_by: "relevance" | "name" | "created_at" = "relevance"  // most common
}

The LLM only specifies parameters that matter for the specific query. Fewer parameters means less room for confusion.

Document defaults in the description: "Limit: number of results to return. Default 10, max 50."
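
Applying the defaults server-side could look like the sketch below. `applySearchDefaults` is a hypothetical helper; the default of 10 and the maximum of 50 come from the description above, while the lower bound of 1 is an assumption.

```typescript
type SortBy = "relevance" | "name" | "created_at";

interface SearchArgs {
  query: string;
  limit?: number;
  include_archived?: boolean;
  sort_by?: SortBy;
}

const MAX_LIMIT = 50;

function applySearchDefaults(args: SearchArgs): Required<SearchArgs> {
  const requested = args.limit ?? 10;
  return {
    query: args.query,
    // Clamp to 1..50; the lower bound of 1 is an assumption for this sketch.
    limit: Math.min(Math.max(Math.trunc(requested), 1), MAX_LIMIT),
    include_archived: args.include_archived ?? false,
    sort_by: args.sort_by ?? "relevance",
  };
}
```

Clamping an oversized `limit` instead of rejecting it is a design choice: the call still succeeds, at the cost of the LLM not learning that it asked for too much.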

Principle 9: Composability matters

Tools should compose into workflows the LLM can construct. The right granularity makes complex tasks easy.

Consider a task: "Tell me about all the open issues for our top 3 customers."

Bad tool set:

get_customer_summary(customer_id): returns customer + tickets + activity all in one

The LLM can't easily do the "top 3" filter — this tool returns everything for one customer at a time. To do the task, the LLM needs to know who the top customers are first, then call this tool 3 times.

Better tool set:

list_customers(sort_by="value", limit=N): returns customer summaries with priority info
list_tickets(customer_id, status): returns tickets for a customer

The LLM can compose: list top customers, then for each, list open tickets. The composition is natural.

The principle: think about the multi-tool workflows. Tools that compose well are usable; tools that don't, often aren't.
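
To check that a tool set composes, it can help to write out the plan an agent would follow as ordinary code. The sketch below uses hypothetical in-memory data and function names standing in for list_customers and list_tickets.

```typescript
interface CustomerSummary { id: string; name: string; value: number; }
interface Ticket { id: string; customer_id: string; status: "open" | "closed"; title: string; }

// Hypothetical in-memory data standing in for a real backend.
const CUSTOMERS: CustomerSummary[] = [
  { id: "c1", name: "Acme", value: 900 },
  { id: "c2", name: "Globex", value: 1200 },
  { id: "c3", name: "Initech", value: 300 },
  { id: "c4", name: "Umbrella", value: 700 },
];
const TICKETS: Ticket[] = [
  { id: "t1", customer_id: "c2", status: "open", title: "Login fails" },
  { id: "t2", customer_id: "c1", status: "closed", title: "Invoice typo" },
  { id: "t3", customer_id: "c3", status: "open", title: "Slow export" },
  { id: "t4", customer_id: "c4", status: "open", title: "API timeout" },
];

function listCustomers(sortBy: "value" | "name", limit: number): CustomerSummary[] {
  const sorted = CUSTOMERS.slice().sort((a, b) =>
    sortBy === "value" ? b.value - a.value : a.name.localeCompare(b.name));
  return sorted.slice(0, limit);
}

function listTickets(customerId: string, status: Ticket["status"]): Ticket[] {
  return TICKETS.filter((t) => t.customer_id === customerId && t.status === status);
}

// The two-step plan an agent would follow for "open issues for our top 3 customers".
function openIssuesForTopCustomers(topN: number): Record<string, string[]> {
  const issuesByCustomer: Record<string, string[]> = {};
  for (const customer of listCustomers("value", topN)) {
    issuesByCustomer[customer.name] = listTickets(customer.id, "open").map((t) => t.title);
  }
  return issuesByCustomer;
}
```

If the plan is awkward to write by hand, it will likely be awkward for the LLM to construct too.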

Principle 10: Idempotency is communicated

For write tools, mention idempotency requirements in the description:

create_invoice: Create a new invoice for a customer.
IMPORTANT: Pass an idempotency_key (a UUID you generate). If you retry this operation, use the same UUID to prevent duplicate invoices.

Parameters:
- amount: ...
- customer_id: ...
- idempotency_key: UUID to prevent duplicate creation on retry. Generate once per logical operation.

Now the LLM knows to generate a UUID and use the same one if retrying.

Without this guidance, the LLM might either skip the key (no idempotency) or generate a new UUID per retry (defeats the purpose).
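
On the server, honoring the key can be as simple as remembering what each key produced. This is a minimal sketch with a hypothetical in-memory `Map`; a real server would persist keys durably and probably expire them.

```typescript
interface Invoice { invoice_id: string; customer_id: string; amount: number; }

// Hypothetical in-memory store; a real server would persist keys durably.
const invoicesByIdempotencyKey = new Map<string, Invoice>();
let nextInvoiceNumber = 1;

function createInvoice(customerId: string, amount: number, idempotencyKey: string): Invoice {
  const existing = invoicesByIdempotencyKey.get(idempotencyKey);
  // A retry with the same key returns the original invoice and creates nothing.
  if (existing !== undefined) return existing;

  const invoice: Invoice = {
    invoice_id: `inv_${nextInvoiceNumber++}`,
    customer_id: customerId,
    amount,
  };
  invoicesByIdempotencyKey.set(idempotencyKey, invoice);
  return invoice;
}
```

Calling `createInvoice` twice with the same key yields the same `invoice_id`, which is the behavior the tool description promises the LLM.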

Principle 11: Mention pre/post conditions

For tools with preconditions or important side effects, say so:

delete_customer: Archive a customer record. This is reversible within 30 days; after 30 days, the data is permanently deleted.

PRECONDITIONS:
- Customer must have no active subscriptions.
- Customer must have no open tickets.

If preconditions are not met, this tool returns an error indicating what to resolve first.

SIDE EFFECTS:
- All customer's contacts are also archived.
- Customer is removed from active reports.
- An audit log entry is created.

The LLM now knows what to check before calling, and what to expect after. It can plan multi-step workflows correctly ("first close their tickets, then delete").
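
The "error indicating what to resolve first" could be produced by a check like the one below. The `CustomerState` and `PreconditionError` shapes and the `checkDeletePreconditions` function are hypothetical; only the two preconditions come from the description above.

```typescript
interface CustomerState { id: string; active_subscriptions: number; open_tickets: number; }

interface PreconditionError {
  error: "precondition_failed";
  unmet: string[];
  suggestion: string;
}

// Returns null when delete_customer may proceed, otherwise every unmet precondition.
function checkDeletePreconditions(customer: CustomerState): PreconditionError | null {
  const unmet: string[] = [];
  if (customer.active_subscriptions > 0) {
    unmet.push(`Customer has ${customer.active_subscriptions} active subscription(s).`);
  }
  if (customer.open_tickets > 0) {
    unmet.push(`Customer has ${customer.open_tickets} open ticket(s).`);
  }
  if (unmet.length === 0) return null;
  return {
    error: "precondition_failed",
    unmet,
    suggestion: "Resolve the items listed in 'unmet', then call delete_customer again.",
  };
}
```

Reporting all unmet preconditions at once saves the LLM a round trip per problem.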

Principle 12: When in doubt, examples

For complex tools, including an example in the description helps:

analyze_funnel: Analyze a conversion funnel from event data.

Parameters:
- start_date: ISO 8601 date
- end_date: ISO 8601 date  
- steps: array of step definitions, each {event_name: string, filters?: object}

Example:
{
  "start_date": "2026-01-01",
  "end_date": "2026-01-31",
  "steps": [
    {"event_name": "signup"},
    {"event_name": "first_login"},
    {"event_name": "first_action", "filters": {"action_type": "create_project"}},
    {"event_name": "subscription_started"}
  ]
}

Examples teach the LLM the structure better than schemas alone.

Principle 13: Don't expose internals

The LLM doesn't need to know your database structure or internal IDs. Surface a clean conceptual model.

Bad:

get_user_by_pk(pk: number)

The LLM has to know to use the "primary key" — a database concept.

Better:

get_user(user_id: string)

Hide the database concept. The LLM uses a user_id, which is a meaningful concept.

Similarly: don't expose deprecated fields, internal flags, debug parameters, or anything else that's about your implementation rather than the user-facing concept.

Principle 14: Avoid magic strings

Some tools require strings that look like commands or codes. These are error-prone.

Bad:

modify_record(record_id: string, change_string: string)
// where change_string is like "field1=value1;field2=value2"

The LLM has to encode changes in a specific string format, and it will make mistakes.

Better:

update_record(record_id: string, updates: { field1?: any; field2?: any; ... })

Structured updates as an object. The LLM can use any field directly.

Principle 15: Test with real LLMs

Tool descriptions that read well to humans might still confuse LLMs. The only way to know is to test.

A useful workflow:

Build the tool.
Have an LLM agent attempt several realistic tasks using only your tools.
Observe failures.
Adjust descriptions based on failures.
Repeat.

The patterns you'll find:

LLM uses wrong tool → tool name or description is unclear.
LLM passes wrong parameter values → parameter description or schema needs work.
LLM gives up after errors → error messages need improvement.
LLM doesn't try a tool that would help → tool isn't surfaced or named well.

Each issue suggests a specific fix.
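
If you log each trial, the mapping from failure to fix can be tallied automatically. The sketch below assumes a hypothetical `TrialObservation` record that you fill in from your own agent logs; it does not call any LLM itself.

```typescript
interface TrialObservation {
  task: string;
  expectedTool: string;
  calledTool: string | null;   // null when the agent called no tool
  argsValid: boolean;
  gaveUpAfterError: boolean;
}

type Diagnosis =
  | "ok"
  | "tool not surfaced or named well"
  | "tool name or description is unclear"
  | "parameter description or schema needs work"
  | "error messages need improvement";

function diagnose(obs: TrialObservation): Diagnosis {
  if (obs.calledTool === null) return "tool not surfaced or named well";
  if (obs.calledTool !== obs.expectedTool) return "tool name or description is unclear";
  if (!obs.argsValid) return "parameter description or schema needs work";
  if (obs.gaveUpAfterError) return "error messages need improvement";
  return "ok";
}

function tallyDiagnoses(observations: TrialObservation[]): Map<Diagnosis, number> {
  const counts = new Map<Diagnosis, number>();
  for (const obs of observations) {
    const diagnosis = diagnose(obs);
    counts.set(diagnosis, (counts.get(diagnosis) ?? 0) + 1);
  }
  return counts;
}
```

The most frequent diagnosis after a batch of trials is a reasonable place to start the next round of edits.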

A diagnostic: signs your tools aren't well-designed

A few patterns indicating tool design issues:

The LLM frequently uses the wrong tool. You'll see it call search_customers when it should have called get_customer_by_id. Fix: clarify which tool is for which situation.

The LLM calls many tools to do one thing. It chains 5 tool calls to accomplish what should be 1. Fix: maybe you need a higher-level composite tool, or the granularity is too fine.

The LLM gives up after errors. It tries once, gets an error, then tells the user it can't help. Fix: better error messages that suggest next steps.

The LLM hallucinates parameter values. It invents user IDs, dates, or other identifiers. Fix: clarify how to obtain valid values; add constraints; add error handling that catches and explains the issue.

The LLM repeats the same failed call. Same error, repeated. Fix: the error message isn't telling the LLM what's wrong specifically.

The LLM doesn't use a powerful tool. You built a great tool; the LLM never invokes it. Fix: improve discovery (clearer name, better description, "use this when..." guidance).

Tool taxonomy

A useful exercise: organize tools into a taxonomy.

Read tools (safe, idempotent):
- search_customers
- get_customer_by_id
- list_tickets
- list_orders

Compute tools (no state changes):
- summarize_account_activity
- analyze_funnel
- calculate_lifetime_value

Write tools (state changes, need idempotency):
- create_customer
- update_customer_email
- create_ticket
- send_email

Destructive tools (require careful authorization):
- delete_customer
- cancel_subscription
- archive_record

The taxonomy helps you:

Apply appropriate guardrails (idempotency, confirmation for destructive).
Document the categories in system prompts to the LLM.
Catch missing tools (if a category is empty, do you need one?).

A useful system prompt addition:

Tool categories available:
- READ tools (safe to call): search_customers, get_customer_by_id, ...
- COMPUTE tools (no side effects): summarize_account_activity, ...
- WRITE tools (side effects, include idempotency_key): create_customer, ...
- DESTRUCTIVE tools (require human confirmation): delete_customer, ...

Before calling a WRITE or DESTRUCTIVE tool, confirm with the user.

This shapes how the LLM uses the tools at the workflow level, not just per-call.
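
Keeping the category list in code and rendering the prompt from it avoids the prompt drifting out of sync with the server. This is a sketch: `TOOL_CATEGORIES` and `renderCategoryPrompt` are hypothetical, and the registry lists only a few of the tools above.

```typescript
type ToolCategory = "READ" | "COMPUTE" | "WRITE" | "DESTRUCTIVE";

const CATEGORY_NOTES: Record<ToolCategory, string> = {
  READ: "safe to call",
  COMPUTE: "no side effects",
  WRITE: "side effects, include idempotency_key",
  DESTRUCTIVE: "require human confirmation",
};

// Hypothetical registry; in practice this would sit next to the tool definitions.
const TOOL_CATEGORIES: Record<string, ToolCategory> = {
  search_customers: "READ",
  get_customer_by_id: "READ",
  summarize_account_activity: "COMPUTE",
  create_customer: "WRITE",
  delete_customer: "DESTRUCTIVE",
};

function renderCategoryPrompt(tools: Record<string, ToolCategory>): string {
  const order: ToolCategory[] = ["READ", "COMPUTE", "WRITE", "DESTRUCTIVE"];
  const lines = order.map((category) => {
    const names = Object.keys(tools).filter((toolName) => tools[toolName] === category);
    // An empty category is printed explicitly so a missing tool is easy to spot.
    const listed = names.length > 0 ? names.join(", ") : "(none)";
    return `- ${category} tools (${CATEGORY_NOTES[category]}): ${listed}`;
  });
  return ["Tool categories available:", ...lines].join("\n");
}
```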

Examples of common improvements

To make the principles concrete, before-and-after examples:

Example 1: A search tool

Before:

// search documents
{
  name: "documents",
  description: "Search documents",
  inputSchema: { query: "string" }
}

After:

{
  name: "search_documents",
  description: `Search internal documents (knowledge base, wiki pages, policies). 
  Returns matching documents with title, excerpt, and link. Use when the user asks about company policies, procedures, or internal documentation. Returns up to 10 most relevant matches by semantic similarity.`,
  inputSchema: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: "Search query. Be specific. Good: 'remote work policy 2026'. Bad: 'documents about work'."
      },
      document_type: {
        type: "string",
        enum: ["policy", "procedure", "guide", "faq", "any"],
        default: "any",
        description: "Filter to a specific type of document."
      },
      limit: {
        type: "integer",
        default: 5,
        minimum: 1,
        maximum: 10,
        description: "Number of results. Default 5, max 10."
      }
    },
    required: ["query"]
  }
}

Example 2: An action tool

Before:

{
  name: "send_email",
  description: "Send an email",
  inputSchema: {
    to: "string",
    subject: "string",
    body: "string"
  }
}

After:

{
  name: "draft_email_to_customer",
  description: `Draft an email to a customer based on a recent interaction. The email is saved as a draft for human review before sending — it is NOT sent automatically. The user must approve drafts in their inbox.

  Use when:
  - You've identified an action requiring follow-up with the customer.
  - You have a specific reason and content for the email.
  
  Do NOT use:
  - To send marketing or promotional content.
  - Without explicit user request.
  - To respond to refund or cancellation requests (escalate to human instead).`,
  inputSchema: {
    type: "object",
    properties: {
      customer_id: {
        type: "string",
        description: "Customer ID from search_customers or get_customer."
      },
      subject: {
        type: "string",
        description: "Email subject, 4-8 words, specific. Avoid generic subjects like 'Following up'."
      },
      body: {
        type: "string",
        description: "Email body, plain text. 3-5 sentences. Personal, specific, not template-y."
      },
      tone: {
        type: "string",
        enum: ["professional", "friendly", "apologetic", "urgent"],
        default: "professional",
        description: "Tone of the email."
      },
      idempotency_key: {
        type: "string",
        description: "UUID for this draft. Use the same UUID if retrying to avoid duplicates."
      }
    },
    required: ["customer_id", "subject", "body", "idempotency_key"]
  }
}

The "After" versions guide the LLM far more effectively. They feel verbose; they're worth it.

The takeaway

Tool design for LLMs is its own discipline. The principles aren't intuitive; they require thinking about the LLM as your user and designing the interface accordingly.

The patterns that matter:

Action-verb names.
Rich descriptions explaining what, when, and not-when.
Per-parameter descriptions with examples and constraints.
Structured, actionable error messages.
Output shapes that guide the next action.
One tool per concept.
Sensible defaults.
Composable granularity.
Idempotency explicit.
Pre/post conditions documented.
Examples for complex tools.
Hidden internals.
Real-LLM testing.

Most MCP servers fail not because the protocol is hard but because the tools weren't designed with the LLM in mind. Get the tool design right and your server becomes effective; get it wrong and your sophisticated backend is wasted.

Treat the LLM as the user. Design accordingly. The investment pays off many times over in how well your tools actually get used.