Tutorial 9: MCP Servers¶
The Model Context Protocol (MCP) lets a
node call tools that live in a separate process rather than as in-process
Python functions. KeGAL supports stdio (subprocess) and sse (HTTP/SSE)
transports. MCP servers are started automatically at Compiler construction
and shut down cleanly on close().
1. Basic: stdio server¶
Step 1 — Write the server¶
Any MCP-compatible server works. A minimal example using
fastmcp:
# my_server.py
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("my-tools")
@mcp.tool()
def greet(name: str) -> str:
"""Return a greeting for the given name."""
return f"Hello, {name}!"
@mcp.tool()
def add(a: int, b: int) -> int:
"""Add two integers."""
return a + b
if __name__ == "__main__":
mcp.run(transport="stdio")
Step 2 — Configure the graph¶
models:
- llm: "ollama"
model: "qwen2.5:7b"
host: "http://localhost:11434"
mcp_servers:
- id: "my_tools" # arbitrary identifier
transport: "stdio"
command: "python"
args: ["my_server.py"]
prompts:
- template:
system_template:
role: |
You are a helpful assistant. Use the available tools to answer.
prompt_template:
question: "{user_message}"
nodes:
- id: "tool_node"
model: 0
temperature: 0.0
max_tokens: 256
show: true
mcp_servers: ["my_tools"] # reference by server id
prompt:
template: 0
user_message: true
edges:
- node: "tool_node"
Step 3 — Run¶
from kegal import Compiler
with Compiler(uri="mcp_graph.yml") as compiler:
compiler.user_message = "What is the greeting for Alice, and what is 7 + 5?"
compiler.compile()
for msg in compiler.get_outputs().nodes[0].response.messages:
print(msg)
The with statement ensures the MCP subprocess is terminated when done.
2. Intermediate: SSE transport¶
For a remote MCP server (e.g. a running HTTP service), use sse transport
and provide a url instead of command/args:
No command or args are needed — the compiler connects to the running
service at construction time.
3. Intermediate: multiple servers on one node¶
A node can access tools from multiple MCP servers simultaneously:
mcp_servers:
- id: "db_server"
transport: "stdio"
command: "python"
args: ["db_server.py"]
- id: "web_server"
transport: "stdio"
command: "python"
args: ["web_server.py"]
nodes:
- id: "research_node"
model: 0
temperature: 0.2
max_tokens: 512
show: true
mcp_servers: ["db_server", "web_server"] # tools from both servers
prompt:
template: 0
user_message: true
Different nodes can access different server subsets:
nodes:
- id: "db_agent"
mcp_servers: ["db_server"] # database only
- id: "web_agent"
mcp_servers: ["web_server"] # web only
- id: "full_agent"
mcp_servers: ["db_server", "web_server"] # both
4. Intermediate: chaining MCP output¶
A common pattern is to have one node query a tool and pass its output to a second node for analysis. The MCP tool loop runs entirely inside the first node — only the final text response is forwarded via message passing.
flowchart LR
Q["query_node\nMCP + output: true"] -->|tool result| A["analyst\ninput: true"]
models:
- llm: "ollama"
model: "qwen2.5:7b"
host: "http://localhost:11434"
mcp_servers:
- id: "sqlite"
transport: "stdio"
command: "python"
args: ["sqlite_server.py", "--db", "data.db"]
prompts:
- template: # 0 — query node
system_template:
role: |
You have access to a SQLite database. Use the query tool to
retrieve the requested data. Summarise the result concisely.
prompt_template:
request: "{user_message}"
- template: # 1 — analyst
system_template:
role: |
You receive a database query result. Interpret it and provide
actionable insights.
prompt_template:
data: "{message_passing}"
nodes:
- id: "query_node"
model: 0
temperature: 0.0
max_tokens: 512
show: false
mcp_servers: ["sqlite"]
message_passing: { output: true }
prompt: { template: 0, user_message: true }
- id: "analyst"
model: 0
temperature: 0.5
max_tokens: 512
show: true
message_passing: { input: true }
prompt: { template: 1 }
edges:
- node: "query_node"
- node: "analyst"
with Compiler(uri="mcp_chain.yml") as compiler:
compiler.user_message = "Show me total sales by product category for Q3."
compiler.compile()
5. Advanced: MCP server inside a ReAct agent¶
In a ReAct graph, MCP servers must be attached to agent nodes, not the controller. The controller dispatches to agents; agents call the MCP tools.
mcp_servers:
- id: "sqlite"
transport: "stdio"
command: "python"
args: ["sqlite_server.py"]
nodes:
- id: "controller"
model: 0
temperature: 0.0
max_tokens: 512
show: true
prompt:
template: 0
user_message: true
react:
max_iterations: 6
react_output:
type: object
properties:
next_agent: { type: string }
agent_input: { type: string }
done: { type: boolean }
final_answer: { type: string }
reasoning: { type: string }
required: [done]
- id: "db_agent"
model: 0
temperature: 0.0
max_tokens: 512
show: false # react agents: show=true is ignored — use message_passing.output
mcp_servers: ["sqlite"] # ← MCP on the agent, not the controller
message_passing: { input: true, output: true }
prompt: { template: 1 }
edges:
- node: "controller"
react:
- node: "db_agent"
The MCP tool loop runs inside db_agent's isolated execution context — tool
calls are made, results are injected into the agent's conversation, and only
the final answer is returned to the controller as an observation.
See Tutorial 12: ReAct loop for the full ReAct pattern.
6. Advanced: combining MCP with Python tool executors¶
A node can have both mcp_servers and in-process tools at the same time.
The model sees all available tools from both sources and can call any of them.
tools:
- name: "format_currency"
description: "Format a number as a currency string."
parameters:
amount: { type: "number" }
currency: { type: "string" }
required: ["amount", "currency"]
mcp_servers:
- id: "exchange_rates"
transport: "stdio"
command: "python"
args: ["exchange_server.py"]
nodes:
- id: "fx_node"
model: 0
temperature: 0.0
max_tokens: 256
show: true
tools: ["format_currency"] # in-process Python function
mcp_servers: ["exchange_rates"] # out-of-process MCP server
prompt:
template: 0
user_message: true
def format_currency(amount: float, currency: str) -> str:
symbols = {"USD": "$", "EUR": "€", "GBP": "£"}
sym = symbols.get(currency, currency)
return f"{sym}{amount:,.2f}"
with Compiler(
uri="fx.yml",
tool_executors={"format_currency": format_currency},
) as compiler:
compiler.user_message = "What is €500 in USD, formatted?"
compiler.compile()
7. Intermediate: filtering tools per node¶
By default, a node sees all tools from every MCP server assigned to it. When a server exposes many tools but a node only needs a few, you can whitelist the tools it is allowed to call. The LLM never sees the hidden tools.
Use the object form of mcp_servers with an optional tools list:
mcp_servers:
- id: "file_tools"
transport: "stdio"
command: "python"
args: ["file_server.py"] # exposes: list_directory, read_text_file,
# read_pdf_file, write_text_file
nodes:
# reader — can only read text files
- id: "reader"
model: 0
temperature: 0.0
max_tokens: 2000
show: true
mcp_servers:
- id: file_tools
tools: [read_text_file]
prompt: { template: 0 }
# writer — can read and write, but not list or read PDFs
- id: "writer"
model: 0
temperature: 0.0
max_tokens: 4000
show: true
mcp_servers:
- id: file_tools
tools: [read_text_file, write_text_file]
prompt: { template: 1 }
The shorthand string form (mcp_servers: [file_tools]) remains valid and exposes
all tools — it is automatically converted to {id: file_tools, tools: null} internally.
Both forms can be mixed in the same list if a node uses multiple servers:
mcp_servers:
- id: file_tools
tools: [read_text_file, write_text_file] # filtered
- id: db_tools # all tools exposed (shorthand equivalent)
Why filter? Local LLMs can get confused when presented with many tools. Hiding irrelevant tools reduces hallucinations and keeps the model focused on the task at hand.
8. Intermediate: controlling the tool call loop limit¶
The tool loop inside each node calls the LLM repeatedly until it stops generating tool calls or a limit is reached. The default limit is 10 iterations (one iteration = one LLM call, which may produce several tool calls).
For nodes that need to read many files or perform many sequential operations,
raise the limit with max_tool_calls:
nodes:
- id: "analyst"
model: 0
temperature: 0.1
max_tokens: 8000
show: true
mcp_servers:
- id: file_tools
tools: [read_text_file, write_text_file]
max_tool_calls: 25 # allow up to 25 LLM-call iterations
prompt: { template: 0 }
max_tool_calls is a per-node setting — different nodes in the same graph can
have different limits. Nodes without mcp_servers or tools ignore it.
Key points¶
- MCP servers are started at
Compilerconstruction and stopped atclose(). Always usewith Compiler(...) as compiler:to ensure clean shutdown. stdioservers:commandandargsare required;urlmust be absent.sseservers:urlis required;commandandargsmust be absent.- A node references servers by ID — the ID must appear in the top-level
mcp_servers:list or_validate_indices()raisesValueError. - Use the object form (
{id, tools}) to whitelist which tools a node can call. The shorthand string form remains valid and exposes all tools. max_tool_callscontrols how many LLM-call iterations the tool loop runs. Default is 10. Increase for nodes that read many files or call many tools.- MCP servers must not be attached to ReAct controller nodes — only to agent nodes.
- Tool calls have a default timeout of 60 s per call; stalled tool calls
raise
TimeoutErrorrather than blocking indefinitely.
Related tutorials: 08 Tool executors — in-process Python tool functions
12 ReAct loop — MCP inside iterative agent dispatch