Skip to content

Tutorial 11: Multi-Provider Graphs

Different nodes in the same graph can use different LLM providers and models. Declare every model you need in the top-level models: list and reference them by index on each node. This lets you use the right model for each task — a fast, cheap model for classification and a powerful one for generation.


1. Basic: two providers in one graph

models:
  - llm: "ollama"
    model: "qwen2.5:7b"
    host: "http://localhost:11434"
  - llm: "anthropic"
    model: "claude-sonnet-4-6"
    api_key: "sk-ant-..."

prompts:
  - template:  # 0 — fast pre-check (Ollama)
      system_template:
        role: |
          Classify this support ticket as one of:
          billing, technical, general.
          Return one word only.
      prompt_template:
        message: "{user_message}"

  - template:  # 1 — detailed analysis (Anthropic)
      system_template:
        role: |
          You are a senior support engineer. Write a detailed,
          actionable response to this support ticket.
      prompt_template:
        ticket: "{user_message}"

nodes:
  - id: "classifier"
    model: 0             # Ollama — fast and cheap
    temperature: 0.0
    max_tokens: 16
    show: false
    message_passing: { output: true }
    prompt: { template: 0, user_message: true }

  - id: "responder"
    model: 1             # Anthropic — better quality for the customer-facing reply
    temperature: 0.4
    max_tokens: 512
    show: true
    message_passing: { input: true }
    prompt: { template: 1, user_message: true }

edges:
  - node: "classifier"
  - node: "responder"
from kegal import Compiler

with Compiler(uri="multi_provider.yml") as compiler:
    compiler.user_message = "My invoice shows the wrong amount."
    compiler.compile()

2. Intermediate: fast guard + powerful analysis

Use a lightweight model to validate the input (a task that requires very little reasoning) and reserve the powerful model for the main response.

models:
  - llm: "ollama"
    model: "qwen2.5:7b"      # small, fast
    host: "http://localhost:11434"
  - llm: "anthropic"
    model: "claude-opus-4-7"  # large, capable
    api_key: "sk-ant-..."

nodes:
  - id: "input_guard"
    model: 0                   # small model is sufficient for boolean classification
    temperature: 0.0
    max_tokens: 64
    show: false
    prompt:
      template: 0
      user_message: true
    structured_output:
      description: "Input safety check"
      parameters:
        validation:
          type: "boolean"
      required: ["validation"]

  - id: "main_analyst"
    model: 1                   # large model for the substantive response
    temperature: 0.5
    max_tokens: 2048
    show: true
    prompt:
      template: 1
      user_message: true
      retrieved_chunks: true

edges:
  - node: "input_guard"
  - node: "main_analyst"

3. Intermediate: parallel specialists on different providers

Fan out to multiple specialist nodes, each using the best model for its task.

models:
  - llm: "ollama"
    model: "qwen2.5:7b"
    host: "http://localhost:11434"
  - llm: "ollama"
    model: "qwen2.5-vl:7b"        # vision-capable for image analysis
    host: "http://localhost:11434"
  - llm: "anthropic"
    model: "claude-sonnet-4-6"    # strongest for the final synthesis
    api_key: "sk-ant-..."

images:
  - uri: "./assets/chart.png"

nodes:
  - id: "text_analyst"
    model: 0                       # text-only model for reading the report
    ...
    prompt: { template: 0, user_message: true, retrieved_chunks: true }

  - id: "image_analyst"
    model: 1                       # vision model for the chart
    ...
    images: [0]
    prompt: { template: 1 }

  - id: "synthesizer"
    model: 2                       # most capable for final synthesis
    ...
    message_passing: { input: true }
    prompt: { template: 2 }

edges:
  - node: "synthesizer"
    fan_in:
      - node: "text_analyst"
      - node: "image_analyst"

4. Advanced: mixing cloud and local models

Combine Anthropic or OpenAI cloud models with locally-run Ollama models to balance capability, latency, and cost.

models:
  - llm: "ollama"
    model: "qwen2.5:7b"
    host: "http://localhost:11434"
    # local — no API key, no network latency, free

  - llm: "openai"
    model: "gpt-4o-mini"
    api_key: "sk-..."
    # cloud — small but capable, low cost

  - llm: "anthropic"
    model: "claude-sonnet-4-6"
    api_key: "sk-ant-..."
    # cloud — strongest reasoning

nodes:
  - id: "pre_filter"
    model: 0       # local Ollama — zero cost for simple filtering
    ...

  - id: "extractor"
    model: 1       # OpenAI mini — structured extraction, low cost
    ...

  - id: "reasoning"
    model: 2       # Anthropic — complex reasoning step
    ...

5. Provider reference

Provider key Required fields Install extra Notes
ollama host kegal[ollama] Local or remote Ollama instance. No API key.
anthropic api_key kegal[anthropic] Anthropic cloud API.
anthropic_aws api_key kegal[aws] Anthropic via AWS API Gateway.
openai api_key kegal[openai] OpenAI cloud API.
bedrock aws_region_name, aws_access_key, aws_secret_key kegal[aws] AWS Bedrock.
gemini api_key kegal[gemini] Google Gemini cloud API.
# Ollama
models:
  - llm: "ollama"
    model: "qwen2.5:7b"
    host: "http://localhost:11434"
    context_window: 32768

# Anthropic
models:
  - llm: "anthropic"
    model: "claude-sonnet-4-6"
    api_key: "${ANTHROPIC_API_KEY}"

# OpenAI
models:
  - llm: "openai"
    model: "gpt-4o"
    api_key: "${OPENAI_API_KEY}"

# Google Gemini
models:
  - llm: "gemini"
    model: "gemini-2.0-flash"
    api_key: "${GEMINI_API_KEY}"

# AWS Bedrock
models:
  - llm: "bedrock"
    model: "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    aws_region_name: "${AWS_REGION}"
    aws_access_key: "${AWS_ACCESS_KEY_ID}"
    aws_secret_key: "${AWS_SECRET_ACCESS_KEY}"

6. Keeping secrets out of YAML — environment variables

All string fields in a model config support ${ENV_VAR} substitution. KeGAL replaces every ${NAME} pattern with os.environ["NAME"] before parsing the YAML. If the variable is not set, a ValueError is raised immediately (before the graph starts), with a clear message identifying the missing variable.

models:
  - llm: "anthropic"
    model: "claude-sonnet-4-6"
    api_key: "${ANTHROPIC_API_KEY}"

  - llm: "gemini"
    model: "gemini-2.0-flash"
    api_key: "${GEMINI_API_KEY}"

  - llm: "bedrock"
    model: "..."
    aws_region_name: "${AWS_REGION}"
    aws_access_key: "${AWS_ACCESS_KEY_ID}"
    aws_secret_key: "${AWS_SECRET_ACCESS_KEY}"

Setting variables on Linux / macOS:

# Temporary (current terminal session)
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."

# Persistent — add to ~/.bashrc or ~/.zshrc
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.bashrc

# Conda-env scoped — survives across sessions, isolated to the env
conda env config vars set GEMINI_API_KEY="AIza..." -n myenv
conda activate myenv   # reactivate to pick it up

7. Advanced: context_window per model

Declare context_window on any model where you need accurate ReAct compaction or utilization reporting. The value is specific to each model instance in the models: list.

models:
  - llm: "ollama"
    model: "qwen2.5:7b"
    host: "http://localhost:11434"
    context_window: 32768     # 32 K tokens

  - llm: "anthropic"
    model: "claude-sonnet-4-6"
    api_key: "sk-ant-..."
    context_window: 200000    # 200 K tokens

See Tutorial 13: Context window for how this is used.


Key points

  • Each entry in models: is a fully independent LLM configuration. The same provider can appear multiple times with different models or hosts.
  • Nodes reference models by index — the first model in the list is 0.
  • All models are instantiated at Compiler construction. If a provider fails to connect, the error surfaces before the first compile() call.
  • context_window is optional but enables accurate ReAct compaction and per-node utilization display.
  • Use ${ENV_VAR} in any string field to read the value from os.environ. An unset variable raises ValueError before the graph starts.
  • Each provider requires its own pip extra: kegal[anthropic], kegal[openai], kegal[gemini], kegal[ollama], kegal[aws]. Install kegal[all] for all.

Related tutorials: 13 Context window — tracking token usage per model
03 Guard nodes — using a fast model as a guard before a powerful one
12 ReAct loop — controller and agents can use different models