Ollama
Configure Ollama to serve local models through agentgateway. Ollama runs on a machine outside your cluster, and agentgateway routes requests to it over the network.
Before you begin
- Install and set up an agentgateway proxy.
- Install and run Ollama on a machine accessible from your Kubernetes cluster.
- Get the IP address of the machine running Ollama.
Set up Ollama
On the machine where you installed Ollama, make sure that you have at least one model pulled.
```sh
ollama list
```

If not, pull a model.
```sh
ollama pull llama3.2
```

Configure Ollama to accept external connections. By default, Ollama only listens on `localhost`. You can change this setting with the `OLLAMA_HOST` environment variable.

```sh
export OLLAMA_HOST=0.0.0.0:11434
```

⚠️ Binding Ollama to `0.0.0.0` exposes it on all network interfaces. Use firewall rules to restrict access to your Kubernetes cluster nodes only.

Restart Ollama to apply the new setting.
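Note that the `export` above applies only to the current shell session. On Linux installs that register Ollama as a systemd service (as the official install script does), you can make the setting persistent in the service unit instead. This is a sketch, assuming the service is named `ollama`:

```sh
# Open a drop-in override for the ollama service and add the
# environment variable under the [Service] section:
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload unit files and restart Ollama so the change takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```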
Verify Ollama is accessible from the machine’s network address.
```sh
curl http://<OLLAMA_IP>:11434/v1/models
```
Configure agentgateway to reach Ollama
Because Ollama runs outside your Kubernetes cluster, you need a headless Service and EndpointSlice to give it a stable in-cluster DNS name.
Get the IP address of the machine running Ollama.
```sh
# macOS
ipconfig getifaddr en0

# Linux
hostname -I | awk '{print $1}'
```

Create a headless Service and EndpointSlice that point to the external Ollama instance. Replace `<OLLAMA_IP>` with the actual IP address.

```yaml
kubectl apply -f- <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: agentgateway-system
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - port: 11434
    targetPort: 11434
    protocol: TCP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: ollama
  namespace: agentgateway-system
  labels:
    kubernetes.io/service-name: ollama
addressType: IPv4
endpoints:
- addresses:
  - <OLLAMA_IP>
ports:
- port: 11434
  protocol: TCP
EOF
```

Create an AgentgatewayBackend resource. The `openai` provider type is used because Ollama exposes an OpenAI-compatible API. The `host` and `port` fields point to the headless Service DNS name.

```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: ollama
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: llama3.2
        host: ollama.agentgateway-system.svc.cluster.local
        port: 11434
EOF
```

Review the following table to understand this configuration. For more information, see the API reference.
| Setting | Description |
|---------|-------------|
| `ai.provider.openai` | The OpenAI-compatible provider type. Ollama exposes an OpenAI-compatible API, so the `openai` type is used here. |
| `openai.model` | The Ollama model to use. This must match a model you pulled with `ollama pull`. |
| `host` | The in-cluster DNS name of the headless Service pointing to the external Ollama instance. |
| `port` | The port Ollama listens on. The default is `11434`. |

Create an HTTPRoute to expose the Ollama backend through the gateway.
```yaml
kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ollama
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - backendRefs:
    - name: ollama
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF
```
Send a request to verify the setup.
If the gateway is exposed with an external address, send the request directly. This example assumes the address is stored in the `INGRESS_GW_ADDRESS` environment variable.

```sh
curl "$INGRESS_GW_ADDRESS" \
  -H "content-type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of running models locally."
      }
    ]
  }' | jq
```

Alternatively, use a port-forward. In one terminal, start a port-forward to the gateway:

```sh
kubectl port-forward -n agentgateway-system svc/agentgateway-proxy 8080:80
```

In a second terminal, send a request:

```sh
curl "localhost:8080" \
  -H "content-type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "user",
        "content": "Explain the benefits of running models locally."
      }
    ]
  }' | jq
```

Example output:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1727967462,
  "model": "llama3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Running models locally provides complete data privacy, no API costs or rate limits, and consistent low latency without network dependencies."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 32,
    "total_tokens": 47
  }
}
```
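When scripting against the gateway, you can narrow the `jq` filter to print only the model's reply. The sketch below inlines a shortened copy of the sample response so you can try the filter without a live gateway; in practice, pipe the `curl` output from the previous step into `jq` instead.

```sh
# Shortened sample response body, inlined for illustration only.
response='{"choices":[{"index":0,"message":{"role":"assistant","content":"Running models locally provides complete data privacy."},"finish_reason":"stop"}]}'

# Print only the assistant's reply text. The -r flag strips the
# surrounding JSON quotes from the string.
echo "$response" | jq -r '.choices[0].message.content'
# → Running models locally provides complete data privacy.
```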
Troubleshooting
Connection refused or 503 response
What’s happening:
Requests fail with a connection error or the gateway returns a 503 response.
Why it’s happening:
The Kubernetes cluster cannot reach the Ollama instance. This is usually caused by an incorrect IP in the EndpointSlice, a firewall blocking port 11434, or Ollama not configured to accept external connections.
How to fix it:
Verify Ollama is reachable from the machine’s network address:
```sh
curl http://<OLLAMA_IP>:11434/v1/models
```

Check that the EndpointSlice contains the correct IP:

```sh
kubectl get endpointslice ollama -n agentgateway-system -o yaml
```

Test connectivity from inside the cluster:
```sh
kubectl run -it --rm debug --image=curlimages/curl --restart=Never \
  -- curl http://ollama.agentgateway-system.svc.cluster.local:11434/v1/models
```
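If the in-cluster curl fails, it can also help to check whether the headless Service name resolves at all. This is a sketch using a throwaway busybox pod; any image that ships `nslookup` works:

```sh
# Resolve the headless Service name from inside the cluster.
# The answer should contain the <OLLAMA_IP> you set in the
# EndpointSlice; if it does not resolve, check the
# kubernetes.io/service-name label on the EndpointSlice.
kubectl run -it --rm dns-debug --image=busybox --restart=Never \
  -- nslookup ollama.agentgateway-system.svc.cluster.local
```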
Model not found
What’s happening:
The request returns an error indicating the model is not available.
Why it’s happening:
The model specified in the request or the AgentgatewayBackend resource has not been pulled in Ollama.
How to fix it:
List models available in Ollama:
ollama listPull the model if it is missing:
ollama pull llama3.2