Traffic splitting

Code examples on this page have been automatically tested and verified.

Set up weight-based routing between multiple apps for A/B testing, traffic splitting, and canary deployments.

About A/B testing and traffic splitting

A/B testing, traffic splitting, and canary deployments are techniques for gradually introducing changes by distributing traffic across multiple versions of an app or service based on weight percentages.

Common use cases:

  • A/B testing: Compare two versions of an app by routing a percentage of traffic to each version to measure performance, user engagement, or business metrics.
  • Traffic splitting: Distribute load across multiple backends, such as different LLM models or providers, to balance cost, performance, or capacity.
  • Canary deployments: Gradually roll out a new version of your app by routing a small percentage of traffic to the new version, then increasing the percentage as confidence grows.

These patterns use weighted backendRefs in HTTPRoute (a standard Gateway API feature) to control the percentage of requests sent to each backend. Unlike failover, which uses priority groups to switch between backends when one fails, traffic splitting distributes traffic based on static weight ratios.
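Weights are relative values rather than strict percentages: each backend receives its weight divided by the sum of all weights. A minimal sketch of that math for the 10/10/80 split used later in this guide:

```shell
# Each backend's share of traffic is weight / sum(weights).
awk 'BEGIN {
  total = 10 + 10 + 80
  printf "v1=%.0f%% v2=%.0f%% v3=%.0f%%\n", 10*100/total, 10*100/total, 80*100/total
}'
# v1=10% v2=10% v3=80%
```

Because only the ratios matter, weights of 1/1/8 would produce the same split. Also note that a backendRef with no weight defaults to 1, so mixing weighted and unweighted backends can skew the split more than expected.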

Before you begin

  1. Follow the Get started guide to install agentgateway.

  2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.

  3. Get the external address of the gateway and save it in an environment variable.

    # Option 1: Use the external address of the load balancer service.
    export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
    echo $INGRESS_GW_ADDRESS

    # Option 2: If no load balancer address is available, port-forward the gateway proxy instead.
    kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80
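A quick sanity check (a sketch, not part of the original setup) helps catch the common case where the Service has no external address, such as on a local cluster without a load balancer:

```shell
# Warn if the address variable is empty, which happens when the Service
# has no LoadBalancer ingress; fall back to port-forwarding in that case.
if [ -z "$INGRESS_GW_ADDRESS" ]; then
  echo "No external address found; use the kubectl port-forward option instead."
fi
```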

Example 1: A/B testing with multiple app versions

This example demonstrates A/B testing and canary deployments by distributing traffic across 3 versions of the Helloworld sample app.

Deploy the Helloworld sample app

  1. Create the helloworld namespace.

    kubectl create namespace helloworld
  2. Deploy the Helloworld sample apps.

    kubectl -n helloworld apply -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml

    Example output:

    service/helloworld-v1 created
    service/helloworld-v2 created
    service/helloworld-v3 created
    deployment.apps/helloworld-v1 created
    deployment.apps/helloworld-v2 created
    deployment.apps/helloworld-v3 created
  3. Verify that the Helloworld pods are up and running.

    kubectl get pods -n helloworld

    Example output:

    NAME                             READY   STATUS    RESTARTS   AGE
    helloworld-v1-5c457458f-rfkc7    3/3     Running   0          30s
    helloworld-v2-6594c54f6b-8dvjp   3/3     Running   0          29s
    helloworld-v3-8576f76d87-czdll   3/3     Running   0          29s

Set up weighted routing

  1. Create an HTTPRoute resource for the traffic.split.example domain that routes 10% of the traffic to helloworld-v1, 10% to helloworld-v2, and 80% to helloworld-v3.

    This configuration demonstrates a canary deployment pattern where version 3 (the stable version) receives most traffic while versions 1 and 2 (canary versions) receive smaller amounts for testing.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: traffic-split
      namespace: helloworld
    spec:
      parentRefs:
      - name: http
        namespace: agentgateway-system
      hostnames:
      - traffic.split.example
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /
        backendRefs:
        - name: helloworld-v1
          port: 5000
          weight: 10
        - name: helloworld-v2
          port: 5000
          weight: 10
        - name: helloworld-v3
          port: 5000
          weight: 80
    EOF
    Setting descriptions:

      • spec.parentRefs.name: The name and namespace of the gateway resource that serves the route. In this example, you use the gateway that you created when you set up the Sample app.
      • spec.hostnames: The hostname for which you want to apply traffic splitting.
      • spec.rules.matches.path: The path prefix to match on. In this example, / is used.
      • spec.rules.backendRefs: A list of services you want to forward traffic to. Use the weight option to define the amount of traffic that you want to forward to each service.
  2. Verify that the HTTPRoute is applied successfully.

    kubectl get httproute/traffic-split -n helloworld -o yaml
  3. Send a few requests to the /hello path. Verify that you see responses from all 3 Helloworld apps, and that most responses are returned from helloworld-v3.

    # Option 1: Load balancer address
    for i in {1..20}; do curl -i http://$INGRESS_GW_ADDRESS:80/hello \
    -H "host: traffic.split.example"; done

    # Option 2: Port-forwarded localhost
    for i in {1..20}; do curl -i localhost:8080/hello \
    -H "host: traffic.split.example"; done

    Example output:

    HTTP/1.1 200 OK
    server: envoy
    date: Wed, 12 Mar 2025 20:59:35 GMT
    content-type: text/html; charset=utf-8
    content-length: 60
    x-envoy-upstream-service-time: 110
    
    Hello version: v3, instance: helloworld-v3-55bfdf76cf-nv545
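To check the observed distribution rather than eyeballing individual responses, you can tally the version line in each response. The pipeline below runs on a few canned sample lines so it is safe to try anywhere; pipe the output of the curl loop above through the same grep | sort | uniq -c stage to count real responses:

```shell
# Tally 'version: vN' occurrences. With live traffic, replace the printf
# sample with the output of the curl loop from the previous step.
printf '%s\n' \
  'Hello version: v3, instance: helloworld-v3-a' \
  'Hello version: v1, instance: helloworld-v1-b' \
  'Hello version: v3, instance: helloworld-v3-c' \
  'Hello version: v2, instance: helloworld-v2-d' |
  grep -o 'version: v[0-9]' | sort | uniq -c | sort -rn
```

With the 10/10/80 weights configured above, roughly 80% of a larger sample should come from v3.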

Example 2: A/B testing with LLM models

This example demonstrates traffic splitting for LLM workloads, distributing requests across multiple models or providers for cost optimization or A/B testing.

Set up weighted routing for LLM models

  1. Create separate AgentgatewayBackend resources for each model you want to include in the traffic split.

    This example creates two backends: one for the cheaper gpt-4o-mini model and one for the more capable gpt-4o model.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: openai-mini-backend
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: gpt-4o-mini
      policies:
        auth:
          secretRef:
            name: openai-secret
    ---
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: openai-premium-backend
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: gpt-4o
      policies:
        auth:
          secretRef:
            name: openai-secret
    EOF
  2. Create an HTTPRoute resource with weighted backendRefs to distribute traffic between the two backends.

    This example routes 80% of traffic to the cheaper gpt-4o-mini model and 20% to the more capable gpt-4o model, allowing you to optimize costs while testing the premium model’s performance.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: test
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /test
          backendRefs:
            - name: openai-mini-backend
              namespace: agentgateway-system
              group: agentgateway.dev
              kind: AgentgatewayBackend
              weight: 80
            - name: openai-premium-backend
              namespace: agentgateway-system
              group: agentgateway.dev
              kind: AgentgatewayBackend
              weight: 20
    EOF
    Setting descriptions:

      • spec.rules[].backendRefs[].weight: The relative weight for traffic distribution. In this example, weights of 80 and 20 result in an 80/20 traffic split. The default weight is 1 if not specified.
  3. Send multiple requests to observe the traffic distribution. Do not specify a model in the request body; the HTTPRoute distributes traffic according to the backend weights (80% to gpt-4o-mini, 20% to gpt-4o).

    # Option 1: Load balancer address
    for i in {1..10}; do
      curl -s "$INGRESS_GW_ADDRESS/test" \
        -H "Content-Type: application/json" \
        -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
        jq -r '.model'
    done

    # Option 2: Port-forwarded localhost
    for i in {1..10}; do
      curl -s "localhost:8080/test" \
        -H "Content-Type: application/json" \
        -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
        jq -r '.model'
    done

    Example output showing ~80% gpt-4o-mini and ~20% gpt-4o responses:

    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-2024-08-06
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-2024-08-06
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
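To quantify the split, tally the model names that the loop prints. The sketch below counts a canned sample so it runs anywhere; pipe the real loop's output through the same sort | uniq -c stage instead:

```shell
# Count how often each model name appears. Replace the printf sample with
# the output of the request loop above to measure the live distribution.
printf '%s\n' gpt-4o-mini gpt-4o-mini gpt-4o gpt-4o-mini gpt-4o-mini |
  sort | uniq -c | sort -rn
```

Over a small sample the observed ratio can drift noticeably from 80/20; the split converges toward the configured weights as the request count grows.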

Cleanup

You can remove the resources that you created in this guide.
  1. Remove the backends and routes.

    kubectl delete httproute traffic-split -n helloworld
    kubectl delete httproute test -n agentgateway-system
    kubectl delete AgentgatewayBackend openai-mini-backend openai-premium-backend -n agentgateway-system
  2. Remove the Helloworld apps.

    kubectl delete -n helloworld -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml