Traffic splitting

Code examples on this page have been automatically tested and verified.

Set up weight-based routing between multiple apps for A/B testing, traffic splitting, and canary deployments.

About A/B testing and traffic splitting

A/B testing, traffic splitting, and canary deployments are techniques for gradually introducing changes by distributing traffic across multiple versions of an app or service based on weight percentages.

Common use cases:

  • A/B testing: Compare two versions of an app by routing a percentage of traffic to each version to measure performance, user engagement, or business metrics.
  • Traffic splitting: Distribute load across multiple backends, such as different LLM models or providers, to balance cost, performance, or capacity.
  • Canary deployments: Gradually roll out a new version of your app by routing a small percentage of traffic to the new version, then increasing the percentage as confidence grows.

These patterns use weighted backendRefs in HTTPRoute (a standard Gateway API feature) to control the percentage of requests sent to each backend. Unlike failover, which uses priority groups to switch between backends when one fails, traffic splitting distributes traffic based on static weight ratios.
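Weights are relative values rather than strict percentages: each backend receives its weight divided by the sum of all weights. A minimal sketch of that math for the 10/10/80 split used later in this guide:

```shell
# Each backend's share of traffic is weight / sum(weights).
awk 'BEGIN {
  total = 10 + 10 + 80
  printf "v1=%.0f%% v2=%.0f%% v3=%.0f%%\n", 10*100/total, 10*100/total, 80*100/total
}'
# v1=10% v2=10% v3=80%
```

Because only the ratios matter, weights of 1/1/8 would produce the same split. Also note that a backendRef with no weight defaults to 1, so mixing weighted and unweighted backends can skew the split more than expected.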

Before you begin

  1. Follow the Get started guide to install agentgateway.

  2. Follow the Sample app guide to create a gateway proxy with an HTTP listener and deploy the httpbin sample app.

  3. Get the external address of the gateway and save it in an environment variable.

    # Option 1: Use the external address of the load balancer service.
    export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system http -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
    echo $INGRESS_GW_ADDRESS

    # Option 2: If no load balancer address is available, port-forward the gateway proxy instead.
    kubectl port-forward deployment/agentgateway-proxy -n agentgateway-system 8080:80
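A quick sanity check (a sketch, not part of the original setup) helps catch the common case where the Service has no external address, such as on a local cluster without a load balancer:

```shell
# Warn if the address variable is empty, which happens when the Service
# has no LoadBalancer ingress; fall back to port-forwarding in that case.
if [ -z "$INGRESS_GW_ADDRESS" ]; then
  echo "No external address found; use the kubectl port-forward option instead."
fi
```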

Example 1: A/B testing with multiple app versions

This example demonstrates A/B testing and canary deployments by distributing traffic across 3 versions of the Helloworld sample app.

Deploy the Helloworld sample app

  1. Create the helloworld namespace.

    kubectl create namespace helloworld
  2. Deploy the Helloworld sample apps.

    kubectl -n helloworld apply -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml

    Example output:

    service/helloworld-v1 created
    service/helloworld-v2 created
    service/helloworld-v3 created
    deployment.apps/helloworld-v1 created
    deployment.apps/helloworld-v2 created
    deployment.apps/helloworld-v3 created
  3. Verify that the Helloworld pods are up and running.

    kubectl get pods -n helloworld

    Example output:

    NAME                             READY   STATUS    RESTARTS   AGE
    helloworld-v1-5c457458f-rfkc7    3/3     Running   0          30s
    helloworld-v2-6594c54f6b-8dvjp   3/3     Running   0          29s
    helloworld-v3-8576f76d87-czdll   3/3     Running   0          29s

Set up weighted routing

  1. Create an HTTPRoute resource for the traffic.split.example domain that routes 10% of the traffic to helloworld-v1, 10% to helloworld-v2, and 80% to helloworld-v3.

    This configuration demonstrates a canary deployment pattern where version 3 (the stable version) receives most traffic while versions 1 and 2 (canary versions) receive smaller amounts for testing.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: traffic-split
      namespace: helloworld
    spec:
      parentRefs:
      - name: http
        namespace: agentgateway-system
      hostnames:
      - traffic.split.example
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /
        backendRefs:
        - name: helloworld-v1
          port: 5000
          weight: 10
        - name: helloworld-v2
          port: 5000
          weight: 10
        - name: helloworld-v3
          port: 5000
          weight: 80
    EOF
    Setting descriptions:

      • spec.parentRefs.name: The name and namespace of the gateway resource that serves the route. In this example, you use the gateway that you created when you set up the Sample app.
      • spec.hostnames: The hostname for which you want to apply traffic splitting.
      • spec.rules.matches.path: The path prefix to match on. In this example, / is used.
      • spec.rules.backendRefs: A list of services you want to forward traffic to. Use the weight option to define the amount of traffic that you want to forward to each service.
  2. Verify that the HTTPRoute is applied successfully.

    kubectl get httproute/traffic-split -n helloworld -o yaml
  3. Send a few requests to the /hello path. Verify that you see responses from all 3 Helloworld apps, and that most responses are returned from helloworld-v3.

    # Option 1: Load balancer address
    for i in {1..20}; do curl -i http://$INGRESS_GW_ADDRESS:80/hello \
    -H "host: traffic.split.example"; done

    # Option 2: Port-forwarded localhost
    for i in {1..20}; do curl -i localhost:8080/hello \
    -H "host: traffic.split.example"; done

    Example output:

    HTTP/1.1 200 OK
    server: envoy
    date: Wed, 12 Mar 2025 20:59:35 GMT
    content-type: text/html; charset=utf-8
    content-length: 60
    x-envoy-upstream-service-time: 110
    
    Hello version: v3, instance: helloworld-v3-55bfdf76cf-nv545
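To check the observed distribution rather than eyeballing individual responses, you can tally the version line in each response. The pipeline below runs on a few canned sample lines so it is safe to try anywhere; pipe the output of the curl loop above through the same grep | sort | uniq -c stage to count real responses:

```shell
# Tally 'version: vN' occurrences. With live traffic, replace the printf
# sample with the output of the curl loop from the previous step.
printf '%s\n' \
  'Hello version: v3, instance: helloworld-v3-a' \
  'Hello version: v1, instance: helloworld-v1-b' \
  'Hello version: v3, instance: helloworld-v3-c' \
  'Hello version: v2, instance: helloworld-v2-d' |
  grep -o 'version: v[0-9]' | sort | uniq -c | sort -rn
```

With the 10/10/80 weights configured above, roughly 80% of a larger sample should come from v3.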

Example 2: A/B testing with LLM models

This example demonstrates traffic splitting for LLM workloads, distributing requests across multiple models or providers for cost optimization or A/B testing.

Set up weighted routing for LLM models

  1. Create separate AgentgatewayBackend resources for each model you want to include in the traffic split.

    This example creates two backends: one for the cheaper gpt-4o-mini model and one for the more capable gpt-4o model.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: openai-mini-backend
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: gpt-4o-mini
      policies:
        auth:
          secretRef:
            name: openai-secret
    ---
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: openai-premium-backend
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          openai:
            model: gpt-4o
      policies:
        auth:
          secretRef:
            name: openai-secret
    EOF
  2. Create an HTTPRoute resource with weighted backendRefs to distribute traffic between the two backends.

    This example routes 80% of traffic to the cheaper gpt-4o-mini model and 20% to the more capable gpt-4o model, allowing you to optimize costs while testing the premium model’s performance.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: test
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /test
          backendRefs:
            - name: openai-mini-backend
              namespace: agentgateway-system
              group: agentgateway.dev
              kind: AgentgatewayBackend
              weight: 80
            - name: openai-premium-backend
              namespace: agentgateway-system
              group: agentgateway.dev
              kind: AgentgatewayBackend
              weight: 20
    EOF
    Setting descriptions:

      • spec.rules[].backendRefs[].weight: The relative weight for traffic distribution. In this example, weights of 80 and 20 result in an 80/20 traffic split. The default weight is 1 if not specified.
  3. Send multiple requests to observe the traffic distribution. Do not specify a model in the request body; the HTTPRoute distributes traffic according to the backend weights (80% to gpt-4o-mini, 20% to gpt-4o).

    # Option 1: Load balancer address
    for i in {1..10}; do
      curl -s "$INGRESS_GW_ADDRESS/test" \
        -H "Content-Type: application/json" \
        -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
        jq -r '.model'
    done

    # Option 2: Port-forwarded localhost
    for i in {1..10}; do
      curl -s "localhost:8080/test" \
        -H "Content-Type: application/json" \
        -d '{"messages": [{"role": "user", "content": "What is 2+2?"}]}' | \
        jq -r '.model'
    done

    Example output showing ~80% gpt-4o-mini and ~20% gpt-4o responses:

    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-2024-08-06
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
    gpt-4o-2024-08-06
    gpt-4o-mini-2024-07-18
    gpt-4o-mini-2024-07-18
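To quantify the split, tally the model names that the loop prints. The sketch below counts a canned sample so it runs anywhere; pipe the real loop's output through the same sort | uniq -c stage instead:

```shell
# Count how often each model name appears. Replace the printf sample with
# the output of the request loop above to measure the live distribution.
printf '%s\n' gpt-4o-mini gpt-4o-mini gpt-4o gpt-4o-mini gpt-4o-mini |
  sort | uniq -c | sort -rn
```

Over a small sample the observed ratio can drift noticeably from 80/20; the split converges toward the configured weights as the request count grows.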

Cleanup

You can remove the resources that you created in this guide.
  1. Remove the backends and routes.

    kubectl delete httproute traffic-split -n helloworld
    kubectl delete httproute test -n agentgateway-system
    kubectl delete AgentgatewayBackend openai-mini-backend openai-premium-backend -n agentgateway-system
  2. Remove the Helloworld apps.

    kubectl delete -n helloworld -f https://raw.githubusercontent.com/solo-io/gloo-edge-use-cases/main/docs/sample-apps/helloworld.yaml