Anthropic

Configure Anthropic (Claude) as an LLM provider in agentgateway.

Before you begin

Install and set up an agentgateway proxy.

Set up access to Anthropic

  1. Get an API key to access the Anthropic API.

  2. Save the API key in an environment variable.

    export ANTHROPIC_API_KEY=<insert your API key>
  3. Create a Kubernetes secret to store your Anthropic API key.

    kubectl apply -f- <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: anthropic-secret
      namespace: agentgateway-system
    type: Opaque
    stringData:
      Authorization: $ANTHROPIC_API_KEY
    EOF
  4. Create an AgentgatewayBackend resource that configures Anthropic as the LLM provider and references the secret that holds your Anthropic API key.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayBackend
    metadata:
      name: anthropic
      namespace: agentgateway-system
    spec:
      ai:
        provider:
          anthropic:
            model: "claude-opus-4-6"
      policies:
        auth:
          secretRef:
            name: anthropic-secret
    EOF

    Review the following table to understand this configuration. For more information, see the API reference.

    | Setting | Description |
    |---|---|
    | ai.provider.anthropic | Define the LLM provider that you want to use. The example uses Anthropic. |
    | anthropic.model | The model to use to generate responses. In this example, you use the claude-opus-4-6 model. |
    | policies.auth | Provide the credentials to use to access the Anthropic API. The example refers to the secret that you previously created. The token is automatically sent in the x-api-key header. |
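  5. Create an HTTPRoute resource that routes incoming traffic to the AgentgatewayBackend. The following three manifests are alternatives: a route without a path match, a route on the /v1/chat/completions path, and a route on the custom /anthropic path. All three use the same resource name, so each one that you apply replaces the previous one. Apply only the manifest that fits your setup. Note that for the /anthropic route, agentgateway automatically rewrites the endpoint to the Anthropic /v1/messages endpoint.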

    Route without a path match:

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: anthropic
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - backendRefs:
        - name: anthropic
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF

    Route on the /v1/chat/completions path:

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: anthropic
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /v1/chat/completions
        backendRefs:
        - name: anthropic
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF

    Route on the custom /anthropic path:

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: anthropic
      namespace: agentgateway-system
    spec:
      parentRefs:
        - name: agentgateway-proxy
          namespace: agentgateway-system
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /anthropic
        backendRefs:
        - name: anthropic
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
    EOF
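  6. Send a request to the LLM provider API along the route that you previously created. The following commands cover the /v1/messages, /v1/chat/completions, and /anthropic paths in that order. Use the command that matches your route and your environment. Verify that the request succeeds and that you get back a response from the API.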

    Cloud Provider LoadBalancer:

    curl "$INGRESS_GW_ADDRESS/v1/messages" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Localhost:

    curl "localhost:8080/v1/messages" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Cloud Provider LoadBalancer:

    curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Localhost:

    curl "localhost:8080/v1/chat/completions" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Cloud Provider LoadBalancer:

    curl "$INGRESS_GW_ADDRESS/anthropic" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Localhost:

    curl "localhost:8080/anthropic" -H content-type:application/json  -d '{
       "model": "",
       "messages": [
         {
           "role": "user",
           "content": "Explain how AI works in simple terms."
         }
       ]
     }' | jq

    Example output:

    {
      "model": "claude-opus-4-6",
      "usage": {
        "prompt_tokens": 16,
        "completion_tokens": 318,
        "total_tokens": 334
      },
      "choices": [
        {
          "message": {
            "content": "Artificial Intelligence (AI) is a field of computer science that focuses on creating machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. Here's a simple explanation of how AI works:\n\n1. Data input: AI systems require data to learn and make decisions. This data can be in the form of images, text, numbers, or any other format.\n\n2. Training: The AI system is trained using this data. During training, the system learns to recognize patterns, relationships, and make predictions based on the input data.\n\n3. Algorithms: AI uses various algorithms, which are sets of instructions or rules, to process and analyze the data. These algorithms can be simple or complex, depending on the task at hand.\n\n4. Machine Learning: A subset of AI, machine learning, enables the system to automatically learn and improve from experience without being explicitly programmed. As the AI system is exposed to more data, it can refine its algorithms and become more accurate over time.\n\n5. Output: Once the AI system has processed the data, it generates an output. This output can be a prediction, a decision, or an action, depending on the purpose of the AI system.\n\nAI can be categorized into narrow (weak) AI and general (strong) AI. Narrow AI is designed to perform a specific task, such as playing chess or recognizing speech, while general AI aims to have human-like intelligence that can perform any intellectual task.",
            "role": "assistant"
          },
          "index": 0,
          "finish_reason": "stop"
        }
      ],
      "id": "msg_01PbaJfDHnjEBG4BueJNR2ff",
      "created": 1764627002,
      "object": "chat.completion"
    }
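
The request examples in step 6 differ only in the address that they target. As a minimal sketch, the following shell helper picks the address once so that a single curl command can serve both environments. The gateway_base function name is hypothetical and not part of agentgateway. The sketch assumes that INGRESS_GW_ADDRESS is set only when you use a cloud provider load balancer, and that the proxy is otherwise reachable on localhost:8080 as in the examples above.

```shell
# Hypothetical helper (not part of agentgateway): print the base address
# for the curl examples. Uses INGRESS_GW_ADDRESS when it is set and
# non-empty, and falls back to localhost:8080 otherwise.
gateway_base() {
  printf '%s\n' "${INGRESS_GW_ADDRESS:-localhost:8080}"
}
```

With this helper, a request from step 6 could be sent as curl "$(gateway_base)/v1/messages" with the same header and body as before.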

Extended thinking and reasoning

Extended thinking and reasoning lets Claude reason through complex problems before generating a response. You can opt in to extended thinking and reasoning by adding specific parameters to your request.

Note: Extended thinking and reasoning requires a Claude model that supports it, such as claude-opus-4-6.

For requests to the /v1/messages endpoint, opt in to extended thinking by including the thinking.type field in your request. You can also set the output_config.effort field to control how much reasoning the model applies.

The following values are supported:

thinking field

| type value | Additional fields | Behavior |
|---|---|---|
| adaptive | output_config.effort | The model decides whether to think and how much. Requires output_config.effort to be set. |
| enabled | budget_tokens: <number> | Explicitly enables thinking with a fixed token budget. Works standalone without output_config. |
| disabled | none | Explicitly disables thinking. |

output_config field

output_config has two independent sub-fields. You can use either or both.

| Sub-field | Description |
|---|---|
| effort | Controls the reasoning effort level. Accepted values: low, medium, high, max. |
| format | Constrains the response to a JSON schema. Set type to json_schema and provide a schema object. For more information, see Structured outputs. |
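
The combinations in the two tables above can be checked locally before you send a request. The following is a minimal sketch in shell. The check_thinking function and its argument order are hypothetical and are not part of agentgateway or the Anthropic API. The sketch only encodes the rules stated in the tables.

```shell
# Hypothetical pre-flight check that encodes the thinking and output_config
# tables above. Prints "ok" for a valid combination, or a short reason.
# Usage: check_thinking TYPE [EFFORT] [BUDGET_TOKENS]
check_thinking() {
  thinking_type=$1
  effort=${2:-}
  budget=${3:-}

  # effort is optional, but when given it must be an accepted value.
  case "$effort" in
    ""|low|medium|high|max) ;;
    *) echo "invalid effort: $effort"; return 1 ;;
  esac

  case "$thinking_type" in
    adaptive)
      # adaptive requires output_config.effort to be set.
      if [ -z "$effort" ]; then
        echo "adaptive requires output_config.effort"
        return 1
      fi
      ;;
    enabled)
      # enabled requires a numeric budget_tokens value.
      case "$budget" in
        ""|*[!0-9]*) echo "enabled requires a numeric budget_tokens"; return 1 ;;
      esac
      ;;
    disabled) ;;
    *) echo "unknown thinking.type: $thinking_type"; return 1 ;;
  esac
  echo "ok"
}
```

For example, check_thinking adaptive high prints ok, while check_thinking adaptive reports that output_config.effort is missing.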

The following example request uses adaptive extended thinking. Note that this setting requires the output_config.effort field to be set too.

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/messages" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 1024,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "high"
  },
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/messages" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 1024,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "high"
  },
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Example output:

{
  "id": "msg_01HVEzWf4NJrsKyVeEUDnHNW",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-6",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me think through the trade-offs between consistency and availability..."
    },
    {
      "type": "text",
      "text": "# Consistency vs. Availability in Distributed Systems\n\n..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 21,
    "output_tokens": 1024
  }
}

For requests to the OpenAI-compatible /v1/chat/completions endpoint, use the reasoning_effort field in your request to enable extended thinking. The value that you set is automatically mapped to a specific thinking budget as shown in the following table.

| reasoning_effort value | Thinking budget |
|---|---|
| minimal or low | 1,024 tokens |
| medium | 2,048 tokens |
| high or xhigh | 4,096 tokens |

Note that the max_tokens value must be greater than the tokens in the thinking budget for the request to succeed.
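
The mapping and the max_tokens rule above can be sketched as a small shell check. The thinking_budget and check_max_tokens function names are hypothetical. The budget values are copied from the table, and the actual mapping is applied by agentgateway, not by this script.

```shell
# Hypothetical helpers that mirror the reasoning_effort table above.

# Print the thinking budget (in tokens) for a reasoning_effort value.
thinking_budget() {
  case "$1" in
    minimal|low) echo 1024 ;;
    medium)      echo 2048 ;;
    high|xhigh)  echo 4096 ;;
    *) echo "unsupported reasoning_effort: $1" >&2; return 1 ;;
  esac
}

# Print "ok" when max_tokens is greater than the mapped budget,
# or a short explanation when it is not.
# Usage: check_max_tokens REASONING_EFFORT MAX_TOKENS
check_max_tokens() {
  budget=$(thinking_budget "$1") || return 1
  if [ "$2" -gt "$budget" ]; then
    echo "ok"
  else
    echo "max_tokens must be greater than $budget"
  fi
}
```

For a request with reasoning_effort set to high, check_max_tokens high 6000 prints ok because 6000 is greater than the 4,096-token budget.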

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 6000,
  "reasoning_effort": "high",
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 6000,
  "reasoning_effort": "high",
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Example output:

{
  "model": "claude-opus-4-6",
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 2549,
    "total_tokens": 2599,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0
  },
  "choices": [
    {
      "message": {
        "content": "# Consistency vs. Availability in Distributed ..."
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "id": "msg_01CVnXAQYeWkUjeaDceBRk3e",
  "created": 1773251049,
  "object": "chat.completion"
}

Structured outputs

Structured outputs constrain the model to respond with a specific JSON schema. You must provide the schema definition in your request.

For requests to the /v1/messages endpoint, provide the JSON schema definition in the output_config.format field.

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/messages" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 256,
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/messages" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 256,
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Example output:

{
  "id": "msg_01PsCxtLN1vftAKZgvWXhCan",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-6",
  "content": [
    {
      "type": "text",
      "text": "{\"answer\":\"Yes, the sky is blue during clear daytime conditions.\",\"confidence\":0.98}"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 29,
    "output_tokens": 28
  }
}

For requests to the OpenAI-compatible /v1/chat/completions endpoint, provide the schema definition in the response_format field.

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Example output:

{
  "model": "claude-opus-4-6",
  "usage": {
    "prompt_tokens": 192,
    "completion_tokens": 68,
    "total_tokens": 260,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0
  },
  "choices": [
    {
      "message": {
        "content": "{\"answer\":\"Yes, the sky is blue...",
        "role": "assistant"
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "id": "msg_01BLohqXbvfZHQnnXxmviCcg",
  "created": 1773251560,
  "object": "chat.completion"
}
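
In both example outputs in this section, the structured answer arrives as a JSON string inside the response, so it needs a second parse before you can read individual fields. The following is a minimal sketch that uses jq, which the examples already rely on. The function names are hypothetical, and the field paths are taken from the example outputs above and not from a formal schema.

```shell
# Hypothetical helpers (not part of agentgateway). Both read a response on
# stdin and print the structured answer as compact JSON. Assumes jq is installed.

# For responses in the /v1/messages shape: parse each text content block.
structured_from_messages() {
  jq -c '.content[] | select(.type == "text") | .text | fromjson'
}

# For responses in the /v1/chat/completions shape: parse the first choice.
structured_from_chat() {
  jq -c '.choices[0].message.content | fromjson'
}
```

To use a helper, pipe the curl output into it in place of jq. Note that fromjson fails if the string is not complete JSON, for example when the response was cut off by a low max_tokens value.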

Connect to Claude Code

To route Claude Code CLI traffic through agentgateway, see the Claude Code integration guide. For a full tutorial with prompt guards and observability, see the Claude Code CLI proxy tutorial.
