Virtual key management
Issue API keys (also known as virtual keys) to users or applications and control their token usage.
About
Virtual key management allows you to issue API keys to users or applications, each with independent tracking and cost controls. Agentgateway achieves this by composing existing capabilities:
- API key authentication: Identify incoming requests by API key
- Token-based rate limiting: Enforce token budgets
- Observability metrics: Track per-key spending and usage
How virtual keys work
flowchart TD
A[Request arrives with API key] --> B[Validate API key]
B --> C{Key valid?}
C -->|Yes| D[Check token budget]
D --> E{Budget available?}
E -->|Yes| F[Forward to LLM]
F --> G[Track token usage]
G --> H[Deduct from budget]
E -->|No| I[Reject with 429]
C -->|No| J[Reject with 401]
subgraph refill["Budget refills periodically"]
H
end
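The flow above can be modeled in a few lines of Python. This is an illustrative sketch only, not agentgateway's implementation: the `VirtualKeyGateway` class, its `handle` method, and the in-memory budget are hypothetical names chosen for this example, and the LLM call is replaced by a caller-supplied token count.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualKeyGateway:
    """Toy model of the request flow; all names here are hypothetical."""

    valid_keys: set[str]
    budget: int  # tokens remaining in the shared budget
    usage_by_key: dict[str, int] = field(default_factory=dict)

    def handle(self, api_key: str, tokens_used: int) -> int:
        """Return the HTTP status the flowchart would produce.

        tokens_used stands in for the usage reported by the LLM response.
        """
        if api_key not in self.valid_keys:
            return 401  # key invalid: reject
        if self.budget <= 0:
            return 429  # budget exhausted: reject
        # "Forward to LLM" would happen here; usage is known only afterwards.
        self.usage_by_key[api_key] = self.usage_by_key.get(api_key, 0) + tokens_used
        self.budget -= tokens_used  # deduct from budget
        return 200


gateway = VirtualKeyGateway(valid_keys={"sk-alice-abc123def456"}, budget=25)
print(gateway.handle("invalid-key", 19))            # 401
print(gateway.handle("sk-alice-abc123def456", 19))  # 200
print(gateway.handle("sk-alice-abc123def456", 19))  # 200, budget was still 6
print(gateway.handle("sk-alice-abc123def456", 19))  # 429, budget is now negative
```

In this sketch the budget is checked before the request and deducted after it, so a single request can overshoot the remaining budget; the next request is then rejected until the budget refills.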
Before you begin
Install the agentgateway binary.
Set up virtual keys
Step 1: Configure API key authentication
Create a configuration with API key authentication. This example creates two virtual keys for Alice and Bob.
cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
llm:
  policies:
    apiKey:
      mode: strict
      keys:
      - key: sk-alice-abc123def456
        metadata:
          user: alice
      - key: sk-bob-xyz789uvw012
        metadata:
          user: bob
  models:
  - name: "*"
    provider: openAI
    params:
      apiKey: "$OPENAI_API_KEY"
EOF
| Setting | Description |
|---|---|
| apiKey.mode | Set to strict to require a valid API key for all requests. Use optional to allow unauthenticated requests. |
| apiKey.keys | List of API keys. Each key has a key value and optional metadata. |
| key | The API key value that users include in the Authorization: Bearer <key> header. |
| metadata | Optional metadata associated with the key, such as a user identifier or tier. |
Step 2: Start agentgateway
agentgateway -f config.yaml
Step 3: Test the virtual keys
Send a request with Alice’s API key. Verify that the request succeeds.
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-alice-abc123def456" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq .
Example successful response:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    }
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
Send a request without a valid API key. Verify that the request is rejected with a 401 status.
curl -s -o /dev/null -w "%{http_code}" http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer invalid-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Expected response:
401
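Applications can send the same request programmatically. The following is a minimal sketch using only the Python standard library, assuming the gateway listens on `localhost:4000` as in the curl commands above; the `build_chat_request` helper is a hypothetical name for this example, and the script builds the request without sending it.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:4000/v1/chat/completions"


def build_chat_request(virtual_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a virtual key."""
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The virtual key goes in the Authorization header as a Bearer token.
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("sk-alice-abc123def456", "Hello!")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

To send the request, pass it to `urllib.request.urlopen(request)`. A rejected key surfaces as a `urllib.error.HTTPError` whose `code` attribute holds the status, such as 401.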
Add a global token budget
To add a token budget that limits total token usage across all keys, use the routing-based configuration format with localRateLimit. Local rate limits apply to the gateway as a whole, not per key.
This example uses the binds/listeners/routes configuration format because localRateLimit is an HTTP-level policy. For more information, see the Routing-based configuration guide.
cat <<'EOF' > config.yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
binds:
- port: 4000
  listeners:
  - routes:
    - backends:
      - ai:
          name: openai
          provider:
            openAI:
              model: gpt-3.5-turbo
      policies:
        apiKey:
          mode: strict
          keys:
          - key: sk-alice-abc123def456
            metadata:
              user: alice
          - key: sk-bob-xyz789uvw012
            metadata:
              user: bob
        backendAuth:
          key: "$OPENAI_API_KEY"
        localRateLimit:
        - maxTokens: 100000
          tokensPerFill: 100000
          fillInterval: 86400s
          type: tokens
EOF
| Setting | Description |
|---|---|
| localRateLimit | Token-based rate limiting applied to all requests through this route. |
| maxTokens | The maximum number of tokens available in the budget. |
| tokensPerFill | The number of tokens added during each refill. |
| fillInterval | The interval between refills. Use 86400s for a daily budget. |
| type | Set to tokens for token-based limits. Use requests for request-based limits. |
For more details on rate limiting, see Control spend.
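To see how maxTokens, tokensPerFill, and fillInterval interact, the following Python sketch models a token bucket. It is a simplified illustration under the assumptions that the bucket starts full, refills by tokensPerFill once per fillInterval, and is capped at maxTokens; the `TokenBudget` class and its methods are hypothetical and do not reflect agentgateway's internal implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Toy token bucket mirroring the localRateLimit settings (hypothetical)."""

    max_tokens: int        # maxTokens
    tokens_per_fill: int   # tokensPerFill
    fill_interval: float   # fillInterval, in seconds
    available: int = field(init=False)
    last_fill: float = 0.0

    def __post_init__(self) -> None:
        self.available = self.max_tokens  # assumption: the bucket starts full

    def refill(self, now: float) -> None:
        """Add tokens_per_fill per whole elapsed interval, capped at max_tokens."""
        intervals = int((now - self.last_fill) // self.fill_interval)
        if intervals > 0:
            self.available = min(
                self.max_tokens, self.available + intervals * self.tokens_per_fill
            )
            self.last_fill += intervals * self.fill_interval

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        self.refill(now)
        return self.available > 0

    def deduct(self, tokens_used: int) -> None:
        """Subtract the tokens reported by the LLM response."""
        self.available -= tokens_used


# Daily budget from the example configuration: 100,000 tokens every 86,400 s.
budget = TokenBudget(max_tokens=100_000, tokens_per_fill=100_000, fill_interval=86_400)
budget.deduct(100_000)           # the day's budget is used up
print(budget.allow(now=3_600))   # False: one hour in, no refill yet
print(budget.allow(now=86_400))  # True: a full interval has elapsed
```

Setting tokensPerFill lower than maxTokens in this model gives a budget that recovers gradually instead of resetting all at once.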
Monitor per-key spending
Track token usage and spending for each virtual key using Prometheus metrics exposed by agentgateway.
Access the agentgateway metrics endpoint.
curl http://localhost:15000/metrics
Query token usage metrics.
# Total tokens consumed over the last 24 hours
sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h]))
+
sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h]))
Calculate costs by multiplying token counts by your provider’s pricing. For example, with OpenAI GPT-3.5:
# Estimated cost over the last 24 hours (assuming $0.50 per 1M input tokens, $1.50 per 1M output tokens)
(sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[24h])) / 1000000) * 0.50
+
(sum(increase(agentgateway_gen_ai_client_token_usage_sum{gen_ai_token_type="output"}[24h])) / 1000000) * 1.50
For more information on cost tracking, see the cost tracking guide.
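The cost arithmetic can also be checked outside Prometheus. The following Python sketch applies the same example prices to raw token counts; the `estimate_cost` function and the constant names are hypothetical, and the prices are the illustrative figures from the query above, not current provider pricing.

```python
INPUT_PRICE_PER_MILLION = 0.50   # USD per 1M input tokens (example price)
OUTPUT_PRICE_PER_MILLION = 1.50  # USD per 1M output tokens (example price)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )


# 2M input tokens and 1M output tokens: 2 * 0.50 + 1 * 1.50
print(f"${estimate_cost(2_000_000, 1_000_000):.2f}")  # $2.50
```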
What’s next
- Manage API keys for detailed authentication configuration
- Control spend for token-based rate limiting
- Set up observability to view token usage metrics and logs