Advanced

Advanced configuration and customization options for Knull.

Header Mutation

Knull can modify HTTP headers before sending requests to backends:

models:
  - id: gpt-4o-mini
    provider: azure
    endpoint: ${AZURE_OPENAI_ENDPOINT_HOSTNAME}
    apiKey: ${AZURE_OPENAI_API_KEY}
    headerMutation:
      set:
        - name: x-custom-header
          value: "custom-value"
        - name: x-azure-api-version
          value: "2024-05-01-preview"
      remove:
        - authorization
        - x-api-key

Header Mutation Fields

Field    Description
set      Headers to add or overwrite
remove   Headers to remove from the request
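The set/remove semantics above can be sketched in Python (an illustrative approximation, not Knull's implementation; the function name and the remove-before-set ordering are assumptions):

```python
def mutate_headers(headers, to_set, to_remove):
    """Apply headerMutation semantics: 'remove' deletes headers,
    'set' adds or overwrites. Header names compare case-insensitively."""
    out = {name.lower(): value for name, value in headers.items()}
    for name in to_remove:
        out.pop(name.lower(), None)
    for entry in to_set:
        out[entry["name"].lower()] = entry["value"]
    return out

mutated = mutate_headers(
    {"Authorization": "Bearer abc", "Accept": "application/json"},
    to_set=[{"name": "x-custom-header", "value": "custom-value"}],
    to_remove=["authorization"],
)
# mutated: {'accept': 'application/json', 'x-custom-header': 'custom-value'}
```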

Body Mutation

Modify request body fields before sending to backends:

models:
  - id: gpt-4o-mini
    provider: azure
    endpoint: ${AZURE_OPENAI_ENDPOINT_HOSTNAME}
    bodyMutation:
      set:
        - path: max_tokens
          value: "1000"
        - path: temperature
          value: "0.7"
      remove:
        - user

Body Mutation Fields

Field    Description
set      Fields to add or overwrite
remove   Top-level fields to remove

Body Field Value Types

Values are parsed as JSON:

bodyMutation:
  set:
    - path: max_tokens
      value: "1000"  # Number
 
    - path: temperature
      value: "0.7"   # Number
 
    - path: metadata
      value: '{"source": "knull"}'  # Object
 
    - path: stop
      value: '["\\n", "##"]'  # Array
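Because each value string is parsed as JSON, the quoted values above become native types in the request body. A quick illustration, with json.loads standing in for Knull's parser:

```python
import json

# Each bodyMutation value string is parsed as JSON, so the quoted
# strings in the config become native values in the request body.
print(json.loads("1000"))                  # 1000 (number)
print(json.loads("0.7"))                   # 0.7 (number)
print(json.loads('{"source": "knull"}'))   # {'source': 'knull'} (object)
print(json.loads('["\\n", "##"]'))         # ['\n', '##'] (array)
```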

Custom API Schemas

Override the default API schema for a model:

models:
  - id: custom-model
    provider: openai_compatible
    endpoint: http://custom-api:8080
    schema:
      name: OpenAI
      prefix: /api/v2

Schema Options

Schema         Description
OpenAI         Standard OpenAI API (/v1/chat/completions)
AzureOpenAI    Azure OpenAI API
AWSBedrock     AWS Bedrock API
AWSAnthropic   AWS Bedrock Anthropic API
Anthropic      Anthropic API (/v1/messages)
GCPVertexAI    Google VertexAI API
GCPAnthropic   Google VertexAI Anthropic API
Cohere         Cohere API

Custom Prefix

models:
  - id: custom-model
    provider: openai_compatible
    endpoint: http://custom-api:8080
    schema:
      name: OpenAI
      prefix: /custom/api/v1

Requests to /v1/chat/completions will be routed to /custom/api/v1/chat/completions.
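The rewrite amounts to swapping the default /v1 prefix for the configured one; a minimal sketch (the helper function is illustrative, not part of Knull):

```python
def rewrite_path(path, prefix, default_prefix="/v1"):
    # Swap the default API prefix for the schema's configured prefix,
    # leaving non-matching paths untouched.
    if path.startswith(default_prefix):
        return prefix + path[len(default_prefix):]
    return path

print(rewrite_path("/v1/chat/completions", "/custom/api/v1"))
# /custom/api/v1/chat/completions
```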

Backend Authentication

Configure how Knull authenticates to backend services:

Azure API Key

models:
  - id: azure-model
    provider: azure
    endpoint: ${AZURE_OPENAI_ENDPOINT}
    auth:
      azureAPIKey:
        key: ${AZURE_API_KEY}

AWS Credentials

models:
  - id: bedrock-model
    provider: aws_bedrock
    endpoint: ${BEDROCK_ENDPOINT}
    awsRegion: us-east-1
    auth:
      aws:
        credentialFileLiteral: |
          [default]
          aws_access_key_id = ${AWS_ACCESS_KEY}
          aws_secret_access_key = ${AWS_SECRET_KEY}
        region: us-east-1

GCP Authentication

models:
  - id: vertex-model
    provider: gcp_vertex
    endpoint: ${VERTEX_ENDPOINT}
    gcpProject: my-project
    gcpLocation: us-central1
    auth:
      gcp:
        accessToken: ${GCP_TOKEN}
        region: us-central1
        projectName: my-project

Advanced Routing

Model Aliases

Use aliases to simplify model names:

models:
  - id: gpt-4o-mini
    provider: azure
    endpoint: ${AZURE_ENDPOINT}
    alias: gpt

Now clients can use either gpt-4o-mini or gpt:

curl http://localhost:1975/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt", "messages": [...]}'

Header-based Routing

Route based on custom headers:

# Requires custom Envoy configuration
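For reference, header-based routing in Envoy is expressed as a route match on request headers. A hedged sketch of what such a route might look like (cluster and header names are illustrative, not Knull defaults):

```yaml
# Illustrative Envoy route fragment: requests carrying
# x-model-tier: premium go to a dedicated cluster.
routes:
  - match:
      prefix: /v1/chat/completions
      headers:
        - name: x-model-tier
          string_match:
            exact: premium
    route:
      cluster: premium_backend
  - match:
      prefix: /v1/chat/completions
    route:
      cluster: default_backend
```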

Token Counting

Knull counts tokens for usage tracking:

Token Types

Type               Description
InputToken         Prompt tokens
OutputToken        Response tokens
CachedInputToken   Cached prompt tokens
TotalToken         Input + Output

CEL Cost Calculation

Knull allows you to calculate request costs using complex logic via Common Expression Language (CEL). This is useful when backends have tiered pricing or when you want to apply custom multipliers.

Configuration

# Inside your gateway or model configuration
llmRequestCosts:
  - metadataKey: "custom_cost"
    type: CEL
    cel: "model == 'gpt-4' ? input_tokens * 2 + output_tokens * 4 : total_tokens"

Available Variables

Variable              Type     Description
model                 string   The model name in the request.
backend               string   The target backend name (name.namespace).
input_tokens          uint     Count of input (prompt) tokens.
output_tokens         uint     Count of output (completion) tokens.
cached_input_tokens   uint     Count of cached input tokens.
total_tokens          uint     Total tokens processed.

Example Expressions

  • Tiered Pricing: backend == 'premium' ? total_tokens * 10 : total_tokens
  • Cache Discount: (input_tokens - cached_input_tokens) + (cached_input_tokens * 0.1)
  • Safety Margin: total_tokens * 1.1
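The arithmetic behind these expressions is straightforward; a Python rendering of two of them (the function names are illustrative, not part of Knull):

```python
def tiered_cost(backend, total_tokens):
    # backend == 'premium' ? total_tokens * 10 : total_tokens
    return total_tokens * 10 if backend == "premium" else total_tokens

def cache_discounted_cost(input_tokens, cached_input_tokens):
    # (input_tokens - cached_input_tokens) + (cached_input_tokens * 0.1)
    return (input_tokens - cached_input_tokens) + cached_input_tokens * 0.1

print(tiered_cost("premium", 500))       # 5000
print(cache_discounted_cost(1000, 400))  # 640.0
```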

Metrics and Observability

Prometheus Metrics

Knull exposes Prometheus metrics at /metrics:

curl http://localhost:1064/metrics

Available metrics:

  • knull_requests_total - Total requests
  • knull_tokens_total - Total tokens processed
  • knull_latency_seconds - Request latency
  • knull_errors_total - Total errors

OpenTelemetry

Enable OpenTelemetry tracing:

OTEL_AIGW_METRICS_REQUEST_HEADER_ATTRIBUTES="user_id,api_key" \
OTEL_AIGW_SPAN_REQUEST_HEADER_ATTRIBUTES="user_id,api_key" \
./bin/knull run config.yaml --debug

Performance Tuning

Worker Threads

Knull relies on the Go runtime scheduler for concurrency. For high-throughput deployments, tune the thread count with GOMAXPROCS:

GOMAXPROCS=4 ./bin/knull run config.yaml

Connection Pooling

Backend connection pooling is handled by Envoy. Configure it in your Envoy resources.

Memory Usage

The default memory allocation is sufficient for moderate loads. For high-traffic deployments, raise the container resource limits:

resources:
  limits:
    memory: 2Gi
    cpu: 2000m

Debugging

Enable Debug Mode

./bin/knull run config.yaml --debug

View Logs

Knull follows XDG standards for log placement.

# Follow logs in the state directory
tail -f ~/.local/state/knull/runs/*/aigw.log

# Or, when running under systemd, use journald
journalctl -u knull -f

Check Health

# Basic health
curl http://localhost:1064/health
 
# Readiness
curl http://localhost:1064/ready

View Configuration

# View current active configuration
curl http://localhost:1064/config

Test Configuration

# Validate YAML
./bin/knull validate config.yaml
 
# Dry run
./bin/knull run config.yaml --dry-run

Troubleshooting

Common Issues

Connection Refused

Error: connection refused to backend

Check:

  • Backend endpoint is correct
  • Backend is running
  • Network connectivity

API Key Invalid

Error: invalid API key

Check:

  • API key format is correct (sk- prefix)
  • API key exists in configuration
  • Policy allows the model

Budget Exceeded

Error: budget exceeded

Check:

  • Policy has remaining budget
  • Usage is correctly tracked
  • Consider increasing budget limit