Fallbacks

If a call fails after `num_retries`, LiteLLM falls back to another model group.

Quick Start: load balancing
Quick Start: client-side fallbacks

Fallbacks are typically done from one model_name to another model_name.

Quick Start

1. Set up fallbacks

Key change:

fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}]

SDK
PROXY

from litellm import Router 
router = Router(
    model_list=[
    {
      "model_name": "gpt-3.5-turbo",
      "litellm_params": {
        "model": "azure/<your-deployment-name>",
        "api_base": "<your-azure-endpoint>",
        "api_key": "<your-azure-api-key>",
        "rpm": 6
      }
    },
    {
      "model_name": "gpt-4",
      "litellm_params": {
        "model": "azure/gpt-4-ca",
        "api_base": "https://my-endpoint-canada-berri992.openai.azure.com/",
        "api_key": "<your-azure-api-key>",
        "rpm": 6
      }
    }
    ],
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4"]}] # 👈 KEY CHANGE
)

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/<your-deployment-name>
      api_base: <your-azure-endpoint>
      api_key: <your-azure-api-key>
      rpm: 6      # Rate limit for this deployment: in requests per minute (rpm)
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
      rpm: 6

router_settings:
  fallbacks: [{"gpt-3.5-turbo": ["gpt-4"]}]

2. Start the proxy

litellm --config /path/to/config.yaml

3. Test fallbacks

Pass mock_testing_fallbacks=true in the request body to trigger fallbacks.

SDK
PROXY

from litellm import Router

model_list = [{..}, {..}] # defined in Step 1.

router = Router(model_list=model_list, fallbacks=[{"bad-model": ["my-good-model"]}])

response = router.completion(
    model="bad-model",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True,
)

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_fallbacks": true
}
'

Explanation

Fallbacks are tried in order: given ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"], 'gpt-3.5-turbo' is called first, then 'gpt-4', and so on.
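As a rough illustration of that ordering, the sketch below tries a primary model group and then walks its fallbacks in list order. It is a minimal stand-in and not LiteLLM's implementation: `call_with_fallbacks`, `fake_call`, and the rule that only 'gpt-4' succeeds are hypothetical names and assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence


def call_with_fallbacks(
    call: Callable[[str], str],
    primary: str,
    fallback_groups: Sequence[str],
) -> str:
    """Try `primary` first, then each fallback group in list order."""
    last_error: Optional[Exception] = None
    for model_group in [primary, *fallback_groups]:
        try:
            return call(model_group)
        except Exception as err:  # a real router would be more selective
            last_error = err
    # Every group failed: surface the most recent error to the caller.
    raise RuntimeError("all model groups failed") from last_error


attempts: List[str] = []


def fake_call(model_group: str) -> str:
    """Hypothetical provider call: records the attempt, only 'gpt-4' succeeds."""
    attempts.append(model_group)
    if model_group != "gpt-4":
        raise ConnectionError(f"{model_group} is unavailable")
    return f"response from {model_group}"


result = call_with_fallbacks(
    fake_call, "zephyr-beta", ["gpt-3.5-turbo", "gpt-4", "gpt-4-32k"]
)
print(attempts)  # ['zephyr-beta', 'gpt-3.5-turbo', 'gpt-4']
print(result)    # response from gpt-4
```

Because the loop returns on the first success, 'gpt-4-32k' is never contacted in this run.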

You can also set default_fallbacks, in case a specific model group is misconfigured or failing.

There are 3 types of fallbacks:

content_policy_fallbacks: For litellm.ContentPolicyViolationError. LiteLLM maps content policy violation errors across providers.
context_window_fallbacks: For litellm.ContextWindowExceededError. LiteLLM maps context window error messages across providers.
fallbacks: For all remaining errors, e.g. litellm.RateLimitError
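The mapping from error type to fallback setting can be pictured with the short sketch below. The three exception classes are local stand-ins named after the LiteLLM errors mentioned above, and `fallback_setting_for` is a hypothetical helper; the real routing logic may differ.

```python
class ContentPolicyViolationError(Exception):
    """Stand-in for litellm.ContentPolicyViolationError."""


class ContextWindowExceededError(Exception):
    """Stand-in for litellm.ContextWindowExceededError."""


class RateLimitError(Exception):
    """Stand-in for litellm.RateLimitError."""


def fallback_setting_for(error: Exception) -> str:
    """Return the name of the fallback setting that applies to `error`."""
    if isinstance(error, ContentPolicyViolationError):
        return "content_policy_fallbacks"
    if isinstance(error, ContextWindowExceededError):
        return "context_window_fallbacks"
    return "fallbacks"  # all remaining errors, e.g. rate limits


print(fallback_setting_for(ContextWindowExceededError("prompt is too long")))
# context_window_fallbacks
print(fallback_setting_for(RateLimitError("429")))
# fallbacks
```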

Client-Side Fallbacks

Set fallbacks in the .completion() call for the SDK, and client-side for the proxy.

In this request, the following will occur:

1. The call to model="zephyr-beta" will fail
2. The LiteLLM proxy will loop through all the model groups specified in fallbacks=["gpt-3.5-turbo"]
3. The call to model="gpt-3.5-turbo" will succeed, and the client making the request will get a response from gpt-3.5-turbo

👉 Key change: "fallbacks": ["gpt-3.5-turbo"]

SDK
PROXY

from litellm import Router

router = Router(model_list=[..]) # defined in Step 1.

resp = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=[
        {
            "model": "claude-3-haiku",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
        }
    ],
)

print(resp)

OpenAI Python v1.0.0+
Curl Request
Langchain

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="zephyr-beta",
    messages = [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    extra_body={
        "fallbacks": ["gpt-3.5-turbo"]
    }
)

print(response)

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "zephyr-beta",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ],
    "fallbacks": ["gpt-3.5-turbo"]
}'

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os 

os.environ["OPENAI_API_KEY"] = "anything"

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000",
    model="zephyr-beta",
    extra_body={
        "fallbacks": ["gpt-3.5-turbo"]
    }
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that im using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)

Control Fallback Prompts

Pass in messages/temperature/etc. per model in the fallback (works for embedding/image generation/etc. as well).

Key change:

fallbacks = [
  {
    "model": <model_name>,
    "messages": <model-specific-messages>
    ... # any other model-specific parameters
  }
]
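One way to picture what such a per-model entry does is as an overlay on the original request. The sketch below assumes (this page does not confirm it) that keys in the fallback entry replace same-named keys of the original call, while everything else is carried over; `build_fallback_request` is a hypothetical helper, not a LiteLLM function.

```python
from typing import Any, Dict


def build_fallback_request(
    original: Dict[str, Any], fallback: Dict[str, Any]
) -> Dict[str, Any]:
    """Overlay the fallback entry's keys on top of the original request."""
    merged = dict(original)  # keep parameters the fallback does not mention
    merged.update(fallback)  # model, messages, temperature, ... from the entry win
    return merged


original_request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hey, how's it going?"}],
    "temperature": 0.2,
}
fallback_entry = {
    "model": "claude-3-haiku",
    "messages": [{"role": "user", "content": "What is LiteLLM?"}],
}

fallback_request = build_fallback_request(original_request, fallback_entry)
print(fallback_request["model"])        # claude-3-haiku
print(fallback_request["temperature"])  # 0.2
```

The original request dictionary is left untouched, so the same entry could be reused for later retries.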

SDK
PROXY

from litellm import Router

router = Router(model_list=[..]) # defined in Step 1.

resp = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    mock_testing_fallbacks=True, # 👈 trigger fallbacks
    fallbacks=[
        {
            "model": "claude-3-haiku",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
        }
    ],
)

print(resp)

OpenAI Python v1.0.0+
Curl Request
Langchain

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="zephyr-beta",
    messages = [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    extra_body={
      "fallbacks": [{
          "model": "claude-3-haiku",
          "messages": [{"role": "user", "content": "What is LiteLLM?"}]
      }]
    }
)

print(response)

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Hi, how are you ?"
          }
        ]
      }
    ],
    "fallbacks": [{
        "model": "claude-3-haiku",
        "messages": [{"role": "user", "content": "What is LiteLLM?"}]
    }],
    "mock_testing_fallbacks": true
}'

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
import os 

os.environ["OPENAI_API_KEY"] = "anything"

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000",
    model="zephyr-beta",
    extra_body={
      "fallbacks": [{
          "model": "claude-3-haiku",
          "messages": [{"role": "user", "content": "What is LiteLLM?"}]
      }]
    }
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that im using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)

Content Policy Violation Fallback

Key change:

content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}]

SDK
PROXY

from litellm import Router 

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("content filtering policy"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    content_policy_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # context_window_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

In your proxy config.yaml, just add this line 👇

router_settings:
    content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}]

Start the proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Context Window Exceeded Fallback

Key change:

context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}]

SDK
PROXY

from litellm import Router 

router = Router(
    model_list=[
        {
            "model_name": "claude-2",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": Exception("prompt is too long"),
            },
        },
        {
            "model_name": "my-fallback-model",
            "litellm_params": {
                "model": "claude-2",
                "api_key": "",
                "mock_response": "This works!",
            },
        },
    ],
    context_window_fallbacks=[{"claude-2": ["my-fallback-model"]}], # 👈 KEY CHANGE
    # fallbacks=[..], # [OPTIONAL]
    # content_policy_fallbacks=[..], # [OPTIONAL]
)

response = router.completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

In your proxy config.yaml, just add this line 👇

router_settings:
    context_window_fallbacks: [{"claude-2": ["my-fallback-model"]}]

Start the proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Advanced

Fallbacks + Retries + Timeouts + Cooldowns

To set fallbacks, add the following to your config:

litellm_settings:
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}]

This covers all errors (429, 500, etc.).

Set via config

model_list:
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8003
  - model_name: gpt-3.5-turbo
    litellm_params:
        model: gpt-3.5-turbo
        api_key: <my-openai-key>
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
        model: gpt-3.5-turbo-16k
        api_key: <my-openai-key>

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout 
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries 
  allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute.
  cooldown_time: 30 # how long to cooldown model if fails/min > allowed_fails
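How `allowed_fails` and `cooldown_time` interact can be sketched as a small failure counter. This is an illustrative model under stated assumptions, not LiteLLM's code: `CooldownTracker` is a hypothetical class, failures are counted over a sliding 60-second window, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CooldownTracker:
    """Cool a deployment down once it exceeds `allowed_fails` failures per minute."""

    def __init__(self, allowed_fails: int = 3, cooldown_time: float = 30.0) -> None:
        self.allowed_fails = allowed_fails
        self.cooldown_time = cooldown_time
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)
        self._cooldown_until: Dict[str, float] = {}

    def record_failure(self, deployment: str, now: float) -> None:
        window = self._failures[deployment]
        window.append(now)
        # Forget failures that are a minute old or older.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) > self.allowed_fails:
            self._cooldown_until[deployment] = now + self.cooldown_time
            window.clear()

    def is_available(self, deployment: str, now: float) -> bool:
        return now >= self._cooldown_until.get(deployment, 0.0)


tracker = CooldownTracker(allowed_fails=3, cooldown_time=30)
for timestamp in (0.0, 1.0, 2.0, 3.0):  # four failures within one minute
    tracker.record_failure("zephyr-beta-8001", now=timestamp)

print(tracker.is_available("zephyr-beta-8001", now=10.0))  # False
print(tracker.is_available("zephyr-beta-8001", now=33.0))  # True
```

With three failures the deployment would still be available; the fourth one pushes it over `allowed_fails` and starts the 30-second cooldown.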

Fallback to Specific Model ID

If all models in a group are in cooldown (e.g. rate limited), LiteLLM will fall back to the model with the specific model ID.

This skips any cooldown check for the fallback model.
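The difference between falling back to a model group and falling back to a specific model ID can be sketched as follows. This is a simplified guess at the behaviour described above: `resolve_fallback_target`, the deployment dictionaries, and the IDs other than my-specific-model-id are hypothetical.

```python
from typing import Dict, List, Set


def resolve_fallback_target(
    name: str, deployments: List[Dict[str, str]], cooling_down: Set[str]
) -> List[str]:
    """Return the deployment IDs a fallback name may route to."""
    # A name that matches a deployment ID pins that deployment, ignoring cooldowns.
    for deployment in deployments:
        if deployment["id"] == name:
            return [deployment["id"]]
    # Otherwise treat the name as a model group and skip cooled-down deployments.
    return [
        deployment["id"]
        for deployment in deployments
        if deployment["model_name"] == name and deployment["id"] not in cooling_down
    ]


deployments = [
    {"model_name": "gpt-4", "id": "my-specific-model-id"},
    {"model_name": "gpt-4", "id": "azure-gpt-4"},            # hypothetical ID
    {"model_name": "anthropic-claude", "id": "claude-opus"},  # hypothetical ID
]
cooling_down = {"my-specific-model-id", "azure-gpt-4"}

print(resolve_fallback_target("gpt-4", deployments, cooling_down))
# []
print(resolve_fallback_target("my-specific-model-id", deployments, cooling_down))
# ['my-specific-model-id']
```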

Specify the model ID in model_info

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
    model_info:
      id: my-specific-model-id # 👈 KEY CHANGE
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
  - model_name: anthropic-claude
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

Note: This will only fall back to the model with the specific model ID. If you want to fall back to another model group, you can set fallbacks=[{"gpt-4": ["anthropic-claude"]}].

Set fallbacks in config

litellm_settings:
  fallbacks: [{"gpt-4": ["my-specific-model-id"]}]

Test it!

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_fallbacks": true
}'

Validate that it works by checking the x-litellm-model-id response header:

x-litellm-model-id: my-specific-model-id

Test Fallbacks!

Check whether your fallbacks are working as expected.

Regular Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_fallbacks": true
}
'

Content Policy Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_content_policy_fallbacks": true
}
'

Context Window Fallbacks

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "my-bad-model",
  "messages": [
    {
      "role": "user",
      "content": "ping"
    }
  ],
  "mock_testing_context_window_fallbacks": true
}
'

Context Window Fallbacks (Pre-Call Checks + Fallbacks)

Before a call is made, check whether it fits within the model's context window by setting enable_pre_call_checks: true.
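Conceptually, a pre-call check filters out deployments whose context window is too small before any request is sent. The sketch below illustrates the idea with a crude character-based token estimate; `estimate_tokens`, `filter_by_context_window`, the deployment names, and the token limits are all hypothetical and only meant to show the shape of the check.

```python
from typing import Dict, List


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    """Crude size estimate: roughly four characters per token (an assumption)."""
    total_chars = sum(len(message["content"]) for message in messages)
    return total_chars // 4 + 1


def filter_by_context_window(
    deployments: List[dict], messages: List[Dict[str, str]]
) -> List[dict]:
    """Keep only deployments whose window can hold the estimated prompt."""
    needed = estimate_tokens(messages)
    return [d for d in deployments if d["max_input_tokens"] >= needed]


candidate_deployments = [
    {"id": "small-deployment", "max_input_tokens": 4_096},   # illustrative limit
    {"id": "large-deployment", "max_input_tokens": 16_385},  # illustrative limit
]
long_prompt = [{"role": "system", "content": "What is the meaning of 42?" * 1000}]

usable = filter_by_context_window(candidate_deployments, long_prompt)
print([d["id"] for d in usable])  # ['large-deployment']
```

A real check would use the model's tokenizer and its published limits rather than a character count.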

1. Set up the config

For Azure deployments, set the base model. Pick the base model from this list; all Azure models start with azure/.

Same Group
Context Window Fallbacks (Different Groups)

Filter out older instances of a model (e.g. gpt-3.5-turbo) that have smaller context windows.

router_settings:
    enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
    - model_name: gpt-3.5-turbo
      litellm_params:
        model: azure/chatgpt-v-2
        api_base: os.environ/AZURE_API_BASE
        api_key: os.environ/AZURE_API_KEY
        api_version: "2023-07-01-preview"
      model_info:
        base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL
    
    - model_name: gpt-3.5-turbo
      litellm_params:
        model: gpt-3.5-turbo-1106
        api_key: os.environ/OPENAI_API_KEY

2. Start the proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

text = "What is the meaning of 42?" * 5000

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages = [
        {"role": "system", "content": text},
        {"role": "user", "content": "Who was Alexander?"},
    ],
)

print(response)

Fall back to larger models if the current model is too small.

router_settings:
    enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
  - model_name: gpt-3.5-turbo-small
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview # 2. 👈 (azure-only) SET BASE MODEL

  - model_name: gpt-3.5-turbo-large
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  context_window_fallbacks: [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large", "claude-opus"]}]

2. Start the proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

text = "What is the meaning of 42?" * 5000

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo-small",  # the group that has context_window_fallbacks configured
    messages = [
        {"role": "system", "content": text},
        {"role": "user", "content": "Who was Alexander?"},
    ],
)

print(response)

Content Policy Fallbacks

Fall back across providers (e.g. from Azure OpenAI to Anthropic) if you hit content policy violation errors.

model_list:
    - model_name: gpt-3.5-turbo-small
      litellm_params:
        model: azure/chatgpt-v-2
        api_base: os.environ/AZURE_API_BASE
        api_key: os.environ/AZURE_API_KEY
        api_version: "2023-07-01-preview"

    - model_name: claude-opus
      litellm_params:
        model: claude-3-opus-20240229
        api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}]

Default Fallbacks

You can also set `default_fallbacks`, in case a specific model group is misconfigured or failing.

model_list:
    - model_name: gpt-3.5-turbo-small
      litellm_params:
        model: azure/chatgpt-v-2
        api_base: os.environ/AZURE_API_BASE
        api_key: os.environ/AZURE_API_KEY
        api_version: "2023-07-01-preview"

    - model_name: claude-opus
      litellm_params:
        model: claude-3-opus-20240229
        api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  default_fallbacks: ["claude-opus"]

This will fall back to claude-opus if any model fails.

Model-specific fallbacks (e.g. {"gpt-3.5-turbo-small": ["claude-opus"]}) override the default fallback.
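The precedence rule can be expressed in a few lines. In this sketch `resolve_fallbacks` is a hypothetical helper and the model group names are examples; it only illustrates that a model-specific entry, when present, is used instead of the default list.

```python
from typing import Dict, List


def resolve_fallbacks(
    model_group: str,
    fallbacks: List[Dict[str, List[str]]],
    default_fallbacks: List[str],
) -> List[str]:
    """Prefer a model-specific fallback list; otherwise use the defaults."""
    for mapping in fallbacks:
        if model_group in mapping:
            return mapping[model_group]
    return default_fallbacks


fallbacks = [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large"]}]
default_fallbacks = ["claude-opus"]

print(resolve_fallbacks("gpt-3.5-turbo-small", fallbacks, default_fallbacks))
# ['gpt-3.5-turbo-large']
print(resolve_fallbacks("some-other-group", fallbacks, default_fallbacks))
# ['claude-opus']
```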

EU-Region Filtering (Pre-Call Checks)

Before a call is made, check whether it is within the model's context window with enable_pre_call_checks: true.

Set the 'region_name' of the deployment.

Note: LiteLLM can automatically infer the `region_name` for Vertex AI, Bedrock, and IBM WatsonxAI based on your litellm params. For Azure, set litellm.enable_preview_features = True.
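Region filtering can be thought of as one more pre-call filter over the deployments in a model group. The sketch below assumes the request carries an allowed region; `filter_by_region` and its `allowed_region` parameter are hypothetical names, and how LiteLLM actually receives that restriction is not covered here.

```python
from typing import List, Optional


def filter_by_region(deployments: List[dict], allowed_region: Optional[str]) -> List[dict]:
    """Keep deployments in `allowed_region`; with no restriction keep them all."""
    if allowed_region is None:
        return list(deployments)
    return [d for d in deployments if d.get("region_name") == allowed_region]


group = [
    {"model": "azure/chatgpt-v-2", "region_name": "eu"},
    {"model": "gpt-3.5-turbo-1106"},  # no region_name set
]

eu_only = filter_by_region(group, allowed_region="eu")
print([d["model"] for d in eu_only])  # ['azure/chatgpt-v-2']
```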

1. Set the config

router_settings:
    enable_pre_call_checks: true # 1. Enable pre-call checks

model_list:
- model_name: gpt-3.5-turbo
  litellm_params:
    model: azure/chatgpt-v-2
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-07-01-preview"
    region_name: "eu" # 👈 SET EU-REGION

- model_name: gpt-3.5-turbo
  litellm_params:
    model: gpt-3.5-turbo-1106
    api_key: os.environ/OPENAI_API_KEY

- model_name: gemini-pro
  litellm_params:
    model: vertex_ai/gemini-pro-1.5
    vertex_project: adroit-crow-1234
    vertex_location: us-east1 # 👈 AUTOMATICALLY INFERS 'region_name'

2. Start the proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

3. Test it!

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages = [{"role": "user", "content": "Who was Alexander?"}]
)

print(response)

# Show which deployment served the request
print(f"x-litellm-model-api-base: {response.headers.get('x-litellm-model-api-base')}")

Setting Fallbacks for Wildcard Models

You can set fallbacks for wildcard models (e.g. azure/*) in your config file.
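To see why a fallback such as "azure/gpt-4o" can resolve against an "azure/*" entry, the sketch below matches a requested name against configured model names using shell-style patterns from Python's standard `fnmatch` module. `find_model_group` is a hypothetical helper, and LiteLLM's own matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def find_model_group(requested_model: str, model_names: List[str]) -> Optional[str]:
    """Exact names win; otherwise return the first wildcard pattern that matches."""
    if requested_model in model_names:
        return requested_model
    for name in model_names:
        if "*" in name and fnmatchcase(requested_model, name):
            return name
    return None


configured = ["gpt-4o", "azure/*"]

print(find_model_group("gpt-4o", configured))        # gpt-4o
print(find_model_group("azure/gpt-4o", configured))  # azure/*
print(find_model_group("mistral/small", configured)) # None
```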

Set up the config

model_list:
  - model_name: "gpt-4o"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: os.environ/OPENAI_API_KEY
  - model_name: "azure/*"
    litellm_params:
      model: "azure/*"
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

litellm_settings:
  fallbacks: [{"gpt-4o": ["azure/gpt-4o"]}]

Start the proxy

litellm --config /path/to/config.yaml

Test it!

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [    
          {
            "type": "text",
            "text": "what color is red"
          }
        ]
      }
    ],
    "max_tokens": 300,
    "mock_testing_fallbacks": true
}'

Disable Fallbacks (Per Request/Key)

Per Request
Per Key

You can disable fallbacks per request by setting disable_fallbacks: true in your request body.

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "List 5 important events in the XIX century"
        }
    ],
    "model": "gpt-3.5-turbo",
    "disable_fallbacks": true
}'

You can disable fallbacks per key by setting disable_fallbacks: true in your key metadata.

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
    "metadata": {
        "disable_fallbacks": true
    }
}'
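The two switches can be combined into a single decision, sketched below. It assumes that either flag on its own is enough to turn fallbacks off; `fallbacks_disabled` is a hypothetical helper and the precedence shown is an assumption, not documented behaviour.

```python
from typing import Any, Dict


def fallbacks_disabled(request_body: Dict[str, Any], key_metadata: Dict[str, Any]) -> bool:
    """Return True if the request or the key's metadata disables fallbacks."""
    return bool(request_body.get("disable_fallbacks")) or bool(
        key_metadata.get("disable_fallbacks")
    )


print(fallbacks_disabled({"model": "gpt-3.5-turbo", "disable_fallbacks": True}, {}))
# True
print(fallbacks_disabled({"model": "gpt-3.5-turbo"}, {"disable_fallbacks": True}))
# True
print(fallbacks_disabled({"model": "gpt-3.5-turbo"}, {}))
# False
```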
