/responses [Beta]
LiteLLM provides a BETA endpoint in the spirit of OpenAI's /responses API.
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Load Balancing | ✅ | Works between supported models |
| Supported operations | Create a Response, Get a Response, Delete a Response | |
| Supported LiteLLM versions | 1.63.8+ | |
| Supported LLM providers | All LiteLLM supported providers | openai, anthropic, bedrock, vertex_ai, gemini, azure, azure_ai etc. |
Usage
LiteLLM Python SDK
- OpenAI
- Anthropic
- Vertex AI
- AWS Bedrock
- Google AI Studio
Non-streaming
import litellm
# Non-streaming response
response = litellm.responses(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
Streaming
import litellm
# Streaming response
response = litellm.responses(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Get a Response (GET)
import litellm
# First, create a response
response = litellm.responses(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
# Get the response ID
response_id = response.id
# Retrieve the response by ID
retrieved_response = litellm.get_responses(
response_id=response_id
)
print(retrieved_response)
# For async usage
# retrieved_response = await litellm.aget_responses(response_id=response_id)
Delete a Response (DELETE)
import litellm
# First, create a response
response = litellm.responses(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
# Get the response ID
response_id = response.id
# Delete the response by ID
delete_response = litellm.delete_responses(
response_id=response_id
)
print(delete_response)
# For async usage
# delete_response = await litellm.adelete_responses(response_id=response_id)
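For async usage, the LiteLLM SDK also exposes awaitable variants. Below is a minimal end-to-end sketch of the async lifecycle (create, retrieve, delete) using litellm.aresponses together with the litellm.aget_responses and litellm.adelete_responses helpers referenced in the comments above; it assumes the same OpenAI setup as the examples in this tab.
import asyncio
import litellm

async def main():
    # Create a response asynchronously
    response = await litellm.aresponses(
        model="openai/o1-pro",
        input="Tell me a three sentence bedtime story about a unicorn.",
        max_output_tokens=100
    )
    response_id = response.id

    # Retrieve the response by ID
    retrieved_response = await litellm.aget_responses(response_id=response_id)
    print(retrieved_response)

    # Delete the response by ID
    delete_response = await litellm.adelete_responses(response_id=response_id)
    print(delete_response)

asyncio.run(main())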
Non-streaming
import litellm
import os
# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"
# Non-streaming response
response = litellm.responses(
model="anthropic/claude-3-5-sonnet-20240620",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
Streaming
import litellm
import os
# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"
# Streaming response
response = litellm.responses(
model="anthropic/claude-3-5-sonnet-20240620",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Non-streaming
import litellm
import os
# Set credentials - Vertex AI uses application default credentials
# Run 'gcloud auth application-default login' to authenticate
os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
# Non-streaming response
response = litellm.responses(
model="vertex_ai/gemini-1.5-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
Streaming
import litellm
import os
# Set credentials - Vertex AI uses application default credentials
# Run 'gcloud auth application-default login' to authenticate
os.environ["VERTEXAI_PROJECT"] = "your-gcp-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
# Streaming response
response = litellm.responses(
model="vertex_ai/gemini-1.5-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Non-streaming
import litellm
import os
# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
os.environ["AWS_REGION_NAME"] = "us-west-2" # or your AWS region
# Non-streaming response
response = litellm.responses(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
Streaming
import litellm
import os
# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
os.environ["AWS_REGION_NAME"] = "us-west-2" # or your AWS region
# Streaming response
response = litellm.responses(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Non-streaming
import litellm
import os
# Set API key for Google AI Studio
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
# Non-streaming response
response = litellm.responses(
model="gemini/gemini-1.5-flash",
input="Tell me a three sentence bedtime story about a unicorn.",
max_output_tokens=100
)
print(response)
Streaming
import litellm
import os
# Set API key for Google AI Studio
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"
# Streaming response
response = litellm.responses(
model="gemini/gemini-1.5-flash",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
LiteLLM Proxy with OpenAI SDK
First, set up and start your LiteLLM proxy server.
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
- OpenAI
- Anthropic
- Vertex AI
- AWS Bedrock
- Google AI Studio
First, add this to your LiteLLM proxy config.yaml:
model_list:
- model_name: openai/o1-pro
litellm_params:
model: openai/o1-pro
api_key: os.environ/OPENAI_API_KEY
Non-streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Non-streaming response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Streaming response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Get a Response (GET)
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# First, create a response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn."
)
# Get the response ID
response_id = response.id
# Retrieve the response by ID
retrieved_response = client.responses.retrieve(response_id)
print(retrieved_response)
Delete a Response (DELETE)
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# First, create a response
response = client.responses.create(
model="openai/o1-pro",
input="Tell me a three sentence bedtime story about a unicorn."
)
# Get the response ID
response_id = response.id
# Delete the response by ID
delete_response = client.responses.delete(response_id)
print(delete_response)
First, add this to your LiteLLM proxy config.yaml:
model_list:
- model_name: anthropic/claude-3-5-sonnet-20240620
litellm_params:
model: anthropic/claude-3-5-sonnet-20240620
api_key: os.environ/ANTHROPIC_API_KEY
Non-streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Non-streaming response
response = client.responses.create(
model="anthropic/claude-3-5-sonnet-20240620",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Streaming response
response = client.responses.create(
model="anthropic/claude-3-5-sonnet-20240620",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
First, add this to your LiteLLM proxy config.yaml:
model_list:
- model_name: vertex_ai/gemini-1.5-pro
litellm_params:
model: vertex_ai/gemini-1.5-pro
vertex_project: your-gcp-project-id
vertex_location: us-central1
Non-streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Non-streaming response
response = client.responses.create(
model="vertex_ai/gemini-1.5-pro",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Streaming response
response = client.responses.create(
model="vertex_ai/gemini-1.5-pro",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
First, add this to your LiteLLM proxy config.yaml:
model_list:
- model_name: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
litellm_params:
model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
aws_region_name: us-west-2
Non-streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Non-streaming response
response = client.responses.create(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Streaming response
response = client.responses.create(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
First, add this to your LiteLLM proxy config.yaml:
model_list:
- model_name: gemini/gemini-1.5-flash
litellm_params:
model: gemini/gemini-1.5-flash
api_key: os.environ/GEMINI_API_KEY
Non-streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Non-streaming response
response = client.responses.create(
model="gemini/gemini-1.5-flash",
input="Tell me a three sentence bedtime story about a unicorn."
)
print(response)
Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
base_url="https://:4000", # Your proxy URL
api_key="your-api-key" # Your proxy API key
)
# Streaming response
response = client.responses.create(
model="gemini/gemini-1.5-flash",
input="Tell me a three sentence bedtime story about a unicorn.",
stream=True
)
for event in response:
print(event)
Supported Responses API Parameters
| Provider | Supported Parameters |
|---|---|
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See supported parameters here |
| bedrock | See supported parameters here |
| gemini | See supported parameters here |
| vertex_ai | See supported parameters here |
| azure_ai | See supported parameters here |
| All other LLM API providers | See supported parameters here |
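The per-provider parameter lists referenced above live in the provider-specific documentation. As a rough runtime check, LiteLLM's general helper litellm.get_supported_openai_params can list the OpenAI-style parameters it knows how to map for a given model and provider; note this helper is not specific to the Responses API, so treating its output as a guide to Responses API support is an assumption in this sketch.
import litellm

# List the OpenAI-compatible parameters LiteLLM can map for this provider.
# This reflects LiteLLM's general parameter mapping and is used here only as
# a rough guide to what the Responses API bridge can translate.
params = litellm.get_supported_openai_params(
    model="claude-3-5-sonnet-20240620",
    custom_llm_provider="anthropic"
)
print(params)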
Load Balancing with Session Continuity
When using the Responses API with multiple deployments of the same model (e.g. multiple Azure OpenAI endpoints), LiteLLM provides session continuity. This ensures that follow-up requests using a previous_response_id are routed to the same deployment that generated the original response.
Example Usage
- Python SDK
- Proxy Server
import litellm
# Set up router with multiple deployments of the same model
router = litellm.Router(
model_list=[
{
"model_name": "azure-gpt4-turbo",
"litellm_params": {
"model": "azure/gpt-4-turbo",
"api_key": "your-api-key-1",
"api_version": "2024-06-01",
"api_base": "https://endpoint1.openai.azure.com",
},
},
{
"model_name": "azure-gpt4-turbo",
"litellm_params": {
"model": "azure/gpt-4-turbo",
"api_key": "your-api-key-2",
"api_version": "2024-06-01",
"api_base": "https://endpoint2.openai.azure.com",
},
},
],
optional_pre_call_checks=["responses_api_deployment_check"],
)
# Initial request
response = await router.aresponses(
model="azure-gpt4-turbo",
input="Hello, who are you?",
truncation="auto",
)
# Store the response ID
response_id = response.id
# Follow-up request - will be automatically routed to the same deployment
follow_up = await router.aresponses(
model="azure-gpt4-turbo",
input="Tell me more about yourself",
truncation="auto",
previous_response_id=response_id # This ensures routing to the same deployment
)
1. Set up session continuity in the proxy config.yaml
To enable session continuity for the Responses API on your LiteLLM proxy, set optional_pre_call_checks: ["responses_api_deployment_check"] in your proxy config.yaml.
model_list:
- model_name: azure-gpt4-turbo
litellm_params:
model: azure/gpt-4-turbo
api_key: your-api-key-1
api_version: 2024-06-01
api_base: https://endpoint1.openai.azure.com
- model_name: azure-gpt4-turbo
litellm_params:
model: azure/gpt-4-turbo
api_key: your-api-key-2
api_version: 2024-06-01
api_base: https://endpoint2.openai.azure.com
router_settings:
optional_pre_call_checks: ["responses_api_deployment_check"]
2. Use the OpenAI Python SDK to make requests to the LiteLLM proxy
from openai import OpenAI
client = OpenAI(
base_url="https://:4000",
api_key="your-api-key"
)
# Initial request
response = client.responses.create(
model="azure-gpt4-turbo",
input="Hello, who are you?"
)
response_id = response.id
# Follow-up request - will be automatically routed to the same deployment
follow_up = client.responses.create(
model="azure-gpt4-turbo",
input="Tell me more about yourself",
previous_response_id=response_id # This ensures routing to the same deployment
)
Session Management - Non-OpenAI Models
The LiteLLM proxy supports session management for non-OpenAI models. This lets you store and retrieve conversation history (state) on the LiteLLM proxy.
Usage
- Enable storing request/response content in the database
Set store_prompts_in_spend_logs: true in your proxy config.yaml. When this is enabled, LiteLLM stores the request and response content in the database.
general_settings:
store_prompts_in_spend_logs: true
- Make request 1 without previous_response_id (new session)
Start a new conversation by making a request without providing a previous response ID.
- Curl
- OpenAI Python SDK
curl http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "who is Michael Jordan"
}'
from openai import OpenAI
# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
base_url="https://:4000",
api_key="sk-1234"
)
# Make initial request to start a new conversation
response = client.responses.create(
model="anthropic/claude-3-5-sonnet-latest",
input="who is Michael Jordan"
)
print(response.id) # Store this ID for future requests in same session
print(response.output[0].content[0].text)
Response
{
"id":"resp_123abc",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"Michael Jordan is widely considered one of the greatest basketball players of all time. He played for the Chicago Bulls (1984-1993, 1995-1998) and Washington Wizards (2001-2003), winning 6 NBA Championships with the Bulls."
}]
}]
}
- Make request 2 with previous_response_id (same session)
Continue the conversation by referencing the previous response ID so the conversation context is preserved.
- Curl
- OpenAI Python SDK
curl http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "can you tell me more about him",
"previous_response_id": "resp_123abc"
}'
from openai import OpenAI
# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
base_url="https://:4000",
api_key="sk-1234"
)
# Make follow-up request in the same conversation session
follow_up_response = client.responses.create(
model="anthropic/claude-3-5-sonnet-latest",
input="can you tell me more about him",
previous_response_id="resp_123abc" # ID from the previous response
)
print(follow_up_response.output[0].content[0].text)
Response
{
"id":"resp_456def",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"Michael Jordan was born February 17, 1963. He attended University of North Carolina before being drafted 3rd overall by the Bulls in 1984. Beyond basketball, he built the Air Jordan brand with Nike and later became owner of the Charlotte Hornets."
}]
}]
}
- Make request 3 without previous_response_id (new session)
Start a brand-new conversation without referencing any previous context, demonstrating that context is not carried over between sessions.
- Curl
- OpenAI Python SDK
curl http://localhost:4000/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "anthropic/claude-3-5-sonnet-latest",
"input": "can you tell me more about him"
}'
from openai import OpenAI
# Initialize the client with your LiteLLM proxy URL
client = OpenAI(
base_url="https://:4000",
api_key="sk-1234"
)
# Make a new request without previous context
new_session_response = client.responses.create(
model="anthropic/claude-3-5-sonnet-latest",
input="can you tell me more about him"
# No previous_response_id means this starts a new conversation
)
print(new_session_response.output[0].content[0].text)
Response
{
"id":"resp_789ghi",
"model":"claude-3-5-sonnet-20241022",
"output":[{
"type":"message",
"content":[{
"type":"output_text",
"text":"I don't see who you're referring to in our conversation. Could you let me know which person you'd like to learn more about?"
}]
}]
}