VertexAI [Anthropic, Gemini, Model Garden]
Overview
| Property | Details |
|---|---|
| Description | Vertex AI is a fully managed AI development platform for building and using generative AI. |
| Provider Route on LiteLLM | vertex_ai/ |
| Link to Provider Doc | Vertex AI ↗ |
| Base URL | https://{vertex_location}-aiplatform.googleapis.com/ |
| Supported Operations | /chat/completions, /completions, /embeddings, /audio/speech, /fine_tuning, /batches, /files, /images |
vertex_ai/ Route
The vertex_ai/ route uses the VertexAI REST API.
from litellm import completion
import json
## GET CREDENTIALS
## RUN ##
# !gcloud auth application-default login - run this to add vertex credentials to your env
## OR ##
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
## COMPLETION CALL
response = completion(
model="vertex_ai/gemini-pro",
messages=[{ "content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json
)
System Message
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
response = completion(
model="vertex_ai/gemini-pro",
messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json
)
Function Calling
Force Gemini to make tool calls with tool_choice="required".
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
messages = [
{
"role": "system",
"content": "Your name is Litellm Bot, you are a helpful assistant",
},
# User asks for their name and weather in San Francisco
{
"role": "user",
"content": "Hello, what is your name and can you tell me the weather?",
},
]
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
}
},
"required": ["location"],
},
},
}
]
data = {
"model": "vertex_ai/gemini-1.5-pro-preview-0514"),
"messages": messages,
"tools": tools,
"tool_choice": "required",
"vertex_credentials": vertex_credentials_json
}
## COMPLETION CALL
print(completion(**data))
JSON Schema
From v1.40.1+, LiteLLM supports sending response_schema as a param for Gemini-1.5-Pro on Vertex AI. For other models (e.g. gemini-1.5-flash or claude-3-5-sonnet), LiteLLM appends the schema to the message list with a user-controlled prompt.
Response Schema
- SDK
- PROXY
from litellm import completion
import json
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
messages = [
{
"role": "user",
"content": "List 5 popular cookie recipes."
}
]
response_schema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
},
},
"required": ["recipe_name"],
},
}
response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema} # 👈 KEY CHANGE
)
print(json.loads(response.choices[0].message.content))
- Add model to config.yaml
model_list:
- model_name: gemini-pro
litellm_params:
model: vertex_ai/gemini-1.5-pro
vertex_project: "project-id"
vertex_location: "us-central1"
vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make the request!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "List 5 popular cookie recipes."}
],
"response_format": {"type": "json_object", "response_schema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
},
},
"required": ["recipe_name"],
},
}}
}
'
Validate Schema
To validate the response_schema, set enforce_validation: true.
- SDK
- PROXY
from litellm import completion, JSONSchemaValidationError
try:
completion(
model="vertex_ai/gemini-1.5-pro",
messages=messages,
response_format={
"type": "json_object",
"response_schema": response_schema,
"enforce_validation": true # 👈 KEY CHANGE
}
)
except JSONSchemaValidationError as e:
print("Raw Response: {}".format(e.raw_response))
raise e
- Add model to config.yaml
model_list:
- model_name: gemini-pro
litellm_params:
model: vertex_ai/gemini-1.5-pro
vertex_project: "project-id"
vertex_location: "us-central1"
vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make the request!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "List 5 popular cookie recipes."}
],
"response_format": {"type": "json_object", "response_schema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"recipe_name": {
"type": "string",
},
},
"required": ["recipe_name"],
},
},
"enforce_validation": true
}
}
'
LiteLLM validates the response against the schema and raises a JSONSchemaValidationError if the response does not match the schema.
JSONSchemaValidationError inherits from openai.APIError
Access the raw response with e.raw_response
Add to prompt yourself
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
messages = [
{
"role": "user",
"content": """
List 5 popular cookie recipes.
Using this JSON schema:
Recipe = {"recipe_name": str}
Return a `list[Recipe]`
"""
}
]
completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
Grounding - Web Search
Add Google Search results to your Vertex AI calls.
Relevant VertexAI docs
See the grounding metadata with response_obj._hidden_params["vertex_ai_grounding_metadata"]
- SDK
- PROXY
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH
resp = completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
)
print(resp)
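To read the grounding metadata mentioned above, use the hidden params on the SDK response (a small sketch; the key name is the one documented above):
# Grounding metadata is exposed on the response's hidden params
grounding_metadata = resp._hidden_params["vertex_ai_grounding_metadata"]
print(grounding_metadata)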
- OpenAI Python SDK
- cURL
from openai import OpenAI
client = OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)
response = client.chat.completions.create(
model="gemini-pro",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=[{"googleSearch": {}}],
)
print(response)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
"googleSearch": {}
}
]
}'
You can also use the enterpriseWebSearch tool for an enterprise-compliant search.
- SDK
- PROXY
from litellm import completion
## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH
resp = completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
)
print(resp)
- OpenAI Python SDK
- cURL
from openai import OpenAI
client = OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)
response = client.chat.completions.create(
model="gemini-pro",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=[{"enterpriseWebSearch": {}}],
)
print(response)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-pro",
"messages": [
{"role": "user", "content": "Who won the world cup?"}
],
"tools": [
{
"enterpriseWebSearch": {}
}
]
}'
Moving from the Vertex AI SDK to LiteLLM (GROUNDING)
If this was your initial VertexAI grounding code,
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Tool, grounding
vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")
# Use Google Search for grounding
tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())
prompt = "When is the next total solar eclipse in US?"
response = model.generate_content(
prompt,
tools=[tool],
generation_config=GenerationConfig(
temperature=0.0,
),
)
print(response)
then, this is what it looks like now
from litellm import completion
# !gcloud auth application-default login - run this to add vertex credentials to your env
tools = [{"googleSearch": {"disable_attributon": False}}] # 👈 ADD GOOGLE SEARCH
resp = litellm.completion(
model="vertex_ai/gemini-1.0-pro-001",
messages=[{"role": "user", "content": "Who won the world cup?"}],
tools=tools,
vertex_project="project-id"
)
print(resp)
Thinking / reasoning_content
LiteLLM translates OpenAI's reasoning_effort to Gemini's thinking parameter. Code
Mapping
| reasoning_effort | thinking |
|---|---|
| "low" | "budget_tokens": 1024 |
| "medium" | "budget_tokens": 2048 |
| "high" | "budget_tokens": 4096 |
- SDK
- PROXY
from litellm import completion
# !gcloud auth application-default login - run this to add vertex credentials to your env
resp = completion(
model="vertex_ai/gemini-2.5-flash-preview-04-17",
messages=[{"role": "user", "content": "What is the capital of France?"}],
reasoning_effort="low",
vertex_project="project-id",
vertex_location="us-central1"
)
- Setup config.yaml
- model_name: gemini-2.5-flash
litellm_params:
model: vertex_ai/gemini-2.5-flash-preview-04-17
vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
vertex_project: "project-id"
vertex_location: "us-central1"
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"reasoning_effort": "low"
}'
Expected Response
ModelResponse(
id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
created=1740470510,
model='claude-3-7-sonnet-20250219',
object='chat.completion',
system_fingerprint=None,
choices=[
Choices(
finish_reason='stop',
index=0,
message=Message(
content="The capital of France is Paris.",
role='assistant',
tool_calls=None,
function_call=None,
reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
),
)
],
usage=Usage(
completion_tokens=68,
prompt_tokens=42,
total_tokens=110,
completion_tokens_details=None,
prompt_tokens_details=PromptTokensDetailsWrapper(
audio_tokens=None,
cached_tokens=0,
text_tokens=None,
image_tokens=None
),
cache_creation_input_tokens=0,
cache_read_input_tokens=0
)
)
Pass thinking to Gemini models
You can also pass the thinking parameter to Gemini models.
This is translated to Gemini's thinkingConfig parameter.
- SDK
- PROXY
from litellm import completion
# !gcloud auth application-default login - run this to add vertex credentials to your env
response = completion(
model="vertex_ai/gemini-2.5-flash-preview-04-17",
messages=[{"role": "user", "content": "What is the capital of France?"}],
thinking={"type": "enabled", "budget_tokens": 1024},
vertex_project="project-id",
vertex_location="us-central1"
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "vertex_ai/gemini-2.5-flash-preview-04-17",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"thinking": {"type": "enabled", "budget_tokens": 1024}
}'
Context Caching
Using Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support is coming soon.)
Pre-requisites
- pip install google-cloud-aiplatform (pre-installed with the proxy docker image)
Authentication
- Run gcloud auth application-default login. See Google Cloud Docs
- Alternatively, you can set GOOGLE_APPLICATION_CREDENTIALS. Here's how:
  - Create a service account on GCP
  - Export the credentials as a JSON
  - Load the JSON and convert it to a string
  - Store the JSON string in your environment as GOOGLE_APPLICATION_CREDENTIALS
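For example, a minimal sketch of steps 2-4 above (the file path is a placeholder):
import json
import os

# Load the exported service account JSON and store it as a JSON string in your env
with open("/path/to/vertex_ai_service_account.json", "r") as f:
    service_account = json.load(f)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = json.dumps(service_account)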
Sample Usage
import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location
response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])
Usage with LiteLLM Proxy Server
Here's how to use Vertex AI with the LiteLLM Proxy Server
Modify the config.yaml
- Different location per model
- One location for all Vertex models
Use this when you need to set a different location for each Vertex model
model_list:
- model_name: gemini-vision
litellm_params:
model: vertex_ai/gemini-1.0-pro-vision-001
vertex_project: "project-id"
vertex_location: "us-central1"
- model_name: gemini-vision
litellm_params:
model: vertex_ai/gemini-1.0-pro-vision-001
vertex_project: "project-id2"
vertex_location: "us-east"Verwenden Sie dies, wenn Sie einen einzigen Vertex-Standort für alle Modelle haben
litellm_settings:
vertex_project: "hardy-device-38811" # Your Project ID
vertex_location: "us-central1" # proj location
model_list:
- model_name: team1-gemini-pro
litellm_params:
model: gemini-pro
Start the proxy
$ litellm --config /path/to/config.yaml
Send a request to the LiteLLM Proxy Server
- OpenAI Python v1.0.0+
- curl
import openai
client = openai.OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
model="team1-gemini-pro",
messages = [
{
"role": "user",
"content": "what llm are you"
}
],
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "team1-gemini-pro",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Authentication - vertex_project, vertex_location, etc.
Set your Vertex credentials via
- dynamic params OR
- environment variables
Dynamic Params
You can set
- vertex_credentials (str) - can be a JSON string or a file path to your Vertex AI service account .json
- vertex_location (str) - the location where the Vertex model is deployed (us-central1, asia-southeast1, etc.)
- vertex_project Optional[str] - use this if the Vertex project is different from the one in vertex_credentials
as dynamic params for a litellm.completion call.
- SDK
- PROXY
from litellm import completion
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
response = completion(
model="vertex_ai/gemini-pro",
messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}],
vertex_credentials=vertex_credentials_json,
vertex_project="my-special-project",
vertex_location="my-special-location"
)
model_list:
- model_name: gemini-1.5-pro
litellm_params:
model: gemini-1.5-pro
vertex_credentials: os.environ/VERTEX_FILE_PATH_ENV_VAR # os.environ["VERTEX_FILE_PATH_ENV_VAR"] = "/path/to/service_account.json"
vertex_project: "my-special-project"
vertex_location: "my-special-location:
Environment Variables
You can set
- GOOGLE_APPLICATION_CREDENTIALS - store the file path to your service_account.json here (used directly by the Vertex SDK).
- VERTEXAI_LOCATION - the location where the Vertex model is deployed (us-central1, asia-southeast1, etc.)
- VERTEXAI_PROJECT - Optional[str] - use this if the Vertex project is different from the one in vertex_credentials
- GOOGLE_APPLICATION_CREDENTIALS
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"
- VERTEXAI_LOCATION
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
- VERTEXAI_PROJECT
export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project
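With these environment variables set, no explicit credential params are needed on the call (a minimal sketch):
from litellm import completion

# Project, location and credentials are picked up from the env vars above
response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response)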
Specifying Safety Settings
In certain use cases you may need to call the models with safety settings different from the defaults. To do so, simply pass the safety_settings argument to completion or acompletion. For example:
Set per model/request
- SDK
- Proxy
response = completion(
model="vertex_ai/gemini-pro",
messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
safety_settings=[
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
]
)
Option 1: Set in config
model_list:
- model_name: gemini-experimental
litellm_params:
model: vertex_ai/gemini-experimental
vertex_project: litellm-epic
vertex_location: us-central1
safety_settings:
- category: HARM_CATEGORY_HARASSMENT
threshold: BLOCK_NONE
- category: HARM_CATEGORY_HATE_SPEECH
threshold: BLOCK_NONE
- category: HARM_CATEGORY_SEXUALLY_EXPLICIT
threshold: BLOCK_NONE
- category: HARM_CATEGORY_DANGEROUS_CONTENT
threshold: BLOCK_NONE
Option 2: Set on call
response = client.chat.completions.create(
model="gemini-experimental",
messages=[
{
"role": "user",
"content": "Can you write exploits?",
}
],
max_tokens=8192,
stream=False,
temperature=0.0,
extra_body={
"safety_settings": [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
],
}
)
Set globally
- SDK
- Proxy
import litellm
litellm.set_verbose = True # 👈 See RAW REQUEST/RESPONSE
litellm.vertex_ai_safety_settings = [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
]
response = completion(
model="vertex_ai/gemini-pro",
messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)
model_list:
- model_name: gemini-experimental
litellm_params:
model: vertex_ai/gemini-experimental
vertex_project: litellm-epic
vertex_location: us-central1
litellm_settings:
vertex_ai_safety_settings:
- category: HARM_CATEGORY_HARASSMENT
threshold: BLOCK_NONE
- category: HARM_CATEGORY_HATE_SPEECH
threshold: BLOCK_NONE
- category: HARM_CATEGORY_SEXUALLY_EXPLICIT
threshold: BLOCK_NONE
- category: HARM_CATEGORY_DANGEROUS_CONTENT
threshold: BLOCK_NONE
Set Vertex Project & Vertex Location
All calls using Vertex AI require the following parameters
- Your project ID
import os, litellm
# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID`
### OR ###
# set directly on module
litellm.vertex_project = "hardy-device-38811" # Your Project ID
- Your project location
import os, litellm
# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1 # Your Location
### OR ###
# set directly on module
litellm.vertex_location = "us-central1" # Your Location
Anthropic
| Model Name | Function Call |
|---|---|
| claude-3-opus@20240229 | completion('vertex_ai/claude-3-opus@20240229', messages) |
| claude-3-5-sonnet@20240620 | completion('vertex_ai/claude-3-5-sonnet@20240620', messages) |
| claude-3-sonnet@20240229 | completion('vertex_ai/claude-3-sonnet@20240229', messages) |
| claude-3-haiku@20240307 | completion('vertex_ai/claude-3-haiku@20240307', messages) |
| claude-3-7-sonnet@20250219 | completion('vertex_ai/claude-3-7-sonnet@20250219', messages) |
Usage
- SDK
- Proxy
from litellm import completion
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
model = "claude-3-sonnet@20240229"
vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]
response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
temperature=0.7,
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)
1. Add to config
model_list:
- model_name: anthropic-vertex
litellm_params:
model: vertex_ai/claude-3-sonnet@20240229
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
- model_name: anthropic-vertex
litellm_params:
model: vertex_ai/claude-3-sonnet@20240229
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "anthropic-vertex", # 👈 the 'model_name' in config
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Usage - thinking / reasoning_content
- SDK
- PROXY
from litellm import completion
resp = completion(
model="vertex_ai/claude-3-7-sonnet-20250219",
messages=[{"role": "user", "content": "What is the capital of France?"}],
thinking={"type": "enabled", "budget_tokens": 1024},
)
- Setup config.yaml
- model_name: claude-3-7-sonnet-20250219
litellm_params:
model: vertex_ai/claude-3-7-sonnet-20250219
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "claude-3-7-sonnet-20250219",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"thinking": {"type": "enabled", "budget_tokens": 1024}
}'
Expected Response
ModelResponse(
id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
created=1740470510,
model='claude-3-7-sonnet-20250219',
object='chat.completion',
system_fingerprint=None,
choices=[
Choices(
finish_reason='stop',
index=0,
message=Message(
content="The capital of France is Paris.",
role='assistant',
tool_calls=None,
function_call=None,
provider_specific_fields={
'citations': None,
'thinking_blocks': [
{
'type': 'thinking',
'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
'signature': 'EuYBCkQYAiJAy6...'
}
]
}
),
thinking_blocks=[
{
'type': 'thinking',
'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
'signature': 'EuYBCkQYAiJAy6AGB...'
}
],
reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
)
],
usage=Usage(
completion_tokens=68,
prompt_tokens=42,
total_tokens=110,
completion_tokens_details=None,
prompt_tokens_details=PromptTokensDetailsWrapper(
audio_tokens=None,
cached_tokens=0,
text_tokens=None,
image_tokens=None
),
cache_creation_input_tokens=0,
cache_read_input_tokens=0
)
)
Meta/Llama API
| Model Name | Function Call |
|---|---|
| meta/llama-3.2-90b-vision-instruct-maas | completion('vertex_ai/meta/llama-3.2-90b-vision-instruct-maas', messages) |
| meta/llama3-8b-instruct-maas | completion('vertex_ai/meta/llama3-8b-instruct-maas', messages) |
| meta/llama3-70b-instruct-maas | completion('vertex_ai/meta/llama3-70b-instruct-maas', messages) |
| meta/llama3-405b-instruct-maas | completion('vertex_ai/meta/llama3-405b-instruct-maas', messages) |
| meta/llama-4-scout-17b-16e-instruct-maas | completion('vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas', messages) |
| meta/llama-4-scout-17b-128e-instruct-maas | completion('vertex_ai/meta/llama-4-scout-17b-128e-instruct-maas', messages) |
| meta/llama-4-maverick-17b-128e-instruct-maas | completion('vertex_ai/meta/llama-4-maverick-17b-128e-instruct-maas',messages) |
| meta/llama-4-maverick-17b-16e-instruct-maas | completion('vertex_ai/meta/llama-4-maverick-17b-16e-instruct-maas',messages) |
Usage
- SDK
- Proxy
from litellm import completion
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
model = "meta/llama3-405b-instruct-maas"
vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]
response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)
1. Add to config
model_list:
- model_name: anthropic-llama
litellm_params:
model: vertex_ai/meta/llama3-405b-instruct-maas
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
- model_name: anthropic-llama
litellm_params:
model: vertex_ai/meta/llama3-405b-instruct-maas
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "anthropic-llama", # 👈 the 'model_name' in config
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Mistral API
| Model Name | Function Call |
|---|---|
| mistral-large@latest | completion('vertex_ai/mistral-large@latest', messages) |
| mistral-large@2407 | completion('vertex_ai/mistral-large@2407', messages) |
| mistral-nemo@latest | completion('vertex_ai/mistral-nemo@latest', messages) |
| codestral@latest | completion('vertex_ai/codestral@latest', messages) |
| codestral@2405 | completion('vertex_ai/codestral@2405', messages) |
Usage
- SDK
- Proxy
from litellm import completion
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
model = "mistral-large@2407"
vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]
response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)
1. Add to config
model_list:
- model_name: vertex-mistral
litellm_params:
model: vertex_ai/mistral-large@2407
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
- model_name: vertex-mistral
litellm_params:
model: vertex_ai/mistral-large@2407
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "vertex-mistral", # 👈 the 'model_name' in config
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Usage - Codestral FIM
Call Codestral on VertexAI via OpenAI's /v1/completions endpoint for FIM tasks.
Note: You can also call Codestral via /chat/completions; see the sketch below.
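A minimal sketch of the chat route, using the same credentials setup as the Mistral example above (project/location values are placeholders):
from litellm import completion

# Hedged sketch: chat-style call to Codestral on Vertex AI
response = completion(
    model="vertex_ai/codestral@2405",
    messages=[{"role": "user", "content": "Write a function that checks if a number is odd."}],
    vertex_ai_project="your-vertex-project",   # placeholder
    vertex_ai_location="your-vertex-location", # placeholder
)
print(response)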
- SDK
- Proxy
from litellm import text_completion
import os
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
# OR run `!gcloud auth print-access-token` in your terminal
model = "codestral@2405"
vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]
response = text_completion(
model="vertex_ai/" + model,
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
prompt="def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():",
suffix="return True", # optional
temperature=0, # optional
top_p=1, # optional
max_tokens=10, # optional
min_tokens=10, # optional
seed=10, # optional
stop=["return"], # optional
)
print("\nModel Response", response)
1. Add to config
model_list:
- model_name: vertex-codestral
litellm_params:
model: vertex_ai/codestral@2405
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
- model_name: vertex-codestral
litellm_params:
model: vertex_ai/codestral@2405
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl -X POST 'http://0.0.0.0:4000/completions' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"model": "vertex-codestral", # 👈 the 'model_name' in config
"prompt": "def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():",
"suffix":"return True", # optional
"temperature":0, # optional
"top_p":1, # optional
"max_tokens":10, # optional
"min_tokens":10, # optional
"seed":10, # optional
"stop":["return"], # optional
}'
AI21 Models
| Model Name | Function Call |
|---|---|
| jamba-1.5-mini@001 | completion(model='vertex_ai/jamba-1.5-mini@001', messages) |
| jamba-1.5-large@001 | completion(model='vertex_ai/jamba-1.5-large@001', messages) |
Usage
- SDK
- Proxy
from litellm import completion
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
model = "meta/jamba-1.5-mini@001"
vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]
response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)
1. Add to config
model_list:
- model_name: jamba-1.5-mini
litellm_params:
model: vertex_ai/jamba-1.5-mini@001
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
- model_name: jamba-1.5-large
litellm_params:
model: vertex_ai/jamba-1.5-large@001
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-west-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "jamba-1.5-large",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Gemini Pro
| Model Name | Function Call |
|---|---|
| gemini-pro | completion('gemini-pro', messages), completion('vertex_ai/gemini-pro', messages) |
Fine-tuned Models
You can call fine-tuned Vertex AI Gemini models through LiteLLM.
| Property | Details |
|---|---|
| Provider Route | vertex_ai/gemini/{MODEL_ID} |
| Vertex Documentation | Vertex AI - Fine-tuned Gemini Models |
| Supported Operations | /chat/completions, /completions, /embeddings, /images |
To use a model that follows the /gemini request/response format, simply set the model parameter to
model="vertex_ai/gemini/<your-finetuned-model>"
- LiteLLM Python SDK
- LiteLLM Proxy
import litellm
import os
## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = litellm.completion(
model="vertex_ai/gemini/<your-finetuned-model>", # e.g. vertex_ai/gemini/4965075652664360960
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
- Add Vertex credentials to your environment
!gcloud auth application-default login
- Setup config.yaml
- model_name: finetuned-gemini
litellm_params:
model: vertex_ai/gemini/<ENDPOINT_ID>
vertex_project: <PROJECT_ID>
vertex_location: <LOCATION>
- Test it!
- OpenAI Python SDK
- curl
from openai import OpenAI
client = OpenAI(
api_key="your-litellm-key",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(
model="finetuned-gemini",
messages=[
{"role": "user", "content": "hi"}
]
)
print(response)
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: <LITELLM_KEY>' \
--data '{"model": "finetuned-gemini" ,"messages":[{"role": "user", "content":[{"type": "text", "text": "hi"}]}]}'
Model Garden
All OpenAI-compatible models from the Vertex Model Garden are supported.
Using Model Garden
Almost all models from the Vertex Model Garden are OpenAI compatible.
- OpenAI-compatible models
- Non-OpenAI-compatible models
| Property | Details |
|---|---|
| Provider Route | vertex_ai/openai/{MODEL_ID} |
| Vertex Documentation | Vertex Model Garden - OpenAI Chat Completions, Vertex Model Garden |
| Supported Operations | /chat/completions, /embeddings |
- SDK
- Proxy
from litellm import completion
import os
## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
model="vertex_ai/openai/<your-endpoint-id>",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
1. Add to config
model_list:
- model_name: llama3-1-8b-instruct
litellm_params:
model: vertex_ai/openai/5464397967697903616
vertex_ai_project: "my-test-project"
vertex_ai_location: "us-east-1"
2. Start the proxy
litellm --config /path/to/config.yaml
# RUNNING at http://0.0.0.0:4000
3. Test it!
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3-1-8b-instruct", # 👈 the 'model_name' in config
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
from litellm import completion
import os
## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
response = completion(
model="vertex_ai/<your-endpoint-id>",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
Gemini Pro Vision
| Model Name | Function Call |
|---|---|
| gemini-pro-vision | completion('gemini-pro-vision', messages), completion('vertex_ai/gemini-pro-vision', messages) |
Gemini 1.5 Pro (and Vision)
| Model Name | Function Call |
|---|---|
| gemini-1.5-pro | completion('gemini-1.5-pro', messages), completion('vertex_ai/gemini-1.5-pro', messages) |
| gemini-1.5-flash-preview-0514 | completion('gemini-1.5-flash-preview-0514', messages), completion('vertex_ai/gemini-1.5-flash-preview-0514', messages) |
| gemini-1.5-pro-preview-0514 | completion('gemini-1.5-pro-preview-0514', messages), completion('vertex_ai/gemini-1.5-pro-preview-0514', messages) |
Using Gemini Pro Vision
Call gemini-pro-vision in the same input/output format as OpenAI gpt-4-vision.
LiteLLM supports the following image types passed in url
- Images with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
- Images with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
- Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
- Base64-encoded local images
Sample Request - image url
- Images with direct links
- Local base64 images
import litellm
response = litellm.completion(
model = "vertex_ai/gemini-pro-vision",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whats in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
],
)
print(response)
import litellm
def encode_image(image_path):
import base64
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
image_path = "cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
response = litellm.completion(
model="vertex_ai/gemini-pro-vision",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Whats in this image?"},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64," + base64_image
},
},
],
}
],
)
print(response)
Usage - Function Calling
LiteLLM supports function calling for Vertex AI Gemini models.
from litellm import completion
import os
# set env
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ".."
os.environ["VERTEX_AI_PROJECT"] = ".."
os.environ["VERTEX_AI_LOCATION"] = ".."
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
response = completion(
model="vertex_ai/gemini-pro-vision",
messages=messages,
tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
response.choices[0].message.tool_calls[0].function.arguments, str
)
Usage - PDF / Videos / Audio etc. Files
Pass any file supported by Vertex AI through LiteLLM.
LiteLLM supports the following file types passed in url.
Using the file message type for VertexAI is live from v1.65.1+.
- Files with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
- Files with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
- Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
- Base64-encoded local files
- SDK
- PROXY
Using gs:// or any URL
from litellm import completion
response = completion(
model="vertex_ai/gemini-1.5-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
{
"type": "file",
"file": {
"file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
"format": "application/pdf" # OPTIONAL - specify mime-type
}
},
],
}
],
max_tokens=300,
)
print(response.choices[0])
Using base64
from litellm import completion
import base64
import requests
# URL of the file
url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
# Download the file
response = requests.get(url)
file_data = response.content
encoded_file = base64.b64encode(file_data).decode("utf-8")
response = completion(
model="vertex_ai/gemini-1.5-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
{
"type": "file",
"file": {
"file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
}
},
{
"type": "audio_input",
"audio_input {
"audio_input": f"data:audio/mp3;base64,{encoded_file}", # 👈 AUDIO File ('file' message works as too)
}
},
],
}
],
max_tokens=300,
)
print(response.choices[0])
- Add model to config
- model_name: gemini-1.5-flash
litellm_params:
model: vertex_ai/gemini-1.5-flash
vertex_credentials: "/path/to/service_account.json"
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
Using gs://
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "gemini-1.5-flash",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "You are a very professional document summarization specialist. Please summarize the given document"
},
{
"type": "file",
"file": {
"file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
"format": "application/pdf" # OPTIONAL
}
}
]
}
],
"max_tokens": 300
}'
Using base64
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "gemini-1.5-flash",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "You are a very professional document summarization specialist. Please summarize the given document"
},
{
"type": "file",
"file": {
"file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
},
},
{
"type": "audio_input",
"audio_input {
"audio_input": f"data:audio/mp3;base64,{encoded_file}", # 👈 AUDIO File ('file' message works as too)
}
},
]
}
],
"max_tokens": 300
}'
Chat Models
| Model Name | Function Call |
|---|---|
| chat-bison-32k | completion('chat-bison-32k', messages) |
| chat-bison | completion('chat-bison', messages) |
| chat-bison@001 | completion('chat-bison@001', messages) |
Code Chat Models
| Model Name | Function Call |
|---|---|
| codechat-bison | completion('codechat-bison', messages) |
| codechat-bison-32k | completion('codechat-bison-32k', messages) |
| codechat-bison@001 | completion('codechat-bison@001', messages) |
Text Models
| Model Name | Function Call |
|---|---|
| text-bison | completion('text-bison', messages) |
| text-bison@001 | completion('text-bison@001', messages) |
Code Text Models
| Model Name | Function Call |
|---|---|
| code-bison | completion('code-bison', messages) |
| code-bison@001 | completion('code-bison@001', messages) |
| code-gecko@001 | completion('code-gecko@001', messages) |
| code-gecko@latest | completion('code-gecko@latest', messages) |
Embedding Models
Usage - Embedding
- SDK
- LiteLLM PROXY
import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location
response = embedding(
model="vertex_ai/textembedding-gecko",
input=["good morning from litellm"],
)
print(response)
- Add model to config.yaml
model_list:
- model_name: snowflake-arctic-embed-m-long-1731622468876
litellm_params:
model: vertex_ai/<your-model-id>
vertex_project: "adroit-crow-413218"
vertex_location: "us-central1"
vertex_credentials: adroit-crow-413218-a956eef1a2a8.json
litellm_settings:
drop_params: True
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make a request using the OpenAI Python SDK or Langchain Python SDK
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
response = client.embeddings.create(
model="snowflake-arctic-embed-m-long-1731622468876",
input = ["good morning from litellm", "this is another item"],
)
print(response)
Supported Embedding Models
All models listed here are supported.
| Model Name | Function Call |
|---|---|
| text-embedding-004 | embedding(model="vertex_ai/text-embedding-004", input) |
| text-multilingual-embedding-002 | embedding(model="vertex_ai/text-multilingual-embedding-002", input) |
| textembedding-gecko | embedding(model="vertex_ai/textembedding-gecko", input) |
| textembedding-gecko-multilingual | embedding(model="vertex_ai/textembedding-gecko-multilingual", input) |
| textembedding-gecko-multilingual@001 | embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input) |
| textembedding-gecko@001 | embedding(model="vertex_ai/textembedding-gecko@001", input) |
| textembedding-gecko@003 | embedding(model="vertex_ai/textembedding-gecko@003", input) |
| text-embedding-preview-0409 | embedding(model="vertex_ai/text-embedding-preview-0409", input) |
| text-multilingual-embedding-preview-0409 | embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input) |
| Fine-tuned OR custom embedding models | embedding(model="vertex_ai/<your-model-id>", input) |
Supported OpenAI (Unified) Params
| Param | Type | Vertex Equivalent |
|---|---|---|
| input | string or List[string] | instances |
| dimensions | int | output_dimensionality |
| input_type | Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"] | task_type |
Usage with OpenAI (Unified) Params
- SDK
- LiteLLM PROXY
response = litellm.embedding(
model="vertex_ai/text-embedding-004",
input=["good morning from litellm", "gm"]
input_type = "RETRIEVAL_DOCUMENT",
dimensions=1,
)
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
response = client.embeddings.create(
model="text-embedding-004",
input = ["good morning from litellm", "gm"],
dimensions=1,
extra_body = {
"input_type": "RETRIEVAL_QUERY",
}
)
print(response)
Supported Vertex-Specific Params
| Param | Type |
|---|---|
| auto_truncate | bool |
| task_type | Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"] |
| title | str |
Usage with Vertex-Specific Params (use task_type and title)
You can pass any Vertex-specific params to the embedding model. Just pass them to the embedding function as shown below.
Relevant Vertex AI docs with all embedding params
- SDK
- LiteLLM PROXY
response = litellm.embedding(
model="vertex_ai/text-embedding-004",
input=["good morning from litellm", "gm"]
task_type = "RETRIEVAL_DOCUMENT",
title = "test",
dimensions=1,
auto_truncate=True,
)
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
response = client.embeddings.create(
model="text-embedding-004",
input = ["good morning from litellm", "gm"],
dimensions=1,
extra_body = {
"task_type": "RETRIEVAL_QUERY",
"auto_truncate": True,
"title": "test",
}
)
print(response)
Multimodal Embeddings
Known limitations
- Only supports 1 image / video per request
- Only supports GCS or base64-encoded images / videos
Usage
- SDK
- LiteLLM PROXY (Unified Endpoint)
- LiteLLM PROXY (Vertex SDK)
Using GCS Images
response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input="gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png" # will be sent as a gcs image
)
Using base64-encoded images
response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input="data:image/jpeg;base64,..." # will be sent as a base64 encoded image
)
- Add model to config.yaml
model_list:
- model_name: multimodalembedding@001
litellm_params:
model: vertex_ai/multimodalembedding@001
vertex_project: "adroit-crow-413218"
vertex_location: "us-central1"
vertex_credentials: adroit-crow-413218-a956eef1a2a8.json
litellm_settings:
drop_params: True
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make a request using the OpenAI Python SDK or Langchain Python SDK
- OpenAI SDK
- Langchain
Requests with a GCS image / video URI
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
model="multimodalembedding@001",
input = "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
)
print(response)
Requests with base64-encoded images
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
model="multimodalembedding@001",
input = "data:image/jpeg;base64,...",
)
print(response)
Requests with a GCS image / video URI
from langchain_openai import OpenAIEmbeddings
embeddings_models = "multimodalembedding@001"
embeddings = OpenAIEmbeddings(
model="multimodalembedding@001",
base_url="http://0.0.0.0:4000",
api_key="sk-1234", # type: ignore
)
query_result = embeddings.embed_query(
"gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)
print(query_result)
Requests with base64-encoded images
from langchain_openai import OpenAIEmbeddings
embeddings_models = "multimodalembedding@001"
embeddings = OpenAIEmbeddings(
model="multimodalembedding@001",
base_url="http://0.0.0.0:4000",
api_key="sk-1234", # type: ignore
)
query_result = embeddings.embed_query(
"data:image/jpeg;base64,..."
)
print(query_result)
- Add model to config.yaml
default_vertex_config:
vertex_project: "adroit-crow-413218"
vertex_location: "us-central1"
vertex_credentials: adroit-crow-413218-a956eef1a2a8.json
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make a request using the OpenAI Python SDK
import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video
from vertexai.vision_models import VideoSegmentConfig
from google.auth.credentials import Credentials
LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"
import datetime
class CredentialsWrapper(Credentials):
def __init__(self, token=None):
super().__init__()
self.token = token
self.expiry = None # or set to a future date if needed
def refresh(self, request):
pass
def apply(self, headers, token=None):
headers['Authorization'] = f'Bearer {self.token}'
@property
def expired(self):
return False # Always consider the token as non-expired
@property
def valid(self):
return True # Always consider the credentials as valid
credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)
vertexai.init(
project="adroit-crow-413218",
location="us-central1",
api_endpoint=LITELLM_PROXY_BASE,
credentials = credentials,
api_transport="rest",
)
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(
"gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)
embeddings = model.get_embeddings(
image=image,
contextual_text="Colosseum",
dimension=1408,
)
print(f"Image Embedding: {embeddings.image_embedding}")
print(f"Text Embedding: {embeddings.text_embedding}")
Text + Image + Video Embeddings
- SDK
- LiteLLM PROXY (Unified Endpoint)
Text + Image
response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input=["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"] # will be sent as a gcs image
)
Text + Video
response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input=["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as a gcs image
)
Image + Video
response = await litellm.aembedding(
model="vertex_ai/multimodalembedding@001",
input=["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as a gcs image
)
- Add model to config.yaml
model_list:
- model_name: multimodalembedding@001
litellm_params:
model: vertex_ai/multimodalembedding@001
vertex_project: "adroit-crow-413218"
vertex_location: "us-central1"
vertex_credentials: adroit-crow-413218-a956eef1a2a8.json
litellm_settings:
drop_params: True
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make a request using the OpenAI Python SDK or Langchain Python SDK
Text + Image
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
model="multimodalembedding@001",
input = ["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"],
)
print(response)
Text + Video
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
model="multimodalembedding@001",
input = ["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)
print(response)
Image + Video
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
model="multimodalembedding@001",
input = ["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)
print(response)
Image Generation Models
Usage
response = await litellm.aimage_generation(
prompt="An olympic size swimming pool",
model="vertex_ai/imagegeneration@006",
vertex_ai_project="adroit-crow-413218",
vertex_ai_location="us-central1",
)
Generating multiple images
Use the n parameter to specify how many images should be generated.
response = await litellm.aimage_generation(
prompt="An olympic size swimming pool",
model="vertex_ai/imagegeneration@006",
vertex_ai_project="adroit-crow-413218",
vertex_ai_location="us-central1",
n=1,
)
Supported Image Generation Models
| Model Name | Usage |
|---|---|
| imagen-3.0-generate-001 | litellm.image_generation('vertex_ai/imagen-3.0-generate-001', prompt) |
| imagen-3.0-fast-generate-001 | litellm.image_generation('vertex_ai/imagen-3.0-fast-generate-001', prompt) |
| imagegeneration@006 | litellm.image_generation('vertex_ai/imagegeneration@006', prompt) |
| imagegeneration@005 | litellm.image_generation('vertex_ai/imagegeneration@005', prompt) |
| imagegeneration@002 | litellm.image_generation('vertex_ai/imagegeneration@002', prompt) |
Text to Speech APIs
LiteLLM supports calling the Vertex AI Text-to-Speech API in the OpenAI text-to-speech API format.
Usage - Basic
- SDK
- LiteLLM PROXY (Unified Endpoint)
Vertex AI does not support passing a model param, so passing model=vertex_ai/ is the only required param.
Sync Usage
from pathlib import Path
import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = litellm.speech(
model="vertex_ai/",
input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)
Async Usage
speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = await litellm.aspeech(
model="vertex_ai/",
input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)
- Add model to config.yaml
model_list:
- model_name: vertex-tts
litellm_params:
model: vertex_ai/ # Vertex AI does not support passing a `model` param - so passing `model=vertex_ai/` is the only required param
vertex_project: "adroit-crow-413218"
vertex_location: "us-central1"
vertex_credentials: adroit-crow-413218-a956eef1a2a8.json
litellm_settings:
drop_params: True
- Start the proxy
$ litellm --config /path/to/config.yaml
- Make a request using the OpenAI Python SDK
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
# see supported values for "voice" on vertex here:
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
model = "vertex-tts",
input="the quick brown fox jumped over the lazy dogs",
voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'}
)
print("response from proxy", response)
Usage - ssml as input
Pass your ssml in the input parameter. If it contains <speak>, it is automatically detected and passed as ssml to the Vertex AI API.
If you want to force your input to be passed as ssml, set use_ssml=True.
- SDK
- LiteLLM PROXY (Unified Endpoint)
Vertex AI does not support passing a `model` param - so passing `model=vertex_ai/` is the only required param.
from pathlib import Path

import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"

ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
    input=ssml,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-UK",
        "name": "en-UK-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR22",
        "speakingRate": "10",
    },
)
response.stream_to_file(speech_file_path)
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""
# see supported values for "voice" on vertex here:
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
model = "vertex-tts",
input=ssml,
voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
)
print("response from proxy", response)
Forcing SSML Usage
You can force SSML usage by setting the use_ssml parameter to True. This is useful when you want your input to be treated as SSML even if it does not contain <speak> tags.
Here are examples of how to force SSML usage:
- SDK
- LiteLLM PROXY (Unified Endpoint)
Vertex AI does not support passing a `model` param - so passing `model=vertex_ai/` is the only required param.
from pathlib import Path

import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"

ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = litellm.speech(
    input=ssml,
    use_ssml=True,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-UK",
        "name": "en-UK-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR22",
        "speakingRate": "10",
    },
)
response.stream_to_file(speech_file_path)
import openai
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""
# see supported values for "voice" on vertex here:
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
model = "vertex-tts",
input=ssml, # the ssml string is passed via the standard OpenAI input param
voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
extra_body={"use_ssml": True},
)
print("response from proxy", response)
Batch APIs
Just add the following Vertex environment variables to your environment.
# GCS Bucket settings, used to store batch prediction files in
export GCS_BUCKET_NAME="litellm-testing-bucket" # the bucket you want to store batch prediction files in
export GCS_PATH_SERVICE_ACCOUNT="/path/to/service_account.json" # path to your service account json file
# Vertex /batch endpoint settings, used for LLM API requests
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json" # path to your service account json file
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
export VERTEXAI_PROJECT="my-test-project"
Usage
1. Create a file of batch requests for Vertex
LiteLLM expects the file to follow the **OpenAI batches file format**.
Each body in the file should be an **OpenAI API request**.
Create a file called vertex_batch_completions.jsonl in your current working directory; the model should be the name of the Vertex AI model.
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
2. Upload a file of batch requests
For vertex_ai, LiteLLM will upload the file to your provided GCS_BUCKET_NAME.
import os

from openai import OpenAI

oai_client = OpenAI(
    api_key="sk-1234",              # litellm proxy API key
    base_url="http://0.0.0.0:4000"  # litellm proxy base url
)

file_name = "vertex_batch_completions.jsonl"
_current_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(_current_dir, file_name)

file_obj = oai_client.files.create(
    file=open(file_path, "rb"),
    purpose="batch",
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use vertex_ai for this file upload
)
Expected Response
{
"id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"bytes": 416,
"created_at": 1733392026,
"filename": "litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"object": "file",
"purpose": "batch",
"status": "uploaded",
"status_details": null
}
3. Create a batch
batch_input_file_id = file_obj.id # use `file_obj` from step 2
create_batch_response = oai_client.batches.create(
completion_window="24h",
endpoint="/v1/chat/completions",
input_file_id=batch_input_file_id, # example input_file_id = "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/c2b1b785-252b-448c-b180-033c4c63b3ce"
extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)
Expected Response
{
"id": "3814889423749775360",
"completion_window": "24hrs",
"created_at": 1733392026,
"endpoint": "",
"input_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
"object": "batch",
"status": "validating",
"cancelled_at": null,
"cancelling_at": null,
"completed_at": null,
"error_file_id": null,
"errors": null,
"expired_at": null,
"expires_at": null,
"failed_at": null,
"finalizing_at": null,
"in_progress_at": null,
"metadata": null,
"output_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
"request_counts": null
}
4. Retrieve a batch
retrieved_batch = oai_client.batches.retrieve(
batch_id=create_batch_response.id,
extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)
Expected Response
{
"id": "3814889423749775360",
"completion_window": "24hrs",
"created_at": 1736500100,
"endpoint": "",
"input_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/7b2e47f5-3dd4-436d-920f-f9155bbdc952",
"object": "batch",
"status": "completed",
"cancelled_at": null,
"cancelling_at": null,
"completed_at": null,
"error_file_id": null,
"errors": null,
"expired_at": null,
"expires_at": null,
"failed_at": null,
"finalizing_at": null,
"in_progress_at": null,
"metadata": null,
"output_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
"request_counts": null
}
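Batch jobs move from validating to completed asynchronously, so in practice you usually poll the retrieve call until a terminal status is reached. A minimal polling sketch, reusing oai_client and create_batch_response from the steps above (the terminal status names follow the OpenAI batch format, and the polling interval is arbitrary):
import time

terminal_statuses = {"completed", "failed", "cancelled", "expired"}

while True:
    retrieved_batch = oai_client.batches.retrieve(
        batch_id=create_batch_response.id,
        extra_body={"custom_llm_provider": "vertex_ai"},
    )
    if retrieved_batch.status in terminal_statuses:
        break
    time.sleep(30)  # adjust the interval to your job size

# for vertex_ai, results are written to the GCS path in `output_file_id`
print(retrieved_batch.status, retrieved_batch.output_file_id)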
Fine-Tuning APIs
| Property | Details |
|---|---|
| Description | Create fine-tuning jobs on Vertex AI (/tuningJobs) using the OpenAI Python SDK. |
| Vertex Fine-Tuning Docs | Vertex Fine-Tuning |
Usage
1. Add finetune_settings to your config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# 👇 Key change: For /fine_tuning/jobs endpoints
finetune_settings:
  - custom_llm_provider: "vertex_ai"
    vertex_project: "adroit-crow-413218"
    vertex_location: "us-central1"
    vertex_credentials: "/Users/ishaanjaffer/Downloads/adroit-crow-413218-a956eef1a2a8.json"
2. Create a fine-tuning job
- OpenAI Python SDK
- curl
# assumes `client` is an openai.AsyncOpenAI client pointed at the LiteLLM proxy
ft_job = await client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002", # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl", # file_id from create file response
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm proxy which provider to use
)
curl http://0.0.0.0:4000/v1/fine_tuning/jobs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"custom_llm_provider": "vertex_ai",
"model": "gemini-1.0-pro-002",
"training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
}'
Advanced use case - Passing adapter_size to the Vertex AI API
Set hyperparameters such as n_epochs, learning_rate_multiplier, and adapter_size. See Vertex Advanced Hyperparameters.
- OpenAI Python SDK
- curl
ft_job = client.fine_tuning.jobs.create(
model="gemini-1.0-pro-002", # Vertex model you want to fine-tune
training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl", # file_id from create file response
hyperparameters={
"n_epochs": 3, # epoch_count on Vertex
"learning_rate_multiplier": 0.1, # learning_rate_multiplier on Vertex
"adapter_size": "ADAPTER_SIZE_ONE" # type: ignore, vertex specific hyperparameter
},
extra_body={
"custom_llm_provider": "vertex_ai",
},
)
curl http://0.0.0.0:4000/v1/fine_tuning/jobs \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"custom_llm_provider": "vertex_ai",
"model": "gemini-1.0-pro-002",
"training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
"hyperparameters": {
"n_epochs": 3,
"learning_rate_multiplier": 0.1,
"adapter_size": "ADAPTER_SIZE_ONE"
}
}'
Extra
Using GOOGLE_APPLICATION_CREDENTIALS
Here's the code to store your service account credentials in the GOOGLE_APPLICATION_CREDENTIALS environment variable:
import json
import os
import tempfile

def load_vertex_ai_credentials():
    # Define the path to the vertex_key.json file
    print("loading vertex ai credentials")
    filepath = os.path.dirname(os.path.abspath(__file__))
    vertex_key_path = filepath + "/vertex_key.json"

    # Read the existing content of the file or create an empty dictionary
    try:
        with open(vertex_key_path, "r") as file:
            # Read the file content
            print("Read vertexai file path")
            content = file.read()

            # If the file is empty or not valid JSON, create an empty dictionary
            if not content or not content.strip():
                service_account_key_data = {}
            else:
                # Attempt to load the existing JSON content
                file.seek(0)
                service_account_key_data = json.load(file)
    except FileNotFoundError:
        # If the file doesn't exist, create an empty dictionary
        service_account_key_data = {}

    # Create a temporary file
    with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
        # Write the updated content to the temporary file
        json.dump(service_account_key_data, temp_file, indent=2)

    # Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)
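A minimal usage sketch: after calling the helper, GOOGLE_APPLICATION_CREDENTIALS points at the key file and LiteLLM picks the credentials up automatically. The project / location values below are placeholders, set via the same environment variables used in the Batch APIs section above.
import os

import litellm

load_vertex_ai_credentials()

os.environ["VERTEXAI_PROJECT"] = "adroit-crow-413218"  # placeholder project
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = litellm.completion(
    model="vertex_ai/gemini-pro",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response)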
Using a GCP Service Account
Trying to deploy LiteLLM on Google Cloud Run? Tutorial here
- Figure out the Service Account bound to the Google Cloud Run service.
Get the FULL EMAIL address of the relevant Service Account.
Next, go to IAM & Admin > Manage Resources and select the top-level project that contains your Google Cloud Run service.
Click Add Principal.
- Specify the Service Account as the principal and Vertex AI User as the role.
Once this is done, when you deploy the new container in the Google Cloud Run service, LiteLLM will have automatic access to all Vertex AI endpoints. A minimal config sketch is shown below.
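For example, a config.yaml entry for a proxy deployed this way could look like the following sketch - no vertex_credentials entry is passed, since the bound service account is used automatically (the project / location values are placeholders):
model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-pro
      vertex_project: "adroit-crow-413218" # placeholder - your GCP project
      vertex_location: "us-central1"
      # no vertex_credentials - the Cloud Run service account is used automatically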
s/o @Darien Kindlund for this tutorial