VertexAI [Anthropic, Gemini, Model Garden]

Overview

Property	Details
Description	Vertex AI is a fully managed AI development platform for building and using generative AI.
Provider route on LiteLLM	`vertex_ai/`
Link to provider docs	Vertex AI ↗
Base URL	https://{vertex_location}-aiplatform.googleapis.com/
Supported operations	`/chat/completions`, `/completions`, `/embeddings`, `/audio/speech`, `/fine_tuning`, `/batches`, `/files`, `/images`
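
As a small illustration of the Base URL pattern in the table, the sketch below fills a region into the template. The helper name `vertex_base_url` is hypothetical and not part of LiteLLM.

```python
def vertex_base_url(vertex_location: str) -> str:
    """Fill the region into the base URL pattern from the overview table."""
    return f"https://{vertex_location}-aiplatform.googleapis.com/"


print(vertex_base_url("us-central1"))
```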

`vertex_ai/` Route

The vertex_ai/ route uses the VertexAI REST API.

from litellm import completion
import json 

## GET CREDENTIALS 
## RUN ## 
# !gcloud auth application-default login - run this to add vertex credentials to your env
## OR ## 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

## COMPLETION CALL 
response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{ "content": "Hello, how are you?","role": "user"}],
  vertex_credentials=vertex_credentials_json
)

System Message

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}], 
  vertex_credentials=vertex_credentials_json
)

Function Calling

Force Gemini to make tool calls with tool_choice="required".

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


messages = [
    {
        "role": "system",
        "content": "Your name is Litellm Bot, you are a helpful assistant",
    },
    # User asks for their name and weather in San Francisco
    {
        "role": "user",
        "content": "Hello, what is your name and can you tell me the weather?",
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

data = {
    "model": "vertex_ai/gemini-1.5-pro-preview-0514",
    "messages": messages,
    "tools": tools,
    "tool_choice": "required",
    "vertex_credentials": vertex_credentials_json
}

## COMPLETION CALL 
print(completion(**data))
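
Once the model returns a tool call, the caller still has to run the function. The sketch below shows one way to dispatch it, assuming the OpenAI-style shape in which a tool call carries a function name and a JSON-encoded argument string. `get_weather`, `LOCAL_TOOLS`, and `run_tool_call` are hypothetical names for illustration; no network call is made.

```python
import json


def get_weather(location: str) -> str:
    # Hypothetical local implementation; a real one would query a weather service.
    return f"Weather for {location}: unknown"


# Registry mapping tool names (as declared in `tools`) to local callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Decode the JSON-encoded arguments and call the matching local function."""
    kwargs = json.loads(arguments_json)
    return LOCAL_TOOLS[name](**kwargs)


# Example input in the assumed shape: a name plus a JSON string of arguments.
result = run_tool_call("get_weather", '{"location": "San Francisco, CA"}')
print(result)
```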

JSON Schema

From v1.40.1+, LiteLLM supports sending response_schema as a parameter for Gemini-1.5-Pro on Vertex AI. For other models (e.g. gemini-1.5-flash or claude-3-5-sonnet), LiteLLM adds the schema to the message list with a user-controlled prompt.
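
To make the fallback concrete, here is a minimal sketch of what "adding the schema to the message list" could look like. This is an illustration only, not LiteLLM's actual implementation; the helper name and the prompt wording are assumptions.

```python
import json


def add_schema_to_messages(messages: list, response_schema: dict) -> list:
    """Return a copy of messages with the schema appended as an extra user turn."""
    # The prompt wording is made up for this sketch.
    schema_prompt = "Use this JSON schema for your reply:\n" + json.dumps(response_schema)
    return messages + [{"role": "user", "content": schema_prompt}]


messages = [{"role": "user", "content": "List 5 popular cookie recipes."}]
schema = {"type": "array", "items": {"type": "object"}}
patched = add_schema_to_messages(messages, schema)
print(patched[-1]["content"])
```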

Response Schema

SDK
PROXY

from litellm import completion 
import json 

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response_schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string",
                },
            },
            "required": ["recipe_name"],
        },
    }


response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema},  # 👈 KEY CHANGE
)

# The message content is a JSON string; parse it before use
print(json.loads(response.choices[0].message.content))

Add model to config.yaml

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env

Start Proxy

$ litellm --config /path/to/config.yaml

Make Request!

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gemini-pro",
  "messages": [
        {"role": "user", "content": "List 5 popular cookie recipes."}
    ],
  "response_format": {"type": "json_object", "response_schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string"
                }
            },
            "required": ["recipe_name"]
        }
    }}
}
'

Validate Schema

To validate the response_schema, set enforce_validation: true.

SDK
PROXY

from litellm import completion, JSONSchemaValidationError
try: 
    completion(
    model="vertex_ai/gemini-1.5-pro", 
    messages=messages, 
    response_format={
        "type": "json_object", 
        "response_schema": response_schema,
        "enforce_validation": True # 👈 KEY CHANGE
    }
    )
except JSONSchemaValidationError as e: 
    print("Raw Response: {}".format(e.raw_response))
    raise e

Add model to config.yaml

model_list:
  - model_name: gemini-pro
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: "project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json" # [OPTIONAL] Do this OR `!gcloud auth application-default login` - run this to add vertex credentials to your env

Start Proxy

$ litellm --config /path/to/config.yaml

Make Request!

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "gemini-pro",
  "messages": [
        {"role": "user", "content": "List 5 popular cookie recipes."}
    ],
  "response_format": {"type": "json_object", "response_schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "recipe_name": {
                    "type": "string"
                }
            },
            "required": ["recipe_name"]
        }
    },
    "enforce_validation": true
    }
}
'

LiteLLM validates the response against the schema and raises a JSONSchemaValidationError if the response does not match the schema.

JSONSchemaValidationError inherits from openai.APIError.

Access the raw response with e.raw_response.
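
The validate-then-raise behaviour can be pictured with a hand-rolled check for the recipe schema used in this section. This is a simplified stand-in, not LiteLLM's validator: `SchemaMismatch` is a hypothetical exception that only mimics the `raw_response` attribute described here.

```python
import json


class SchemaMismatch(Exception):
    """Hypothetical stand-in for JSONSchemaValidationError."""

    def __init__(self, raw_response: str):
        super().__init__("response does not match schema")
        self.raw_response = raw_response


def check_recipes(raw_response: str) -> list:
    """Accept only a JSON array of objects that each carry a string recipe_name."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        raise SchemaMismatch(raw_response) from None
    ok = isinstance(data, list) and all(
        isinstance(item, dict) and isinstance(item.get("recipe_name"), str)
        for item in data
    )
    if not ok:
        raise SchemaMismatch(raw_response)
    return data


try:
    check_recipes('{"recipe_name": 1}')  # an object, not an array
except SchemaMismatch as e:
    print("Raw Response: {}".format(e.raw_response))
```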

Add to prompt yourself

from litellm import completion 
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

messages = [
    {
        "role": "user",
        "content": """
List 5 popular cookie recipes.

Using this JSON schema:

    Recipe = {"recipe_name": str}

Return a `list[Recipe]`
        """
    }
]

completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })

Grounding - Web Search

Add Google Search results to Vertex AI calls.

Relevant VertexAI Docs

See the grounding metadata with response_obj._hidden_params["vertex_ai_grounding_metadata"]
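
A defensive way to read that field is sketched below. The helper name is hypothetical, and the stub object with its metadata contents is made up so the snippet runs without credentials; with a real call you would pass the response returned by `completion`.

```python
from types import SimpleNamespace


def grounding_metadata(response_obj):
    """Return the grounding metadata if present, else None."""
    hidden = getattr(response_obj, "_hidden_params", None) or {}
    return hidden.get("vertex_ai_grounding_metadata")


# Stub standing in for a real response; the metadata contents are invented.
fake_response = SimpleNamespace(
    _hidden_params={"vertex_ai_grounding_metadata": [{"webSearchQueries": ["world cup winner"]}]}
)
print(grounding_metadata(fake_response))
```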

SDK
PROXY

import litellm

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                )

print(resp)

OpenAI Python SDK
cURL

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearch": {}}],
)

print(response)

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
      {"role": "user", "content": "Who won the world cup?"}
    ],
   "tools": [
        {
            "googleSearch": {} 
        }
    ]
  }'

You can also use the enterpriseWebSearch tool for an enterprise-compliant search.

SDK
PROXY

import litellm

## SETUP ENVIRONMENT
# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"enterpriseWebSearch": {}}] # 👈 ADD GOOGLE ENTERPRISE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                )

print(resp)

OpenAI Python SDK
cURL

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000/v1/" # point to litellm proxy
)

response = client.chat.completions.create(
    model="gemini-pro",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"enterpriseWebSearch": {}}],
)

print(response)

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemini-pro",
    "messages": [
      {"role": "user", "content": "Who won the world cup?"}
    ],
   "tools": [
        {
            "enterpriseWebSearch": {} 
        }
    ]
  }'

Moving from Vertex AI SDK to LiteLLM (GROUNDING)

If this was your initial VertexAI Grounding code,

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Tool, grounding


vertexai.init(project=project_id, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# Use Google Search for grounding
tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

prompt = "When is the next total solar eclipse in US?"
response = model.generate_content(
    prompt,
    tools=[tool],
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)

print(response)

then this is what it looks like now

import litellm


# !gcloud auth application-default login - run this to add vertex credentials to your env

tools = [{"googleSearch": {"disable_attribution": False}}] # 👈 ADD GOOGLE SEARCH

resp = litellm.completion(
                    model="vertex_ai/gemini-1.0-pro-001",
                    messages=[{"role": "user", "content": "Who won the world cup?"}],
                    tools=tools,
                    vertex_project="project-id"
                )

print(resp)

Thinking / `reasoning_content`

LiteLLM translates OpenAI's reasoning_effort to Gemini's thinking parameter.

Mapping

reasoning_effort	thinking
"low"	"budget_tokens": 1024
"medium"	"budget_tokens": 2048
"high"	"budget_tokens": 4096

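The mapping table can be written as a lookup. This is a sketch of the table's content only; the constant and function names are hypothetical, and the `"type": "enabled"` field is taken from the `thinking` examples elsewhere on this page.

```python
# Values copied from the mapping table above.
REASONING_EFFORT_TO_BUDGET = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_from_effort(reasoning_effort: str) -> dict:
    """Build a thinking parameter for the given reasoning_effort level."""
    return {
        "type": "enabled",
        "budget_tokens": REASONING_EFFORT_TO_BUDGET[reasoning_effort],
    }


print(thinking_from_effort("medium"))
```
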
SDK
PROXY

from litellm import completion

# !gcloud auth application-default login - run this to add vertex credentials to your env

resp = completion(
    model="vertex_ai/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    vertex_project="project-id",
    vertex_location="us-central1"
)

Set up config.yaml

- model_name: gemini-2.5-flash
  litellm_params:
    model: vertex_ai/gemini-2.5-flash-preview-04-17
    vertex_credentials: {"project_id": "project-id", "location": "us-central1", "project_key": "project-key"}
    vertex_project: "project-id"
    vertex_location: "us-central1"

Start proxy

litellm --config /path/to/config.yaml

Test it!

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning_effort": "low"
  }'

Expected Response

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Pass `thinking` to Gemini models

You can also pass the thinking parameter to Gemini models.

This is translated to Gemini's thinkingConfig parameter.
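
A rough picture of that translation is sketched below. The field names `thinkingBudget` and `includeThoughts` follow Gemini's public REST API; whether LiteLLM sets `includeThoughts` in exactly this way is an assumption, and the function name is hypothetical.

```python
def to_thinking_config(thinking: dict) -> dict:
    """Translate an OpenAI/Anthropic-style thinking dict into a thinkingConfig dict."""
    config = {}
    if thinking.get("type") == "enabled":
        # Assumption: enabling thinking also asks for the thoughts to be returned.
        config["includeThoughts"] = True
    if "budget_tokens" in thinking:
        config["thinkingBudget"] = thinking["budget_tokens"]
    return config


print(to_thinking_config({"type": "enabled", "budget_tokens": 1024}))
```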

SDK
PROXY

import litellm

# !gcloud auth application-default login - run this to add vertex credentials to your env

response = litellm.completion(
  model="vertex_ai/gemini-2.5-flash-preview-04-17",
  messages=[{"role": "user", "content": "What is the capital of France?"}],
  thinking={"type": "enabled", "budget_tokens": 1024},
  vertex_project="project-id",
  vertex_location="us-central1"
)

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "vertex_ai/gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'

Context Caching

Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support is coming soon.)

Go straight to provider

Pre-requisites

pip install google-cloud-aiplatform (pre-installed on the proxy Docker image)
Authentication
- Run gcloud auth application-default login. See Google Cloud Docs
- Alternatively, you can set GOOGLE_APPLICATION_CREDENTIALS
  Here's how: Jump to Code
  - Create a service account on GCP
  - Export the credentials as JSON
  - Load the JSON file and convert it to a string.
  - Store the JSON string in your environment as GOOGLE_APPLICATION_CREDENTIALS

Sample Usage

import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

Usage with LiteLLM Proxy Server

Here's how to use Vertex AI with the LiteLLM Proxy Server

Modify the config.yaml

Different location per model
One location for all vertex models

Use this when you need to set a different location for each vertex model

model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east"
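
The config above registers two deployments under the same model_name. If you generate such configs programmatically, the same structure can be built as a Python dict before writing it out as YAML. The helper name below is hypothetical; the keys mirror the YAML shown above.

```python
def deployments_for(model_name: str, model: str, targets: list) -> dict:
    """Build a model_list with one deployment per (project, location) pair."""
    return {
        "model_list": [
            {
                "model_name": model_name,
                "litellm_params": {
                    "model": model,
                    "vertex_project": project,
                    "vertex_location": location,
                },
            }
            for project, location in targets
        ]
    }


config = deployments_for(
    "gemini-vision",
    "vertex_ai/gemini-1.0-pro-vision-001",
    [("project-id", "us-central1"), ("project-id2", "us-east")],
)
print(config)
```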

Use this when you have a single vertex location for all models

litellm_settings: 
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location

model_list: 
  - model_name: team1-gemini-pro
    litellm_params: 
      model: gemini-pro

Start the proxy
```
$ litellm --config /path/to/config.yaml
```

Send Request to LiteLLM Proxy Server

OpenAI Python v1.0.0+
curl

import openai
client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages = [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "team1-gemini-pro",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'

Authentication - vertex_project, vertex_location, etc.

Set your Vertex credentials via

dynamic params OR
env vars

Dynamic Params

You can set

vertex_credentials (str) - can be a JSON string or a file path to your Vertex AI service_account.json
vertex_location (str) - place where the Vertex model is deployed (us-central1, asia-southeast1, etc.)
vertex_project Optional[str] - use this if the Vertex project differs from the one in vertex_credentials

as dynamic params for a litellm.completion call.

SDK
PROXY

from litellm import completion
import json 

## GET CREDENTIALS 
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)


response = completion(
  model="vertex_ai/gemini-pro",
  messages=[{"content": "You are a good bot.","role": "system"}, {"content": "Hello, how are you?","role": "user"}], 
  vertex_credentials=vertex_credentials_json,
  vertex_project="my-special-project", 
  vertex_location="my-special-location"
)

model_list:
    - model_name: gemini-1.5-pro
      litellm_params:
        model: gemini-1.5-pro
        vertex_credentials: os.environ/VERTEX_FILE_PATH_ENV_VAR # os.environ["VERTEX_FILE_PATH_ENV_VAR"] = "/path/to/service_account.json" 
        vertex_project: "my-special-project"
        vertex_location: "my-special-location"

Environment Variables

You can set

GOOGLE_APPLICATION_CREDENTIALS - store the file path to your service_account.json here (used by the Vertex SDK directly).
VERTEXAI_LOCATION - place where the Vertex model is deployed (us-central1, asia-southeast1, etc.)
VERTEXAI_PROJECT - Optional[str] - use this if the Vertex project differs from the one in vertex_credentials

GOOGLE_APPLICATION_CREDENTIALS

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"

VERTEXAI_LOCATION

export VERTEXAI_LOCATION="us-central1" # can be any vertex location

VERTEXAI_PROJECT

export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project
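
When both a dynamic parameter and an environment variable are available, a caller typically wants the explicit value to win. The sketch below shows that precedence. It is an assumption about reasonable behaviour, not a statement of LiteLLM's internal lookup order, and `resolve_setting` is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_setting(explicit: Optional[str], env_var: str) -> Optional[str]:
    """Prefer the value passed on the call, else fall back to the environment."""
    return explicit if explicit is not None else os.environ.get(env_var)


os.environ.setdefault("VERTEXAI_LOCATION", "us-central1")  # example value only

print(resolve_setting(None, "VERTEXAI_LOCATION"))            # falls back to the env var
print(resolve_setting("europe-west4", "VERTEXAI_LOCATION"))  # explicit value wins
```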

Specifying Safety Settings

In certain use cases you may need to call the models with safety settings that differ from the defaults. To do so, pass the safety_settings argument to completion or acompletion. For example:

Set per model/request

SDK
Proxy

from litellm import completion

response = completion(
    model="vertex_ai/gemini-pro", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)

Option 1: Set in config

model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1
      safety_settings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_HATE_SPEECH
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_NONE

Option 2: Set on call

response = client.chat.completions.create(
    model="gemini-experimental",
    messages=[
        {
            "role": "user",
            "content": "Can you write exploits?",
        }
    ],
    max_tokens=8192,
    stream=False,
    temperature=0.0,

    extra_body={
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "threshold": "BLOCK_NONE",
            },
        ],
    }
)

Set Globally

SDK
Proxy

import litellm 

litellm.set_verbose = True # 👈 See RAW REQUEST/RESPONSE 

litellm.vertex_ai_safety_settings = [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
response = litellm.completion(
    model="vertex_ai/gemini-pro", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1

litellm_settings:
    vertex_ai_safety_settings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_HATE_SPEECH
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_NONE
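
The safety settings lists in this section repeat the same four categories with one threshold. A small helper can generate that list; this is a convenience sketch, and `build_safety_settings` is a hypothetical name, not a LiteLLM function.

```python
# The four harm categories used in the examples in this section.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]


def build_safety_settings(threshold: str = "BLOCK_NONE") -> list:
    """Return one {category, threshold} entry per harm category."""
    return [{"category": category, "threshold": threshold} for category in HARM_CATEGORIES]


safety_settings = build_safety_settings()
print(safety_settings)
```

The result can be passed as `safety_settings=...` on a call or assigned to `litellm.vertex_ai_safety_settings`, as in the examples above.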

Set Vertex Project & Vertex Location

All calls using Vertex AI require the following parameters:

Your Project ID

import os, litellm 

# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID

### OR ###

# set directly on module 
litellm.vertex_project = "hardy-device-38811" # Your Project ID

Your Project Location

import os, litellm 

# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1" # Your Location

### OR ###

# set directly on module 
litellm.vertex_location = "us-central1" # Your Location

Anthropic

Model Name	Function Call
claude-3-opus@20240229	`completion('vertex_ai/claude-3-opus@20240229', messages)`
claude-3-5-sonnet@20240620	`completion('vertex_ai/claude-3-5-sonnet@20240620', messages)`
claude-3-sonnet@20240229	`completion('vertex_ai/claude-3-sonnet@20240229', messages)`
claude-3-haiku@20240307	`completion('vertex_ai/claude-3-haiku@20240307', messages)`
claude-3-7-sonnet@20250219	`completion('vertex_ai/claude-3-7-sonnet@20250219', messages)`
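
Every entry in the table follows the same pattern: the Vertex model name prefixed with `vertex_ai/`. A tiny sketch of that rule, with a hypothetical helper name:

```python
def vertex_model_id(model_name: str) -> str:
    """Prefix a Vertex model name with the LiteLLM provider route, once."""
    prefix = "vertex_ai/"
    return model_name if model_name.startswith(prefix) else prefix + model_name


print(vertex_model_id("claude-3-5-sonnet@20240620"))
```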

Usage

SDK
Proxy

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "claude-3-sonnet@20240229"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    temperature=0.7,
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" is the 'model_name' from the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "anthropic-vertex",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ]
        }'

Usage - `thinking` / `reasoning_content`

SDK
PROXY

from litellm import completion

resp = completion(
    model="vertex_ai/claude-3-7-sonnet@20250219",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

Set up config.yaml

- model_name: claude-3-7-sonnet-20250219
  litellm_params:
    model: vertex_ai/claude-3-7-sonnet@20250219
    vertex_ai_project: "my-test-project"
    vertex_ai_location: "us-west-1"

Start proxy

litellm --config /path/to/config.yaml

Test it!

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "claude-3-7-sonnet-20250219",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "thinking": {"type": "enabled", "budget_tokens": 1024}
  }'

Expected Response

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                provider_specific_fields={
                    'citations': None,
                    'thinking_blocks': [
                        {
                            'type': 'thinking',
                            'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                            'signature': 'EuYBCkQYAiJAy6...'
                        }
                    ]
                }
            ),
            thinking_blocks=[
                {
                    'type': 'thinking',
                    'thinking': 'The capital of France is Paris. This is a very straightforward factual question.',
                    'signature': 'EuYBCkQYAiJAy6AGB...'
                }
            ],
            reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Meta/Llama API

Model Name	Function Call
meta/llama-3.2-90b-vision-instruct-maas	`completion('vertex_ai/meta/llama-3.2-90b-vision-instruct-maas', messages)`
meta/llama3-8b-instruct-maas	`completion('vertex_ai/meta/llama3-8b-instruct-maas', messages)`
meta/llama3-70b-instruct-maas	`completion('vertex_ai/meta/llama3-70b-instruct-maas', messages)`
meta/llama3-405b-instruct-maas	`completion('vertex_ai/meta/llama3-405b-instruct-maas', messages)`
meta/llama-4-scout-17b-16e-instruct-maas	`completion('vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas', messages)`
meta/llama-4-scout-17b-128e-instruct-maas	`completion('vertex_ai/meta/llama-4-scout-17b-128e-instruct-maas', messages)`
meta/llama-4-maverick-17b-128e-instruct-maas	`completion('vertex_ai/meta/llama-4-maverick-17b-128e-instruct-maas', messages)`
meta/llama-4-maverick-17b-16e-instruct-maas	`completion('vertex_ai/meta/llama-4-maverick-17b-16e-instruct-maas', messages)`

Usage

SDK
Proxy

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "meta/llama3-405b-instruct-maas"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: anthropic-llama
      litellm_params:
        model: vertex_ai/meta/llama3-405b-instruct-maas
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: anthropic-llama
      litellm_params:
        model: vertex_ai/meta/llama3-405b-instruct-maas
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" is the 'model_name' from the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "anthropic-llama",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ]
        }'

Mistral API

Supported OpenAI Params

Model Name	Function Call
mistral-large@latest	`completion('vertex_ai/mistral-large@latest', messages)`
mistral-large@2407	`completion('vertex_ai/mistral-large@2407', messages)`
mistral-nemo@latest	`completion('vertex_ai/mistral-nemo@latest', messages)`
codestral@latest	`completion('vertex_ai/codestral@latest', messages)`
codestral@2405	`completion('vertex_ai/codestral@2405', messages)`

Usage

SDK
Proxy

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "mistral-large@2407"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: vertex-mistral
      litellm_params:
        model: vertex_ai/mistral-large@2407
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east1"
    - model_name: vertex-mistral
      litellm_params:
        model: vertex_ai/mistral-large@2407
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" is the model_name from the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "vertex-mistral",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ]
        }'

Usage - Codestral FIM

Call Codestral on VertexAI through the OpenAI-compatible /v1/completions endpoint for fill-in-the-middle (FIM) tasks.

Note: You can also call Codestral through /chat/completions.
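
In a FIM request the model receives the code before the gap as `prompt` and the code after the gap as `suffix`, and generates what belongs in between. A small sketch of assembling those arguments; `build_fim_request` is a hypothetical helper for illustration, and the default model name is taken from the table above.

```python
def build_fim_request(prefix: str, suffix: str,
                      model: str = "vertex_ai/codestral@2405") -> dict:
    """Collect keyword arguments for a fill-in-the-middle completion call."""
    return {
        "model": model,
        "prompt": prefix,   # code before the gap
        "suffix": suffix,   # code after the gap
        "temperature": 0,   # low randomness is a common choice for code completion
    }


kwargs = build_fim_request(
    prefix="def is_odd(n):\n    return n % 2 == 1\n\ndef test_is_odd():\n",
    suffix="    return True",
)
print(sorted(kwargs))
```

With credentials configured, the dictionary could be passed on as `litellm.text_completion(**kwargs)`.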

SDK
Proxy

from litellm import text_completion
import os

# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
# OR run `!gcloud auth print-access-token` in your terminal

model = "codestral@2405"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = text_completion(
    model="vertex_ai/" + model,
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
    prompt="def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():", 
    suffix="return True",                                              # optional
    temperature=0,                                                     # optional
    top_p=1,                                                           # optional
    max_tokens=10,                                                     # optional
    min_tokens=10,                                                     # optional
    seed=10,                                                           # optional
    stop=["return"],                                                   # optional
)

print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: vertex-codestral
      litellm_params:
        model: vertex_ai/codestral@2405
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east1"
    - model_name: vertex-codestral
      litellm_params:
        model: vertex_ai/codestral@2405
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" is the model_name from the config; every field after "prompt" is optional
curl -X POST 'http://0.0.0.0:4000/completions' \
      -H 'Authorization: Bearer sk-1234' \
      -H 'Content-Type: application/json' \
      -d '{
            "model": "vertex-codestral",
            "prompt": "def is_odd(n): \n return n % 2 == 1 \ndef test_is_odd():",
            "suffix": "return True",
            "temperature": 0,
            "top_p": 1,
            "max_tokens": 10,
            "min_tokens": 10,
            "seed": 10,
            "stop": ["return"]
        }'

AI21 Models

Model Name	Function Call
jamba-1.5-mini@001	`completion(model='vertex_ai/jamba-1.5-mini@001', messages)`
jamba-1.5-large@001	`completion(model='vertex_ai/jamba-1.5-large@001', messages)`

Usage

SDK
Proxy

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "jamba-1.5-mini@001"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: jamba-1.5-mini
      litellm_params:
        model: vertex_ai/jamba-1.5-mini@001
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east1"
    - model_name: jamba-1.5-large
      litellm_params:
        model: vertex_ai/jamba-1.5-large@001
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "jamba-1.5-large",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ]
        }'

Gemini Pro

Model Name	Function Call
gemini-pro	`completion('gemini-pro', messages)`, `completion('vertex_ai/gemini-pro', messages)`

Fine-tuned Models

You can call fine-tuned Vertex AI Gemini models through LiteLLM.

Property	Details
Provider Route	`vertex_ai/gemini/{MODEL_ID}`
Vertex Documentation	Vertex AI - Fine-tuned Gemini Models
Supported Operations	`/chat/completions`, `/completions`, `/embeddings`, `/images`

To use a model that follows the /gemini request/response format, set the model parameter as follows:

Model parameter for calling fine-tuned Gemini models
model="vertex_ai/gemini/<your-finetuned-model>"

LiteLLM Python SDK
LiteLLM Proxy

Example
import litellm
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = litellm.completion(
  model="vertex_ai/gemini/<your-finetuned-model>",  # e.g. vertex_ai/gemini/4965075652664360960
  messages=[{ "content": "Hello, how are you?","role": "user"}],
)

Add Vertex credentials to your environment

Authenticate to Vertex AI

!gcloud auth application-default login

Set up config.yaml

Add to the LiteLLM config
- model_name: finetuned-gemini
  litellm_params:
    model: vertex_ai/gemini/<ENDPOINT_ID>
    vertex_project: <PROJECT_ID>
    vertex_location: <LOCATION>

Test it!

OpenAI Python SDK
curl

Example request
from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-key",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="finetuned-gemini",
    messages=[
        {"role": "user", "content": "hi"}
    ]
)
print(response)

Example request
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <LITELLM_KEY>' \
--data '{"model": "finetuned-gemini", "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'

Model Garden

Tip

All OpenAI-compatible models from Vertex Model Garden are supported.

Using Model Garden

Almost all models from Vertex Model Garden are OpenAI-compatible.

OpenAI-compatible models
Non-OpenAI-compatible models

Property	Details
Provider Route	`vertex_ai/openai/{MODEL_ID}`
Vertex Documentation	Vertex Model Garden - OpenAI Chat Completions, Vertex Model Garden
Supported Operations	`/chat/completions`, `/embeddings`

SDK
Proxy

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
  model="vertex_ai/openai/<your-endpoint-id>", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

1. Add to config

model_list:
    - model_name: llama3-1-8b-instruct
      litellm_params:
        model: vertex_ai/openai/5464397967697903616
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" is the model_name from the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "llama3-1-8b-instruct",
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ]
        }'

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
  model="vertex_ai/<your-endpoint-id>", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Gemini Pro Vision

Model Name	Function Call
gemini-pro-vision	`completion('gemini-pro-vision', messages)`, `completion('vertex_ai/gemini-pro-vision', messages)`

Gemini 1.5 Pro (and Vision)

Model Name	Function Call
gemini-1.5-pro	`completion('gemini-1.5-pro', messages)`, `completion('vertex_ai/gemini-1.5-pro', messages)`
gemini-1.5-flash-preview-0514	`completion('gemini-1.5-flash-preview-0514', messages)`, `completion('vertex_ai/gemini-1.5-flash-preview-0514', messages)`
gemini-1.5-pro-preview-0514	`completion('gemini-1.5-pro-preview-0514', messages)`, `completion('vertex_ai/gemini-1.5-pro-preview-0514', messages)`

Using Gemini Pro Vision

Call gemini-pro-vision in the same input/output format as OpenAI gpt-4-vision.

LiteLLM supports the following image types passed in url:

Images with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
Images with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
Base64-encoded local images

Example request - image URL

Images with direct links
Local base64 images
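
The reference styles listed above all end up in the same `image_url` content part. A rough sketch of a helper that passes remote references through and inlines local files as base64 data URLs; `image_part` is a hypothetical name, not a LiteLLM function.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str) -> dict:
    """Build an OpenAI-style image_url content part.

    Remote references (gs://, http://, https://) are passed through unchanged;
    anything else is treated as a local file path and inlined as a data URL.
    """
    if source.startswith(("gs://", "http://", "https://")):
        url = source
    else:
        mime, _ = mimetypes.guess_type(source)
        data = base64.b64encode(Path(source).read_bytes()).decode("utf-8")
        url = f"data:{mime or 'application/octet-stream'};base64,{data}"
    return {"type": "image_url", "image_url": {"url": url}}


content = [
    {"type": "text", "text": "What's in this image?"},
    image_part("gs://cloud-samples-data/generative-ai/image/boats.jpeg"),
]
print(content[1]["image_url"]["url"])
```

The `content` list can then be used as the `content` of a user message in `litellm.completion`.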

import litellm

response = litellm.completion(
  model = "vertex_ai/gemini-pro-vision",
  messages=[
      {
          "role": "user",
          "content": [
                          {
                              "type": "text",
                              "text": "Whats in this image?"
                          },
                          {
                              "type": "image_url",
                              "image_url": {
                              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                              }
                          }
                      ]
      }
  ],
)
print(response)

import litellm

def encode_image(image_path):
    import base64

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
response = litellm.completion(
    model="vertex_ai/gemini-pro-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Whats in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64," + base64_image
                    },
                },
            ],
        }
    ],
)
print(response)

Usage - Function Calling

LiteLLM supports function calling for Vertex AI Gemini models.

from litellm import completion
import os
# set env
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ".."
os.environ["VERTEXAI_PROJECT"] = ".."
os.environ["VERTEXAI_LOCATION"] = ".."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="vertex_ai/gemini-pro-vision",
    messages=messages,
    tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)
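
The assertions above only check that a tool call came back. A sketch of the usual next step, decoding the JSON-encoded arguments and running a local function, is shown here against a hand-built stand-in object so it runs without credentials. `run_tool_call`, `TOOLS`, and the stubbed weather function are illustrative names, not part of LiteLLM.

```python
import json
from types import SimpleNamespace


def get_current_weather(location: str, unit: str = "fahrenheit") -> str:
    """Stand-in tool; a real implementation would query a weather service."""
    return json.dumps({"location": location, "temperature": "72", "unit": unit})


TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    fn = TOOLS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "name": tool_call.function.name,
        "content": fn(**args),
    }


# Hand-built stand-in for response.choices[0].message.tool_calls[0]
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "Boston, MA"}',
    ),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

In a real exchange the returned message would be appended to `messages` and sent back to the model in a second `completion` call.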

Usage - PDF / Videos / Audio etc. Files

Pass any file supported by Vertex AI through LiteLLM.

LiteLLM supports the following file types passed in url.

The `file` message type for VertexAI is available from v1.65.1 onward.

Files with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
Files with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
Base64 Encoded Local Files
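
The two ways of attaching a file differ only in the key used inside the `file` object: `file_id` for a reference, `file_data` for inline base64. A minimal sketch, with `file_part` as a hypothetical helper name and `application/pdf` as an assumed default MIME type:

```python
import base64


def file_part(source, mime_type: str = "application/pdf") -> dict:
    """Build a file content part from a URI string or from raw bytes."""
    if isinstance(source, bytes):
        encoded = base64.b64encode(source).decode("utf-8")
        return {
            "type": "file",
            "file": {"file_data": f"data:{mime_type};base64,{encoded}"},
        }
    # Strings are treated as references (gs:// or https://) and sent as file_id
    return {"type": "file", "file": {"file_id": source, "format": mime_type}}


remote = file_part("gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf")
inline = file_part(b"abc")
print(remote)
print(inline)
```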

SDK
PROXY

Using `gs://` or any URL

from litellm import completion

response = completion(
    model="vertex_ai/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
                        "format": "application/pdf" # OPTIONAL - specify mime-type
                    }
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Using base64

from litellm import completion
import base64
import requests

# URL of the file
url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"

# Download the file
response = requests.get(url)
file_data = response.content

encoded_file = base64.b64encode(file_data).decode("utf-8")

response = completion(
    model="vertex_ai/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "You are a very professional document summarization specialist. Please summarize the given document."},
                {
                    "type": "file",
                    "file": {
                        "file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
                    }  
                },
                {
                    "type": "audio_input",
                    "audio_input": {
                        "audio_input": f"data:audio/mp3;base64,{encoded_file}", # 👈 AUDIO file (the 'file' message type works too)
                    }  
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Add model to config

- model_name: gemini-1.5-flash
  litellm_params:
    model: vertex_ai/gemini-1.5-flash
    vertex_credentials: "/path/to/service_account.json"

Start proxy

litellm --config /path/to/config.yaml

Test it!

Using gs://

# "format" is optional; it sets the MIME type explicitly
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "You are a very professional document summarization specialist. Please summarize the given document"
          },
          {
            "type": "file",
            "file": {
              "file_id": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
              "format": "application/pdf"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

# Replace the <BASE64_...> placeholders with the base64-encoded file contents.
# The audio can also be sent with the "file" message type.
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "You are a very professional document summarization specialist. Please summarize the given document"
          },
          {
            "type": "file",
            "file": {
              "file_data": "data:application/pdf;base64,<BASE64_ENCODED_PDF>"
            }
          },
          {
            "type": "audio_input",
            "audio_input": {
              "audio_input": "data:audio/mp3;base64,<BASE64_ENCODED_MP3>"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

Chat Models

Model Name	Function Call
chat-bison-32k	`completion('chat-bison-32k', messages)`
chat-bison	`completion('chat-bison', messages)`
chat-bison@001	`completion('chat-bison@001', messages)`

Code Chat Models

Model Name	Function Call
codechat-bison	`completion('codechat-bison', messages)`
codechat-bison-32k	`completion('codechat-bison-32k', messages)`
codechat-bison@001	`completion('codechat-bison@001', messages)`

Text Models

Model Name	Function Call
text-bison	`completion('text-bison', messages)`
text-bison@001	`completion('text-bison@001', messages)`

Code Text Models

Model Name	Function Call
code-bison	`completion('code-bison', messages)`
code-bison@001	`completion('code-bison@001', messages)`
code-gecko@001	`completion('code-gecko@001', messages)`
code-gecko@latest	`completion('code-gecko@latest', messages)`

Embedding Models

Usage - Embedding

SDK
LiteLLM PROXY

import litellm
from litellm import embedding
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

response = embedding(
    model="vertex_ai/textembedding-gecko",
    input=["good morning from litellm"],
)
print(response)

Add model to config.yaml

model_list:
  - model_name: snowflake-arctic-embed-m-long-1731622468876
    litellm_params:
      model: vertex_ai/<your-model-id>
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

Start proxy

$ litellm --config /path/to/config.yaml

Make a request using the OpenAI Python SDK, Langchain Python SDK

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="snowflake-arctic-embed-m-long-1731622468876", 
    input = ["good morning from litellm", "this is another item"],
)

print(response)

Supported Embedding Models

All models listed here are supported.

Model Name	Function Call
text-embedding-004	`embedding(model="vertex_ai/text-embedding-004", input)`
text-multilingual-embedding-002	`embedding(model="vertex_ai/text-multilingual-embedding-002", input)`
textembedding-gecko	`embedding(model="vertex_ai/textembedding-gecko", input)`
textembedding-gecko-multilingual	`embedding(model="vertex_ai/textembedding-gecko-multilingual", input)`
textembedding-gecko-multilingual@001	`embedding(model="vertex_ai/textembedding-gecko-multilingual@001", input)`
textembedding-gecko@001	`embedding(model="vertex_ai/textembedding-gecko@001", input)`
textembedding-gecko@003	`embedding(model="vertex_ai/textembedding-gecko@003", input)`
text-embedding-preview-0409	`embedding(model="vertex_ai/text-embedding-preview-0409", input)`
text-multilingual-embedding-preview-0409	`embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)`
Fine-tuned OR custom embedding models	`embedding(model="vertex_ai/<your-model-id>", input)`

Supported OpenAI (Unified) Params

Param	Type	Vertex Equivalent
`input`	string or List[string]	`instances`
`dimensions`	int	`output_dimensionality`
`input_type`	Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"]	`task_type`
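
The table can be read as a simple key rename. The sketch below only mirrors the table and is not LiteLLM's actual request transformation, which may nest or reshape these values; `OPENAI_TO_VERTEX` and `rename_params` are hypothetical names.

```python
# Parameter names taken from the table above: unified name -> Vertex name
OPENAI_TO_VERTEX = {
    "input": "instances",
    "dimensions": "output_dimensionality",
    "input_type": "task_type",
}


def rename_params(params: dict) -> dict:
    """Rename unified parameter keys to their Vertex equivalents.

    Keys without an entry in the table are passed through unchanged.
    """
    return {OPENAI_TO_VERTEX.get(key, key): value for key, value in params.items()}


renamed = rename_params(
    {"input": ["gm"], "dimensions": 256, "input_type": "RETRIEVAL_QUERY"}
)
print(renamed)
```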

Usage with OpenAI (Unified) Params

SDK
LiteLLM PROXY

import litellm

response = litellm.embedding(
    model="vertex_ai/text-embedding-004",
    input=["good morning from litellm", "gm"],
    input_type="RETRIEVAL_DOCUMENT",
    dimensions=1,
)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="text-embedding-004", 
    input = ["good morning from litellm", "gm"],
    dimensions=1,
    extra_body = {
        "input_type": "RETRIEVAL_QUERY",
    }
)

print(response)

Supported Vertex-specific Params

Param	Type
`auto_truncate`	bool
`task_type`	Literal["RETRIEVAL_QUERY","RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION"]
`title`	str

Usage with Vertex-specific Params (use `task_type` and `title`)

You can pass any Vertex-specific param to the embedding model. Pass it to the embedding function like this:

Relevant Vertex AI documentation with all embedding params

SDK
LiteLLM PROXY

import litellm

response = litellm.embedding(
    model="vertex_ai/text-embedding-004",
    input=["good morning from litellm", "gm"],
    task_type="RETRIEVAL_DOCUMENT",
    title="test",
    dimensions=1,
    auto_truncate=True,
)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="text-embedding-004", 
    input = ["good morning from litellm", "gm"],
    dimensions=1,
    extra_body = {
        "task_type": "RETRIEVAL_QUERY",
        "auto_truncate": True,
        "title": "test",
    }
)

print(response)

Known Limitations

Only supports 1 image or video per request
Only supports GCS or base64-encoded images / videos

Usage
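
A client-side check can catch unsupported media references before a request is sent. This is a sketch under the limitations listed above, not a check that LiteLLM itself is known to perform; `classify_media_input` is a hypothetical helper and applies only to image or video inputs, not to plain text.

```python
def classify_media_input(value: str) -> str:
    """Return "gcs" or "base64" for a supported media input, else raise."""
    if value.startswith("gs://"):
        return "gcs"
    if value.startswith("data:") and ";base64," in value:
        return "base64"
    raise ValueError("expected a gs:// URI or a base64 data URL")


print(classify_media_input("gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"))
print(classify_media_input("data:image/jpeg;base64,AAAA"))
```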

SDK
LiteLLM PROXY (Unified Endpoint)
LiteLLM PROXY (Vertex SDK)

Using GCS images

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input="gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png" # will be sent as a gcs image
)

Using base64-encoded images

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input="data:image/jpeg;base64,..." # will be sent as a base64 encoded image
)

Add model to config.yaml

model_list:
  - model_name: multimodalembedding@001
    litellm_params:
      model: vertex_ai/multimodalembedding@001
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

Start proxy

$ litellm --config /path/to/config.yaml

Make a request using the OpenAI Python SDK, Langchain Python SDK

OpenAI SDK
Langchain

Requests with a GCS image / video URI

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png",
)

print(response)

Requests with base64-encoded images

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = "data:image/jpeg;base64,...",
)

print(response)

Requests with a GCS image / video URI

from langchain_openai import OpenAIEmbeddings

embeddings_models = "multimodalembedding@001"

embeddings = OpenAIEmbeddings(
    model="multimodalembedding@001",
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # type: ignore
)


query_result = embeddings.embed_query(
    "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)
print(query_result)

Requests with base64-encoded images

from langchain_openai import OpenAIEmbeddings

embeddings_models = "multimodalembedding@001"

embeddings = OpenAIEmbeddings(
    model="multimodalembedding@001",
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # type: ignore
)


query_result = embeddings.embed_query(
    "data:image/jpeg;base64,..."
)
print(query_result)

Add model to config.yaml

default_vertex_config:
  vertex_project: "adroit-crow-413218"
  vertex_location: "us-central1"
  vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

Start proxy

$ litellm --config /path/to/config.yaml

Make a request using the Vertex AI Python SDK

import vertexai

from vertexai.vision_models import Image, MultiModalEmbeddingModel, Video
from vertexai.vision_models import VideoSegmentConfig
from google.auth.credentials import Credentials


LITELLM_PROXY_API_KEY = "sk-1234"
LITELLM_PROXY_BASE = "http://0.0.0.0:4000/vertex-ai"

import datetime

class CredentialsWrapper(Credentials):
    def __init__(self, token=None):
        super().__init__()
        self.token = token
        self.expiry = None  # or set to a future date if needed
        
    def refresh(self, request):
        pass
    
    def apply(self, headers, token=None):
        headers['Authorization'] = f'Bearer {self.token}'

    @property
    def expired(self):
        return False  # Always consider the token as non-expired

    @property
    def valid(self):
        return True  # Always consider the credentials as valid

credentials = CredentialsWrapper(token=LITELLM_PROXY_API_KEY)

vertexai.init(
    project="adroit-crow-413218",
    location="us-central1",
    api_endpoint=LITELLM_PROXY_BASE,
    credentials = credentials,
    api_transport="rest",
   
)

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding")
image = Image.load_from_file(
    "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"
)

embeddings = model.get_embeddings(
    image=image,
    contextual_text="Colosseum",
    dimension=1408,
)
print(f"Image Embedding: {embeddings.image_embedding}")
print(f"Text Embedding: {embeddings.text_embedding}")

Text + Image + Video Embeddings

SDK
LiteLLM PROXY (Unified Endpoint)

Text + Image

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"] # will be sent as a gcs image
)

Text + Video

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as text plus a gcs video
)

Image + Video

response = await litellm.aembedding(
    model="vertex_ai/multimodalembedding@001",
    input=["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"] # will be sent as a gcs image plus a gcs video
)

Add model to config.yaml

model_list:
  - model_name: multimodalembedding@001
    litellm_params:
      model: vertex_ai/multimodalembedding@001
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

Start proxy

$ litellm --config /path/to/config.yaml

Make a request using the OpenAI Python SDK, Langchain Python SDK

Text + Image

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["hey", "gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"],
)

print(response)

Text + Video

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["hey", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)

print(response)

Image + Video

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# # request sent to model set on litellm proxy, `litellm --model`
response = client.embeddings.create(
    model="multimodalembedding@001", 
    input = ["gs://cloud-samples-data/vertex-ai/llm/prompts/landmark1.png", "gs://my-bucket/embeddings/supermarket-video.mp4"],
)

print(response)

Image Generation Models

Usage

response = await litellm.aimage_generation(
    prompt="An olympic size swimming pool",
    model="vertex_ai/imagegeneration@006",
    vertex_ai_project="adroit-crow-413218",
    vertex_ai_location="us-central1",
)

Generating multiple images

Use the n parameter to specify how many images to generate.

response = await litellm.aimage_generation(
    prompt="An olympic size swimming pool",
    model="vertex_ai/imagegeneration@006",
    vertex_ai_project="adroit-crow-413218",
    vertex_ai_location="us-central1",
    n=2,  # number of images to generate
)

Supported Image Generation Models

Model Name	Usage
`imagen-3.0-generate-001`	`litellm.image_generation('vertex_ai/imagen-3.0-generate-001', prompt)`
`imagen-3.0-fast-generate-001`	`litellm.image_generation('vertex_ai/imagen-3.0-fast-generate-001', prompt)`
`imagegeneration@006`	`litellm.image_generation('vertex_ai/imagegeneration@006', prompt)`
`imagegeneration@005`	`litellm.image_generation('vertex_ai/imagegeneration@005', prompt)`
`imagegeneration@002`	`litellm.image_generation('vertex_ai/imagegeneration@002', prompt)`

Text-to-Speech APIs

Info

LiteLLM supports calling the Vertex AI Text-to-Speech API in the OpenAI text-to-speech format.

Usage - Basic

SDK
LiteLLM PROXY (Unified Endpoint)

Vertex AI does not support a model param, so passing model=vertex_ai/ is the only required param.

Sync usage

from pathlib import Path
import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
response = litellm.speech(
    model="vertex_ai/",
    input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

Async usage

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"
# aspeech is a coroutine; await it inside an async function
response = await litellm.aspeech(
    model="vertex_ai/",
    input="hello what llm guardrail do you have",
)
response.stream_to_file(speech_file_path)

Add model to config.yaml

model_list:
  - model_name: vertex-tts
    litellm_params:
      model: vertex_ai/ # Vertex AI does not support passing a `model` param - so passing `model=vertex_ai/` is the only required param
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json 

litellm_settings:
  drop_params: True

Start the proxy

$ litellm --config /path/to/config.yaml

Make a request with the OpenAI Python SDK

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input="the quick brown fox jumped over the lazy dogs",
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'}
)
print("response from proxy", response)

Usage - `ssml` as input

Pass your SSML as the `input` param. If it contains `<speak>`, it is detected automatically and passed to the Vertex AI API as SSML.

To force your `input` to be passed as SSML, set `use_ssml=True`.
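The detection rule described above can be sketched as follows. This illustrates the behaviour and is not LiteLLM's actual implementation; `build_synthesis_input` is a hypothetical helper, and the `text` / `ssml` keys follow the `input` object of the Google Text-to-Speech API.

```python
def build_synthesis_input(text, use_ssml=False):
    """Return the `input` payload: SSML if forced or detected, else plain text."""
    if use_ssml or "<speak>" in text:
        return {"ssml": text}
    return {"text": text}


print(build_synthesis_input("Hello, world!"))
print(build_synthesis_input("<speak>Hello, world!</speak>"))
print(build_synthesis_input("Hello, world!", use_ssml=True))
```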

SDK
LiteLLM PROXY (Unified Endpoint)

Vertex AI does not support passing a `model` param, so passing `model=vertex_ai/` is the only required param.

from pathlib import Path
import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# voice and audioConfig values must be ones the Vertex AI Text to Speech API accepts
response = litellm.speech(
    input=ssml,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-US",
        "name": "en-US-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR16",
        "speakingRate": "1",
    },
)
response.stream_to_file(speech_file_path)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input=ssml,
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
)
print("response from proxy", response)

Forcing SSML Usage

You can force the use of SSML by setting the `use_ssml` param to `True`. This is useful when you want to make sure your input is treated as SSML even if it does not contain `<speak>` tags.

Here are examples of how to force SSML usage:

SDK
LiteLLM PROXY (Unified Endpoint)

Vertex AI does not support passing a `model` param, so passing `model=vertex_ai/` is the only required param.

from pathlib import Path
import litellm

speech_file_path = Path(__file__).parent / "speech_vertex.mp3"

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# voice and audioConfig values must be ones the Vertex AI Text to Speech API accepts
response = litellm.speech(
    input=ssml,
    use_ssml=True,
    model="vertex_ai/test",
    voice={
        "languageCode": "en-US",
        "name": "en-US-Studio-O",
    },
    audioConfig={
        "audioEncoding": "LINEAR16",
        "speakingRate": "1",
    },
)
response.stream_to_file(speech_file_path)

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

ssml = """
<speak>
    <p>Hello, world!</p>
    <p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

# see supported values for "voice" on vertex here: 
# https://console.cloud.google.com/vertex-ai/generative/speech/text-to-speech
response = client.audio.speech.create(
    model = "vertex-tts",
    input=ssml,  # SSML string; use_ssml below forces it to be sent as SSML
    voice={'languageCode': 'en-US', 'name': 'en-US-Studio-O'},
    extra_body={"use_ssml": True},
)
print("response from proxy", response)
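When forcing SSML for input that has no `<speak>` tags, it can help to build a well-formed SSML string first. A minimal sketch, assuming plain text should be XML-escaped and wrapped in a single `<speak>` element; `to_ssml` is a hypothetical helper, not a LiteLLM function.

```python
from xml.sax.saxutils import escape


def to_ssml(text):
    """Wrap plain text in <speak>, escaping &, < and > so the XML stays valid."""
    stripped = text.strip()
    if stripped.startswith("<speak>"):
        return stripped  # already SSML, leave untouched
    return f"<speak>{escape(stripped)}</speak>"


print(to_ssml("Fish & chips"))  # <speak>Fish &amp; chips</speak>
```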

Batch APIs

Just add the following Vertex env vars to your environment.

# GCS Bucket settings, used to store batch prediction files in
export GCS_BUCKET_NAME="litellm-testing-bucket" # the bucket you want to store batch prediction files in
export GCS_PATH_SERVICE_ACCOUNT="/path/to/service_account.json" # path to your service account json file

# Vertex /batch endpoint settings, used for LLM API requests
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json" # path to your service account json file
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
export VERTEXAI_PROJECT="my-test-project" 
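Before starting the proxy, it can be useful to confirm that all five variables are set. The following is a small sketch using only the variable names listed above; `missing_vertex_env` is a hypothetical helper.

```python
import os

REQUIRED_VARS = [
    "GCS_BUCKET_NAME",
    "GCS_PATH_SERVICE_ACCOUNT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "VERTEXAI_LOCATION",
    "VERTEXAI_PROJECT",
]


def missing_vertex_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vertex_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```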

Usage

1. Create a file of batch requests for Vertex

LiteLLM expects the file to follow the **OpenAI batches files format**.

Each `body` in the file should be an **OpenAI API request**.

Create a file called `vertex_batch_completions.jsonl` in the current working directory. The `model` should be the Vertex AI model name.

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-1.5-flash-001", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
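For more than a handful of requests, the file is easier to generate than to write by hand. The sketch below reproduces the two lines above; `make_request` and `write_batch_file` are illustrative helper names, not LiteLLM functions.

```python
import json


def make_request(custom_id, system_prompt, user_prompt, model="gemini-1.5-flash-001"):
    """Build one batch entry in the OpenAI batch file format."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            "max_tokens": 10,
        },
    }


def write_batch_file(path, requests):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")


requests = [
    make_request("request-1", "You are a helpful assistant.", "Hello world!"),
    make_request("request-2", "You are an unhelpful assistant.", "Hello world!"),
]
```

Calling `write_batch_file("vertex_batch_completions.jsonl", requests)` then creates the file in the current working directory.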

2. Upload a file of batch requests

For `vertex_ai`, LiteLLM uploads the file to the provided `GCS_BUCKET_NAME`.

import os
from openai import OpenAI

oai_client = OpenAI(
    api_key="sk-1234",               # litellm proxy API key
    base_url="http://0.0.0.0:4000"   # litellm proxy base url (same address as in the proxy examples above)
)
file_name = "vertex_batch_completions.jsonl"
_current_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(_current_dir, file_name)
file_obj = oai_client.files.create(
    file=open(file_path, "rb"),
    purpose="batch",
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use vertex_ai for this file upload
)

Expected Response

{
    "id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "bytes": 416,
    "created_at": 1733392026,
    "filename": "litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "object": "file",
    "purpose": "batch",
    "status": "uploaded",
    "status_details": null
}
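The returned `id` is a `gs://` URI, and in the response above the `filename` is the object path inside the bucket. A short sketch for splitting the two, should you want to inspect the uploaded object yourself; `split_gcs_uri` is a hypothetical helper.

```python
def split_gcs_uri(uri):
    """Split "gs://bucket/path/to/object" into (bucket, object_path)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    return bucket, object_path


file_id = (
    "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/"
    "gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a"
)
bucket, object_path = split_gcs_uri(file_id)
print(bucket)
print(object_path)
```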

3. Create a batch

batch_input_file_id = file_obj.id # use `file_obj` from step 2
create_batch_response = oai_client.batches.create(
    completion_window="24h",
    endpoint="/v1/chat/completions",
    input_file_id=batch_input_file_id, # example input_file_id = "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/c2b1b785-252b-448c-b180-033c4c63b3ce"
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)

Expected Response

{
    "id": "3814889423749775360",
    "completion_window": "24hrs",
    "created_at": 1733392026,
    "endpoint": "",
    "input_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/d3f198cd-c0d1-436d-9b1e-28e3f282997a",
    "object": "batch",
    "status": "validating",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": null,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": null,
    "output_file_id": "gs://litellm-testing-bucket/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
    "request_counts": null
}

4. Retrieve a batch

retrieved_batch = oai_client.batches.retrieve(
    batch_id=create_batch_response.id,
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm to use `vertex_ai` for this batch request
)

Expected Response

{
    "id": "3814889423749775360",
    "completion_window": "24hrs",
    "created_at": 1736500100,
    "endpoint": "",
    "input_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001/7b2e47f5-3dd4-436d-920f-f9155bbdc952",
    "object": "batch",
    "status": "completed",
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "error_file_id": null,
    "errors": null,
    "expired_at": null,
    "expires_at": null,
    "failed_at": null,
    "finalizing_at": null,
    "in_progress_at": null,
    "metadata": null,
    "output_file_id": "gs://example-bucket-1-litellm/litellm-vertex-files/publishers/google/models/gemini-1.5-flash-001",
    "request_counts": null
}
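A batch usually takes a while to move from `validating` to `completed`, so the retrieve call is typically repeated. A minimal polling sketch, assuming the batch reports the OpenAI status values and that `completed`, `failed`, `expired` and `cancelled` are the terminal ones; `wait_for_batch` is a hypothetical helper that takes the retrieve call as a callable.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, interval=30.0, max_polls=120, sleep=time.sleep):
    """Call retrieve(batch_id) until the batch reaches a terminal status.

    `retrieve` is any callable returning an object with a `.status` attribute.
    Raises TimeoutError if no terminal status is seen after max_polls calls.
    """
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(interval)
    raise TimeoutError(f"batch {batch_id} still running after {max_polls} polls")
```

With the client from step 2, `retrieve` could be `lambda bid: oai_client.batches.retrieve(batch_id=bid, extra_body={"custom_llm_provider": "vertex_ai"})`.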

Fine Tuning APIs

Property	Details
Description	Create Fine Tuning Jobs in Vertex AI (`/tuningJobs`) using the OpenAI Python SDK.
Vertex Fine Tuning Documentation	Vertex Fine Tuning

Usage

1. Add `finetune_settings` to your config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# 👇 Key change: For /fine_tuning/jobs endpoints
finetune_settings:
  - custom_llm_provider: "vertex_ai"
    vertex_project: "adroit-crow-413218"
    vertex_location: "us-central1"
    vertex_credentials: "/Users/ishaanjaffer/Downloads/adroit-crow-413218-a956eef1a2a8.json"

2. Create a Fine Tuning Job

OpenAI Python SDK
curl

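from openai import AsyncOpenAI

# async client pointed at the LiteLLM proxy (same key and address as in the proxy examples above)
client = AsyncOpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
ft_job = await client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",                  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",  # GCS URI of the training data
    extra_body={"custom_llm_provider": "vertex_ai"}, # tell litellm proxy which provider to use
)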

curl http://0.0.0.0:4000/v1/fine_tuning/jobs \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
    "custom_llm_provider": "vertex_ai",
    "model": "gemini-1.0-pro-002",
    "training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
    }'
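The same request can be built with the Python standard library when the OpenAI SDK is not available. This is a sketch that only constructs the request; `build_finetune_request` is a hypothetical helper, and the proxy address passed in is an assumption taken from the proxy examples earlier on this page.

```python
import json
import urllib.request


def build_finetune_request(base_url, api_key, model, training_file):
    """Build (but do not send) the POST request for /v1/fine_tuning/jobs."""
    payload = {
        "custom_llm_provider": "vertex_ai",
        "model": model,
        "training_file": training_file,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/fine_tuning/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_finetune_request(
    base_url="http://0.0.0.0:4000",  # assumed proxy address
    api_key="sk-1234",
    model="gemini-1.0-pro-002",
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` sends it to the proxy.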

Advanced use case - Passing `adapter_size` to the Vertex AI API

Set hyperparameters such as `n_epochs`, `learning_rate_multiplier` and `adapter_size`. See Vertex Advanced Hyperparameters

OpenAI Python SDK
curl

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")  # LiteLLM proxy
ft_job = client.fine_tuning.jobs.create(
    model="gemini-1.0-pro-002",                  # Vertex model you want to fine-tune
    training_file="gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",  # GCS URI of the training data
    hyperparameters={
        "n_epochs": 3,                      # epoch_count on Vertex
        "learning_rate_multiplier": 0.1,    # learning_rate_multiplier on Vertex
        "adapter_size": "ADAPTER_SIZE_ONE"  # type: ignore, vertex specific hyperparameter
    },
    extra_body={
        "custom_llm_provider": "vertex_ai",
    },
)

curl http://0.0.0.0:4000/v1/fine_tuning/jobs \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
    "custom_llm_provider": "vertex_ai",
    "model": "gemini-1.0-pro-002",
    "training_file": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
    "hyperparameters": {
        "n_epochs": 3,
        "learning_rate_multiplier": 0.1,
        "adapter_size": "ADAPTER_SIZE_ONE"
    }
    }'
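A typo in `adapter_size` only surfaces once the job is submitted, so a small local check can save a round trip. The sketch below builds the `hyperparameters` dict used above; `build_hyperparameters` is a hypothetical helper, and the set of adapter sizes is an assumption that should be checked against the Vertex documentation.

```python
# Adapter sizes assumed here; check the Vertex documentation for the current list.
KNOWN_ADAPTER_SIZES = {
    "ADAPTER_SIZE_ONE",
    "ADAPTER_SIZE_FOUR",
    "ADAPTER_SIZE_EIGHT",
    "ADAPTER_SIZE_SIXTEEN",
}


def build_hyperparameters(n_epochs=None, learning_rate_multiplier=None, adapter_size=None):
    """Return an OpenAI-style hyperparameters dict, omitting unset values."""
    if adapter_size is not None and adapter_size not in KNOWN_ADAPTER_SIZES:
        raise ValueError(f"unknown adapter_size: {adapter_size!r}")
    if n_epochs is not None and n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    params = {
        "n_epochs": n_epochs,
        "learning_rate_multiplier": learning_rate_multiplier,
        "adapter_size": adapter_size,
    }
    return {key: value for key, value in params.items() if value is not None}


print(build_hyperparameters(n_epochs=3, learning_rate_multiplier=0.1, adapter_size="ADAPTER_SIZE_ONE"))
```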

Extra

Using `GOOGLE_APPLICATION_CREDENTIALS`

Here's the code for storing your service account credentials as the `GOOGLE_APPLICATION_CREDENTIALS` environment variable:

import json
import os
import tempfile

def load_vertex_ai_credentials():
  # Define the path to the vertex_key.json file
  print("loading vertex ai credentials")
  filepath = os.path.dirname(os.path.abspath(__file__))
  vertex_key_path = filepath + "/vertex_key.json"

  # Read the existing content of the file or create an empty dictionary
  try:
      with open(vertex_key_path, "r") as file:
          # Read the file content
          print("Read vertexai file path")
          content = file.read()

          # If the file is empty or whitespace-only, fall back to an empty dictionary
          if not content or not content.strip():
              service_account_key_data = {}
          else:
              # Parse the JSON content that was already read
              service_account_key_data = json.loads(content)
  except FileNotFoundError:
      # If the file doesn't exist, create an empty dictionary
      service_account_key_data = {}

  # Create a temporary file
  with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
      # Write the updated content to the temporary file
      json.dump(service_account_key_data, temp_file, indent=2)

  # Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
  os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)
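The function above falls back to an empty dictionary when `vertex_key.json` is missing or empty, which only shows up later as an authentication error. A sketch for checking the key content up front, assuming the usual fields of a service account key file; `missing_key_fields` is a hypothetical helper.

```python
import json

# Fields normally present in a service account key file (assumed minimum set).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(raw_json):
    """Return the required fields missing from a service account key JSON string."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return list(REQUIRED_KEYS)
    if not isinstance(data, dict):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not data.get(key)]


print(missing_key_fields("{}"))
```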

Using GCP Service Account

Info

Trying to deploy LiteLLM on Google Cloud Run? Tutorial here

1. Figure out the service account bound to the Google Cloud Run service.

2. Get the FULL email address of the corresponding service account.

3. Next, go to IAM & Admin > Manage Resources and select the top-level project that houses your Google Cloud Run service.

4. Click Add Principal.

5. Specify the service account as the principal and Vertex AI User as the role.

Once that's done, when you deploy the new container in the Google Cloud Run service, LiteLLM will have automatic access to all Vertex AI endpoints.

s/o @Darien Kindlund für dieses Tutorial
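The console steps for granting the role can also be done from the command line. The sketch below only assembles the `gcloud projects add-iam-policy-binding` command with the Vertex AI User role (`roles/aiplatform.user`); the project ID and service account email are placeholders, and `grant_vertex_user_cmd` is a hypothetical helper.

```python
import shlex


def grant_vertex_user_cmd(project_id, service_account_email):
    """Build the gcloud command that grants the Vertex AI User role."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/aiplatform.user",
    ]


cmd = grant_vertex_user_cmd(
    "my-test-project",  # placeholder project ID
    "litellm-run@my-test-project.iam.gserviceaccount.com",  # placeholder email
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run`.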
