
Gemini - Google AI Studio

Property | Details
Description | Google AI Studio is a fully managed AI development platform for building and using generative AI.
Provider Route on LiteLLM | gemini/
Provider Doc | Google AI Studio ↗
API Endpoint for Provider | https://generativelanguage.googleapis.com
Supported OpenAI Endpoints | /chat/completions, /embeddings, /completions
Pass-through Endpoint | Supported
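
Since /embeddings is listed among the supported OpenAI endpoints above, embeddings can be requested through the same gemini/ route. A minimal sketch (the model name gemini/text-embedding-004 is an assumption; substitute whichever embedding model your account exposes):

import os
from litellm import embedding

os.environ["GEMINI_API_KEY"] = "your-api-key"

# Hypothetical embedding model name - adjust to a model available to you
response = embedding(
    model="gemini/text-embedding-004",
    input=["Hello from LiteLLM"],
)
print(len(response.data[0]["embedding"]))  # dimensionality of the returned vector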

API Key

import os
os.environ["GEMINI_API_KEY"] = "your-api-key"

Sample Usage

from litellm import completion
import os

os.environ['GEMINI_API_KEY'] = ""
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

Supported OpenAI Params

  • temperature
  • top_p
  • max_tokens
  • max_completion_tokens
  • stream
  • tools
  • tool_choice
  • functions
  • response_format
  • n
  • stop
  • logprobs
  • frequency_penalty
  • modalities
  • reasoning_content

Anthropic Params

  • thinking (used to set the maximum budget tokens for Anthropic/Gemini models)

See the updated list of supported parameters. A usage sketch combining several of these parameters follows below.
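
As an illustration of the parameters listed above, a minimal sketch that combines a few of them in one request (the values are arbitrary examples):

import os
from litellm import completion

os.environ["GEMINI_API_KEY"] = "your-api-key"

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=[{"role": "user", "content": "Name three colors."}],
    temperature=0.2,   # sampling temperature
    top_p=0.9,         # nucleus sampling
    max_tokens=100,    # cap on generated tokens
    n=1,               # number of completions to return
    stop=["\n\n"],     # stop sequence
)
print(response.choices[0].message.content)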

Usage - Thinking / reasoning_content

LiteLLM translates OpenAI's reasoning_effort to Gemini's thinking parameter.

Mapping

reasoning_effort | thinking
"low" | "budget_tokens": 1024
"medium" | "budget_tokens": 2048
"high" | "budget_tokens": 4096

from litellm import completion

resp = completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Expected Response

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Pass thinking to Gemini models

You can also pass the thinking parameter to Gemini models.

This is translated to Gemini's thinkingConfig parameter.

import litellm

response = litellm.completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

Passing Gemini Specific Params

Response Schema

LiteLLM supports sending response_schema as a param for Gemini 1.5 Pro on Google AI Studio.

Response Schema

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "recipe_name": {
                "type": "string",
            },
        },
        "required": ["recipe_name"],
    },
}


response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema} # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Validate Schema

To validate your response_schema, set enforce_validation: true.

from litellm import completion, JSONSchemaValidationError
try:
    completion(
        model="gemini/gemini-1.5-pro",
        messages=messages,
        response_format={
            "type": "json_object",
            "response_schema": response_schema,
            "enforce_validation": True # 👈 KEY CHANGE
        }
    )
except JSONSchemaValidationError as e:
    print("Raw Response: {}".format(e.raw_response))
    raise e

LiteLLM validates the response against the schema and raises a JSONSchemaValidationError if the response does not match the schema.

JSONSchemaValidationError inherits from openai.APIError

Access the raw response with e.raw_response

GenerationConfig Params

To pass additional GenerationConfig params - e.g. topK - just pass them in the request body of the call, and LiteLLM will pass them straight through as key-value pairs in the request body.

See Gemini GenerationConfigParams

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    topK=1 # 👈 KEY CHANGE
)

print(response.choices[0].message.content)


Specifying Safety Settings

In certain use-cases you may need to make calls to the models and pass safety settings different from the defaults. To do so, simply pass the safety_settings argument to completion or acompletion. For example:

response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)

Tool Calling

from litellm import completion
import os
# set env
os.environ["GEMINI_API_KEY"] = ".."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=messages,
    tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)
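
To complete the tool-calling loop, the tool's output is normally sent back in a follow-up request so the model can produce a final answer. A minimal sketch in the standard OpenAI message format (the get_current_weather result below is a hard-coded placeholder):

tool_call = response.choices[0].message.tool_calls[0]

# Append the assistant's tool call and a hypothetical tool result, then ask again
messages.append(response.choices[0].message.model_dump())
messages.append(
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": '{"temperature": "72F", "condition": "sunny"}',  # placeholder result
    }
)

final_response = completion(
    model="gemini/gemini-1.5-flash",
    messages=messages,
    tools=tools,
)
print(final_response.choices[0].message.content)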


Google Search Tool​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    tools=tools,
)

print(response)

Google Search Retrieval​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
model="gemini/gemini-2.0-flash",
messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
tools=tools,
)

print(response)

Code Execution Tool​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"codeExecution": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
model="gemini/gemini-2.0-flash",
messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
tools=tools,
)

print(response)

JSON Mode

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]


response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object"} # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Gemini-Pro-Vision

LiteLLM supports the following image types passed in url:

  • Images with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
  • Image in local storage - ./localimage.jpeg

Sample Usage

import os
import litellm
from dotenv import load_dotenv

# Load the environment variables from .env file
load_dotenv()
os.environ["GEMINI_API_KEY"] = os.getenv('GEMINI_API_KEY')

prompt = 'Describe the image in a few sentences.'
# Note: You can pass here the URL or Path of image directly.
image_url = 'https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg'

# Create the messages payload according to the documentation
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt
            },
            {
                "type": "image_url",
                "image_url": {"url": image_url}
            }
        ]
    }
]

# Make the API call to Gemini model
response = litellm.completion(
    model="gemini/gemini-pro-vision",
    messages=messages,
)

# Extract the response content
content = response.get('choices', [{}])[0].get('message', {}).get('content')

# Print the result
print(content)

Usage - PDF / Videos / etc. Files

Inline Data (e.g. audio stream)

LiteLLM follows the OpenAI format and accepts sending inline data as an encoded base64 string.

The format to follow is

data:<mime_type>;base64,<encoded_data>

LiteLLM Call

import litellm
from pathlib import Path
import base64
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

audio_bytes = Path("speech_vertex.mp3").read_bytes()
encoded_data = base64.b64encode(audio_bytes).decode("utf-8")
print("Audio Bytes = {}".format(audio_bytes))
model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the audio."},
                {
                    "type": "file",
                    "file": {
                        "file_data": "data:audio/mp3;base64,{}".format(encoded_data), # 👈 SET MIME_TYPE + DATA
                    }
                },
            ],
        }
    ],
)

Equivalent Google API Call

import pathlib
import google.generativeai as genai

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-flash')

# Create the prompt.
prompt = "Please summarize the audio."

# Load the samplesmall.mp3 file into a Python Blob object containing the audio
# file's bytes and then pass the prompt and the audio to Gemini.
response = model.generate_content([
    prompt,
    {
        "mime_type": "audio/mp3",
        "data": pathlib.Path('samplesmall.mp3').read_bytes()
    }
])

# Output Gemini's response to the prompt and the inline audio.
print(response.text)

https:// File

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "https://storage...", # 👈 SET THE IMG URL
                        "format": "application/pdf" # OPTIONAL
                    }
                },
            ],
        }
    ],
)

gs:// File

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "gs://storage...", # 👈 SET THE IMG URL
                        "format": "application/pdf" # OPTIONAL
                    }
                },
            ],
        }
    ],
)

Chat Models

Tip

We support ALL Gemini models. Just set model=gemini/<any-model-on-gemini> as a prefix when sending LiteLLM requests.

Model Name | Function Call | Required OS Variables
gemini-pro | completion(model='gemini/gemini-pro', messages) | os.environ['GEMINI_API_KEY']
gemini-1.5-pro-latest | completion(model='gemini/gemini-1.5-pro-latest', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash | completion(model='gemini/gemini-2.0-flash', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash-exp | completion(model='gemini/gemini-2.0-flash-exp', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash-lite-preview-02-05 | completion(model='gemini/gemini-2.0-flash-lite-preview-02-05', messages) | os.environ['GEMINI_API_KEY']

Context Caching

Using Google AI Studio context caching is supported by including

[
    {
        "role": "system",
        "content": ...,
        "cache_control": {"type": "ephemeral"} # 👈 KEY CHANGE
    },
    ...
]

in your message content block.

Architecture Diagram

Notes

  • Relevant code

  • Gemini Context Caching only allows 1 block of continuous messages to be cached.

  • If multiple non-continuous blocks contain cache_control, the first continuous block is used. (sent to /cachedContent in the Gemini format)

  • The raw request to Gemini's /generateContent endpoint looks like this:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
      "contents": [
        {
          "parts":[{
            "text": "Please summarize this transcript"
          }],
          "role": "user"
        },
      ],
      "cachedContent": "'$CACHE_NAME'"
    }'

Sample Usage

from litellm import completion 

for _ in range(2):
    resp = completion(
        model="gemini/gemini-1.5-pro",
        messages=[
            # System Message
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "Here is the full text of a complex legal agreement" * 4000,
                        "cache_control": {"type": "ephemeral"}, # 👈 KEY CHANGE
                    }
                ],
            },
            # marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What are the key terms and conditions in this agreement?",
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
        ],
    )

    print(resp.usage) # 👈 2nd usage block will be less, since cached tokens used

Image Generation

from litellm import completion 

response = completion(
    model="gemini/gemini-2.0-flash-exp-image-generation",
    messages=[{"role": "user", "content": "Generate an image of a cat"}],
    modalities=["image", "text"],
)
assert response.choices[0].message.content is not None # "data:image/png;base64,e4rr.."
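
Assuming the content comes back as a data: URL as indicated in the comment above, a minimal sketch for decoding it and writing the image to disk:

import base64

# The content is expected to look like "data:image/png;base64,<encoded bytes>"
data_url = response.choices[0].message.content
header, encoded = data_url.split(",", 1)  # split off the "data:image/png;base64" prefix
image_bytes = base64.b64decode(encoded)

with open("cat.png", "wb") as f:
    f.write(image_bytes)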