
Gemini - Google AI Studio

Property | Details
Description | Google AI Studio is a fully managed AI development platform for building and using generative AI.
Provider Route on LiteLLM | gemini/
Provider Doc | Google AI Studio ↗
API Endpoint for Provider | https://generativelanguage.googleapis.com
Supported OpenAI Endpoints | /chat/completions, /embeddings, /completions
Pass-through Endpoint | Supported
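
Since /embeddings is listed among the supported OpenAI endpoints above, embeddings can be requested through the same gemini/ route. A minimal sketch (the model name gemini/text-embedding-004 is an assumption; substitute whichever embedding model your account exposes):

import os
from litellm import embedding

os.environ["GEMINI_API_KEY"] = "your-api-key"

# Hypothetical embedding model name - adjust to a model available to you
response = embedding(
    model="gemini/text-embedding-004",
    input=["Hello from LiteLLM"],
)
print(len(response.data[0]["embedding"]))  # dimensionality of the returned vector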

API Key

import os
os.environ["GEMINI_API_KEY"] = "your-api-key"

Sample Usage

from litellm import completion
import os

os.environ['GEMINI_API_KEY'] = ""
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

Supported OpenAI Params

  • temperature
  • top_p
  • max_tokens
  • max_completion_tokens
  • stream
  • tools
  • tool_choice
  • functions
  • response_format
  • n
  • stop
  • logprobs
  • frequency_penalty
  • modalities
  • reasoning_content

Anthropic Params

  • thinking (used to set the maximum budget tokens for Anthropic/Gemini models)

See the updated list of supported parameters. A usage sketch combining several of these parameters follows below.
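
As an illustration of the parameters listed above, a minimal sketch that combines a few of them in one request (the values are arbitrary examples):

import os
from litellm import completion

os.environ["GEMINI_API_KEY"] = "your-api-key"

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=[{"role": "user", "content": "Name three colors."}],
    temperature=0.2,   # sampling temperature
    top_p=0.9,         # nucleus sampling
    max_tokens=100,    # cap on generated tokens
    n=1,               # number of completions to return
    stop=["\n\n"],     # stop sequence
)
print(response.choices[0].message.content)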

Usage - Thinking / reasoning_content

LiteLLM translates OpenAI's reasoning_effort to Gemini's thinking parameter.

Mapping

reasoning_effort | thinking
"low" | "budget_tokens": 1024
"medium" | "budget_tokens": 2048
"high" | "budget_tokens": 4096

from litellm import completion

resp = completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Expected Response

ModelResponse(
    id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
    created=1740470510,
    model='claude-3-7-sonnet-20250219',
    object='chat.completion',
    system_fingerprint=None,
    choices=[
        Choices(
            finish_reason='stop',
            index=0,
            message=Message(
                content="The capital of France is Paris.",
                role='assistant',
                tool_calls=None,
                function_call=None,
                reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
            ),
        )
    ],
    usage=Usage(
        completion_tokens=68,
        prompt_tokens=42,
        total_tokens=110,
        completion_tokens_details=None,
        prompt_tokens_details=PromptTokensDetailsWrapper(
            audio_tokens=None,
            cached_tokens=0,
            text_tokens=None,
            image_tokens=None
        ),
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0
    )
)

Pass thinking to Gemini models

You can also pass the thinking parameter to Gemini models.

This is translated to Gemini's thinkingConfig parameter.

import litellm

response = litellm.completion(
    model="gemini/gemini-2.5-flash-preview-04-17",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

Passing Gemini Specific Params

Response Schema

LiteLLM supports sending response_schema as a param for Gemini 1.5 Pro on Google AI Studio.

Response Schema

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "recipe_name": {
                "type": "string",
            },
        },
        "required": ["recipe_name"],
    },
}


response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema} # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Validate Schema

To validate your response_schema, set enforce_validation: true.

from litellm import completion, JSONSchemaValidationError
try:
    completion(
        model="gemini/gemini-1.5-pro",
        messages=messages,
        response_format={
            "type": "json_object",
            "response_schema": response_schema,
            "enforce_validation": True # 👈 KEY CHANGE
        }
    )
except JSONSchemaValidationError as e:
    print("Raw Response: {}".format(e.raw_response))
    raise e

LiteLLM validates the response against the schema and raises a JSONSchemaValidationError if the response does not match the schema.

JSONSchemaValidationError inherits from openai.APIError

Access the raw response with e.raw_response

GenerationConfig Params

To pass additional GenerationConfig params - e.g. topK - just pass them in the request body of the call, and LiteLLM will pass them straight through as key-value pairs in the request body.

See Gemini GenerationConfigParams

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    topK=1 # 👈 KEY CHANGE
)

print(response.choices[0].message.content)


Specifying Safety Settings

In certain use-cases you may need to make calls to the models and pass safety settings different from the defaults. To do so, simply pass the safety_settings argument to completion or acompletion. For example:

response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)

Tool Calling

from litellm import completion
import os
# set env
os.environ["GEMINI_API_KEY"] = ".."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=messages,
    tools=tools,
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)
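
To complete the tool-calling loop, the tool's output is normally sent back in a follow-up request so the model can produce a final answer. A minimal sketch in the standard OpenAI message format (the get_current_weather result below is a hard-coded placeholder):

tool_call = response.choices[0].message.tool_calls[0]

# Append the assistant's tool call and a hypothetical tool result, then ask again
messages.append(response.choices[0].message.model_dump())
messages.append(
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": '{"temperature": "72F", "condition": "sunny"}',  # placeholder result
    }
)

final_response = completion(
    model="gemini/gemini-1.5-flash",
    messages=messages,
    tools=tools,
)
print(final_response.choices[0].message.content)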


Google Search Tool​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    tools=tools,
)

print(response)

Google Search Retrieval​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"googleSearch": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
model="gemini/gemini-2.0-flash",
messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
tools=tools,
)

print(response)

Code Execution Tool​

from litellm import completion
import os

os.environ["GEMINI_API_KEY"] = ".."

tools = [{"codeExecution": {}}] # 👈 ADD GOOGLE SEARCH

response = completion(
model="gemini/gemini-2.0-flash",
messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
tools=tools,
)

print(response)

JSON Mode

from litellm import completion 
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]


response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object"} # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Gemini-Pro-Vision

LiteLLM supports the following image types passed in url:

  • Images with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
  • Image in local storage - ./localimage.jpeg

Sample Usage

import os
import litellm
from dotenv import load_dotenv

# Load the environment variables from .env file
load_dotenv()
os.environ["GEMINI_API_KEY"] = os.getenv('GEMINI_API_KEY')

prompt = 'Describe the image in a few sentences.'
# Note: You can pass here the URL or Path of image directly.
image_url = 'https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg'

# Create the messages payload according to the documentation
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt
            },
            {
                "type": "image_url",
                "image_url": {"url": image_url}
            }
        ]
    }
]

# Make the API call to Gemini model
response = litellm.completion(
    model="gemini/gemini-pro-vision",
    messages=messages,
)

# Extract the response content
content = response.get('choices', [{}])[0].get('message', {}).get('content')

# Print the result
print(content)

Usage - PDF / Videos / etc. Files

Inline Data (e.g. audio stream)

LiteLLM follows the OpenAI format and accepts sending inline data as an encoded base64 string.

The format to follow is

data:<mime_type>;base64,<encoded_data>

LiteLLM Call

import litellm
from pathlib import Path
import base64
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

audio_bytes = Path("speech_vertex.mp3").read_bytes()
encoded_data = base64.b64encode(audio_bytes).decode("utf-8")
print("Audio Bytes = {}".format(audio_bytes))
model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the audio."},
                {
                    "type": "file",
                    "file": {
                        "file_data": "data:audio/mp3;base64,{}".format(encoded_data), # 👈 SET MIME_TYPE + DATA
                    }
                },
            ],
        }
    ],
)

Equivalent Google API Call

import pathlib
import google.generativeai as genai

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-flash')

# Create the prompt.
prompt = "Please summarize the audio."

# Load the samplesmall.mp3 file into a Python Blob object containing the audio
# file's bytes and then pass the prompt and the audio to Gemini.
response = model.generate_content([
    prompt,
    {
        "mime_type": "audio/mp3",
        "data": pathlib.Path('samplesmall.mp3').read_bytes()
    }
])

# Output Gemini's response to the prompt and the inline audio.
print(response.text)

https:// File

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "https://storage...", # 👈 SET THE IMG URL
                        "format": "application/pdf" # OPTIONAL
                    }
                },
            ],
        }
    ],
)

gs:// File

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "file",
                    "file": {
                        "file_id": "gs://storage...", # 👈 SET THE IMG URL
                        "format": "application/pdf" # OPTIONAL
                    }
                },
            ],
        }
    ],
)

Chat Models

Tip

We support ALL Gemini models. Just set model=gemini/<any-model-on-gemini> as a prefix when sending LiteLLM requests.

Model Name | Function Call | Required OS Variables
gemini-pro | completion(model='gemini/gemini-pro', messages) | os.environ['GEMINI_API_KEY']
gemini-1.5-pro-latest | completion(model='gemini/gemini-1.5-pro-latest', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash | completion(model='gemini/gemini-2.0-flash', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash-exp | completion(model='gemini/gemini-2.0-flash-exp', messages) | os.environ['GEMINI_API_KEY']
gemini-2.0-flash-lite-preview-02-05 | completion(model='gemini/gemini-2.0-flash-lite-preview-02-05', messages) | os.environ['GEMINI_API_KEY']

Context Caching

Using Google AI Studio context caching is supported by including

[
    {
        "role": "system",
        "content": ...,
        "cache_control": {"type": "ephemeral"} # 👈 KEY CHANGE
    },
    ...
]

in your message content block.

Architecture Diagram

Notes

  • Relevant code

  • Gemini Context Caching only allows 1 block of continuous messages to be cached.

  • If multiple non-continuous blocks contain cache_control, the first continuous block is used. (sent to /cachedContent in the Gemini format)

  • The raw request to Gemini's /generateContent endpoint looks like this:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
      "contents": [
        {
          "parts":[{
            "text": "Please summarize this transcript"
          }],
          "role": "user"
        },
      ],
      "cachedContent": "'$CACHE_NAME'"
    }'

Sample Usage

from litellm import completion 

for _ in range(2):
    resp = completion(
        model="gemini/gemini-1.5-pro",
        messages=[
            # System Message
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "Here is the full text of a complex legal agreement" * 4000,
                        "cache_control": {"type": "ephemeral"}, # 👈 KEY CHANGE
                    }
                ],
            },
            # marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What are the key terms and conditions in this agreement?",
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
        ],
    )

    print(resp.usage) # 👈 2nd usage block will be less, since cached tokens used

Image Generation

from litellm import completion 

response = completion(
    model="gemini/gemini-2.0-flash-exp-image-generation",
    messages=[{"role": "user", "content": "Generate an image of a cat"}],
    modalities=["image", "text"],
)
assert response.choices[0].message.content is not None # "data:image/png;base64,e4rr.."
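
Assuming the content comes back as a data: URL as indicated in the comment above, a minimal sketch for decoding it and writing the image to disk:

import base64

# The content is expected to look like "data:image/png;base64,<encoded bytes>"
data_url = response.choices[0].message.content
header, encoded = data_url.split(",", 1)  # split off the "data:image/png;base64" prefix
image_bytes = base64.b64decode(encoded)

with open("cat.png", "wb") as f:
    f.write(image_bytes)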