Meta Llama
| Property | Details |
|---|---|
| Description | The Meta Llama API provides access to Meta's family of large language models. |
| Provider Route on LiteLLM | meta_llama/ |
| Supported Endpoints | /chat/completions, /completions, /responses |
| API Reference | Llama API Reference ↗ |
Required Variables
Environment Variables
os.environ["LLAMA_API_KEY"] = "" # your Meta Llama API key
Supported Models
Info
All models listed at https://llama.developer.meta.com/docs/models/ are supported. We actively maintain the list of models, token windows, etc. here.
| Model ID | Input Context Length | Output Context Length | Input Modalities | Output Modalities |
|---|---|---|---|---|
| Llama-4-Scout-17B-16E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| Llama-4-Maverick-17B-128E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| Llama-3.3-70B-Instruct | 128k | 4028 | Text | Text |
| Llama-3.3-8B-Instruct | 128k | 4028 | Text | Text |
Usage - LiteLLM Python SDK
Non-streaming
Meta Llama Non-Streaming Completion
import os
import litellm
from litellm import completion
os.environ["LLAMA_API_KEY"] = "" # your Meta Llama API key
messages = [{"content": "Hello, how are you?", "role": "user"}]
# Meta Llama call
response = completion(model="meta_llama/Llama-3.3-70B-Instruct", messages=messages)
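The response follows the OpenAI response format, so the generated text can be read from the first choice:
# Print just the generated text
print(response.choices[0].message.content)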
Streaming
Meta Llama Streaming Completion
import os
import litellm
from litellm import completion
os.environ["LLAMA_API_KEY"] = "" # your Meta Llama API key
messages = [{"content": "Hello, how are you?", "role": "user"}]
# Meta Llama call with streaming
response = completion(
    model="meta_llama/Llama-3.3-70B-Instruct",
    messages=messages,
    stream=True
)

for chunk in response:
    print(chunk)
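The Llama-4 models in the table above also accept image input. Below is a minimal sketch of a multimodal call, assuming OpenAI-style content blocks and a placeholder image URL:
import os
from litellm import completion

os.environ["LLAMA_API_KEY"] = ""  # your Meta Llama API key

# Text plus image content blocks (OpenAI-style); the image URL is a placeholder
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
    ],
}]

response = completion(
    model="meta_llama/Llama-4-Scout-17B-16E-Instruct-FP8",
    messages=messages,
)
print(response.choices[0].message.content)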
Usage - LiteLLM Proxy
Add the following to your LiteLLM Proxy configuration file
config.yaml
model_list:
  - model_name: meta_llama/Llama-3.3-70B-Instruct
    litellm_params:
      model: meta_llama/Llama-3.3-70B-Instruct
      api_key: os.environ/LLAMA_API_KEY
  - model_name: meta_llama/Llama-3.3-8B-Instruct
    litellm_params:
      model: meta_llama/Llama-3.3-8B-Instruct
      api_key: os.environ/LLAMA_API_KEY
Start your LiteLLM Proxy server
Start LiteLLM Proxy
litellm --config config.yaml
# RUNNING on http://0.0.0.0:4000
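To confirm the proxy is up and serving your configured models, you can query its OpenAI-compatible model listing endpoint; a quick check, assuming the default port and your proxy API key:
from openai import OpenAI

# Sanity check: list the models configured on the running proxy
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="your-proxy-api-key")
for model in client.models.list():
    print(model.id)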
You can call the proxy with the OpenAI SDK, the LiteLLM SDK, or cURL:
Meta Llama via Proxy - Non-Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://0.0.0.0:4000",  # Your proxy URL
    api_key="your-proxy-api-key"     # Your proxy API key
)

# Non-streaming response
response = client.chat.completions.create(
    model="meta_llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}]
)
print(response.choices[0].message.content)
Meta Llama via Proxy - Streaming
from openai import OpenAI
# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://0.0.0.0:4000",  # Your proxy URL
    api_key="your-proxy-api-key"     # Your proxy API key
)

# Streaming response
response = client.chat.completions.create(
    model="meta_llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Meta Llama via Proxy - LiteLLM SDK
import litellm
# Configure LiteLLM to use your proxy
response = litellm.completion(
    model="litellm_proxy/meta_llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    api_base="http://0.0.0.0:4000",
    api_key="your-proxy-api-key"
)
print(response.choices[0].message.content)
Meta Llama via Proxy - LiteLLM SDK Streaming
import litellm
# Configure LiteLLM to use your proxy with streaming
response = litellm.completion(
    model="litellm_proxy/meta_llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    api_base="http://0.0.0.0:4000",
    api_key="your-proxy-api-key",
    stream=True
)

for chunk in response:
    if hasattr(chunk.choices[0], 'delta') and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Meta Llama via Proxy - cURL
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "meta_llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Write a short poem about AI."}]
  }'
Meta Llama via Proxy - cURL Streaming
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-proxy-api-key" \
  -d '{
    "model": "meta_llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Write a short poem about AI."}],
    "stream": true
  }'
For more detailed information on using the LiteLLM Proxy, see the LiteLLM Proxy documentation.