Llamafile

LiteLLM supports all models on Llamafile.

Property	Details
Description	llamafile lets you distribute and run LLMs with a single file.
Provider route on LiteLLM	`llamafile/` (for an OpenAI-compatible server)
Provider doc	llamafile ↗
Supported endpoints	`/chat/completions`, `/embeddings`, `/completions`

Quick Start

Usage - litellm.completion (calling an OpenAI-compatible endpoint)

llamafile provides an OpenAI-compatible endpoint for chat completions. Here's how to call it with LiteLLM.

To call llamafile through litellm, add the following to your completion call:

model="llamafile/<your-llamafile-model-name>"
api_base="<your-hosted-llamafile-url>"

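import litellm

# Example chat history; replace with your own messages.
messages = [{"role": "user", "content": "what llm are you"}]

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",  # pass the llamafile model name for completeness
    messages=messages,
    api_base="http://<your-llamafile-host>:8080/v1",  # placeholder host; replace with your llamafile server
    temperature=0.2,
    max_tokens=80,
)

print(response)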
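
The routing rule described above (the `llamafile/` prefix plus an explicit `api_base`) can be wrapped in a small helper. The following is a minimal sketch, not part of LiteLLM: `llamafile_kwargs` and `ask_llamafile` are hypothetical names, and `http://localhost:8080/v1` assumes a llamafile server running locally on port 8080.

```python
from typing import Any


def llamafile_kwargs(model_name: str, api_base: str, **extra: Any) -> dict[str, Any]:
    """Build keyword arguments for litellm.completion (hypothetical helper)."""
    # Add the llamafile/ prefix only once so LiteLLM routes to the llamafile provider.
    model = model_name if model_name.startswith("llamafile/") else f"llamafile/{model_name}"
    return {"model": model, "api_base": api_base.rstrip("/"), **extra}


def ask_llamafile(prompt: str, **kwargs: Any) -> str:
    """Send one user message and return the reply text (needs a running server)."""
    import litellm  # imported lazily so llamafile_kwargs works without litellm installed

    response = litellm.completion(
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return response.choices[0].message.content


kwargs = llamafile_kwargs(
    "mistralai/mistral-7b-instruct-v0.2",
    "http://localhost:8080/v1",  # assumed local server address
    temperature=0.2,
    max_tokens=80,
)
print(kwargs["model"])
```

With a server running, `ask_llamafile("what llm are you", **kwargs)` would return the reply text instead of the full response object.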

Usage - LiteLLM Proxy Server (calling an OpenAI-compatible endpoint)

Here's how to call an OpenAI-compatible endpoint with the LiteLLM Proxy Server.

Configure config.yaml

model_list:
  - model_name: my-model
    litellm_params:
      model: llamafile/mistralai/mistral-7b-instruct-v0.2 # add llamafile/ prefix to route as OpenAI provider
      api_base: http://<your-llamafile-host>:8080/v1 # placeholder host; add api base for OpenAI compatible provider
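
Before starting the proxy, it can help to sanity-check an entry like the one above. The sketch below is an illustration only, not LiteLLM's own validation: `check_llamafile_entry` is a hypothetical helper, the entry is written as a Python dict mirroring the YAML, and `localhost` is an assumed host.

```python
from urllib.parse import urlparse


def check_llamafile_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one model_list entry (hypothetical check)."""
    problems = []
    params = entry.get("litellm_params", {})

    # The llamafile/ prefix is what routes the request to the llamafile provider.
    if not str(params.get("model", "")).startswith("llamafile/"):
        problems.append("model must start with 'llamafile/'")

    # The api_base needs a scheme and a host, e.g. http://localhost:8080/v1.
    url = urlparse(str(params.get("api_base", "")))
    if url.scheme not in ("http", "https") or not url.hostname:
        problems.append("api_base must be a full http(s) URL with a host")

    return problems


entry = {
    "model_name": "my-model",
    "litellm_params": {
        "model": "llamafile/mistralai/mistral-7b-instruct-v0.2",
        "api_base": "http://localhost:8080/v1",  # assumed local llamafile server
    },
}
print(check_llamafile_entry(entry))
```

An empty list means the entry passed both checks; a missing prefix or a URL without a host each adds one message.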

Start the proxy
```
$ litellm --config /path/to/config.yaml
```

Send a request to the LiteLLM Proxy Server

OpenAI Python v1.0.0+

import openai

client = openai.OpenAI(
    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000",  # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "what llm are you",
        }
    ],
)

print(response)

curl

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "my-model",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'
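
For environments without the OpenAI SDK, the same request can be sent with the Python standard library. This is a rough sketch of the curl call above, not an official client: `build_chat_request` and `send_chat_request` are hypothetical helpers, and the URL, key, and model alias are the example values used in this section.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first reply (needs a running proxy)."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("http://0.0.0.0:4000", "sk-1234", "my-model", "what llm are you")
print(request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made; `send_chat_request(request)` performs the call once the proxy is up.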

Embeddings

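SDK

from litellm import embedding
import os

os.environ["LLAMAFILE_API_BASE"] = "http://<your-llamafile-host>:8080/v1"  # placeholder host; replace with your llamafile server

# Store the result under a new name so the embedding function is not shadowed.
response = embedding(
    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
    input=["Hello world"],
)

print(response)

PROXY

Configure config.yaml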

model_list:
    - model_name: my-model
      litellm_params:
        model: llamafile/sentence-transformers/all-MiniLM-L6-v2 # add llamafile/ prefix to route as OpenAI provider
        api_base: http://<your-llamafile-host>:8080/v1 # placeholder host; add api base for OpenAI compatible provider

Start the proxy

$ litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Test it!

curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"input": ["hello world"], "model": "my-model"}'
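
The `/embeddings` endpoint returns vectors in the OpenAI response shape, with one item per input under `data`. As a sketch of what to do with them, the example below extracts the vectors and compares two with cosine similarity. `extract_vectors` and `cosine_similarity` are hypothetical helpers, and the sample response is made-up data with tiny vectors, not real model output.

```python
import math


def extract_vectors(response: dict) -> list[list[float]]:
    """Return embedding vectors ordered by their input index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up response in the OpenAI embeddings shape, deliberately out of order.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0, 0.0]},
    ],
    "model": "my-model",
}

first, second = extract_vectors(sample_response)
print(cosine_similarity(first, second))
```

Sorting by `index` keeps each vector aligned with the position of its input string, whatever order the items arrive in.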

See examples for OpenAI SDK/Langchain/etc.
