Replicate

LiteLLM supports all models on Replicate.

Usage

SDK

API KEY

import os 
os.environ["REPLICATE_API_KEY"] = ""
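The snippet above sets `REPLICATE_API_KEY` to an empty placeholder. A minimal sketch of a fail-fast check you could run before calling LiteLLM, so that a missing key surfaces as a clear error instead of a failed request; `require_env` is a hypothetical helper, not part of LiteLLM:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; LiteLLM does not provide it.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before calling litellm.completion")
    return value


if __name__ == "__main__":
    # REPLICATE_API_KEY is the variable LiteLLM reads for Replicate
    try:
        require_env("REPLICATE_API_KEY")
        print("REPLICATE_API_KEY is set")
    except RuntimeError as err:
        print(err)
```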

Sample Usage

from litellm import completion
import os
## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-3 call
response = completion(
    model="replicate/meta/meta-llama-3-8b-instruct", 
    messages = [{ "content": "Hello, how are you?","role": "user"}]
)

Proxy

Add models to your config.yaml

model_list:
  - model_name: llama-3
    litellm_params:
      model: replicate/meta/meta-llama-3-8b-instruct
      api_key: os.environ/REPLICATE_API_KEY
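The `api_key: os.environ/REPLICATE_API_KEY` entry tells the proxy to read the key from the environment rather than storing it in the file. A minimal sketch of that lookup convention, assuming values are plain strings of the form `os.environ/NAME`; `resolve_env_refs` is hypothetical and is not LiteLLM's actual code:

```python
import os
from typing import Any, Dict

ENV_PREFIX = "os.environ/"


def resolve_env_refs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with 'os.environ/NAME' strings replaced by os.environ['NAME'].

    Hypothetical sketch of the config convention; not LiteLLM's implementation.
    """
    resolved: Dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, str) and value.startswith(ENV_PREFIX):
            env_name = value[len(ENV_PREFIX):]
            if env_name not in os.environ:
                raise KeyError(f"environment variable {env_name} referenced by {key!r} is not set")
            resolved[key] = os.environ[env_name]
        else:
            # Non-reference values are copied unchanged
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    os.environ.setdefault("REPLICATE_API_KEY", "replicate key")
    litellm_params = {
        "model": "replicate/meta/meta-llama-3-8b-instruct",
        "api_key": "os.environ/REPLICATE_API_KEY",
    }
    print(resolve_env_refs(litellm_params)["model"])
```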

Start the proxy

$ litellm --config /path/to/config.yaml --debug

Send a request to the LiteLLM proxy server

OpenAI Python v1.0.0+

import openai
client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="llama-3",
    messages = [
      {
          "role": "system",
          "content": "Be a good human!"
      },
      {
          "role": "user",
          "content": "What do you know about earth?"
      }
  ]
)

print(response)

curl

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama-3",
    "messages": [
      {
          "role": "system",
          "content": "Be a good human!"
      },
      {
          "role": "user",
          "content": "What do you know about earth?"
      }
    ]
}'
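Both clients send the same OpenAI-compatible JSON body to the proxy. For environments without the `openai` package, a standard-library sketch that builds the equivalent request; it assumes the proxy URL and key used in the examples and only sends the request if you uncomment the final lines. `build_chat_request` is a hypothetical helper:

```python
import json
import urllib.request

PROXY_BASE_URL = "http://0.0.0.0:4000"  # litellm proxy base url from the examples
PROXY_KEY = "sk-1234"                   # placeholder proxy / virtual key


def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style /chat/completions POST for the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {PROXY_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        "llama-3",
        [
            {"role": "system", "content": "Be a good human!"},
            {"role": "user", "content": "What do you know about earth?"},
        ],
    )
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running proxy):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read()))
```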

Expected Replicate Call

This is the call LiteLLM makes to Replicate for the example above:

POST Request Sent from LiteLLM:
curl -X POST \
https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct \
-H 'Authorization: Token your-api-key' -H 'Content-Type: application/json' \
-d '{'version': 'meta/meta-llama-3-8b-instruct', 'input': {'prompt': '<|start_header_id|>system<|end_header_id|>\n\nBe a good human!<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat do you know about earth?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'}}'

Advanced Usage - Prompt Formatting

LiteLLM has prompt template mappings for all meta-llama llama3 instruct models.
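The prompt in the expected Replicate call above follows the Llama 3 instruct layout. A minimal sketch that reproduces that exact prompt string from the chat messages; it is written to match the observed output and is not LiteLLM's own template code. `format_llama3_prompt` is a hypothetical name:

```python
from typing import Dict, List


def format_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into a Llama 3 instruct prompt.

    Each message becomes a header block terminated by <|eot_id|>; an open assistant
    header is appended so the model continues as the assistant.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = format_llama3_prompt(
        [
            {"role": "system", "content": "Be a good human!"},
            {"role": "user", "content": "What do you know about earth?"},
        ]
    )
    print(repr(prompt))
```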

To apply a custom prompt template:

SDK

import os

import litellm
from litellm import completion

os.environ["REPLICATE_API_KEY"] = ""

# Create your own custom prompt template
litellm.register_prompt_template(
    model="togethercomputer/LLaMA-2-7B-32K",
    initial_prompt_value="You are a good assistant",  # [OPTIONAL]
    roles={
        "system": {
            "pre_message": "[INST] <<SYS>>\n",  # [OPTIONAL]
            "post_message": "\n<</SYS>>\n [/INST]\n",  # [OPTIONAL]
        },
        "user": {
            "pre_message": "[INST] ",  # [OPTIONAL]
            "post_message": " [/INST]",  # [OPTIONAL]
        },
        "assistant": {
            "pre_message": "\n",  # [OPTIONAL]
            "post_message": "\n",  # [OPTIONAL]
        },
    },
    final_prompt_value="Now answer as best you can:",  # [OPTIONAL]
)


def test_replicate_custom_model():
    # Same model as the registered template, with the replicate/ provider prefix
    model = "replicate/togethercomputer/LLaMA-2-7B-32K"
    messages = [{"role": "user", "content": "Hello, how are you?"}]
    response = completion(model=model, messages=messages)
    print(response["choices"][0]["message"]["content"])
    return response


test_replicate_custom_model()

Proxy

# Model-specific parameters
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "replicate/mistralai/Mistral-7B-Instruct-v0.1" 
      api_key: os.environ/REPLICATE_API_KEY
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: "<s>"
      eos_token: "</s>"
      max_tokens: 4096
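Both variants describe the same idea: each message is wrapped in the `pre_message`/`post_message` strings for its role, between `initial_prompt_value` and `final_prompt_value`. A hypothetical sketch of that assembly, useful for previewing a template offline; the placement of `bos_token`/`eos_token` (once at each end) and the empty default for roles missing from `roles` are assumptions, not confirmed LiteLLM behavior:

```python
from typing import Dict, List


def apply_custom_template(
    messages: List[Dict[str, str]],
    roles: Dict[str, Dict[str, str]],
    initial_prompt_value: str = "",
    final_prompt_value: str = "",
    bos_token: str = "",
    eos_token: str = "",
) -> str:
    """Hypothetical sketch: wrap each message in its role's pre/post strings.

    Assumption: bos_token is prepended once and eos_token appended once; LiteLLM's
    real placement of these tokens may differ.
    """
    prompt = bos_token + initial_prompt_value
    for message in messages:
        # Assumption: roles not listed in the template get no wrapping
        role_cfg = roles.get(message["role"], {})
        prompt += role_cfg.get("pre_message", "") + message["content"] + role_cfg.get("post_message", "")
    return prompt + final_prompt_value + eos_token


if __name__ == "__main__":
    roles = {
        "system": {"pre_message": "[INST] <<SYS>>\n", "post_message": "\n<</SYS>>\n [/INST]\n"},
        "user": {"pre_message": "[INST] ", "post_message": " [/INST]"},
        "assistant": {"pre_message": "\n", "post_message": "\n"},
    }
    print(apply_custom_template(
        [{"role": "user", "content": "Hello, how are you?"}],
        roles,
        initial_prompt_value="You are a good assistant",
        final_prompt_value="Now answer as best you can:",
    ))
```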

Advanced Usage - Calling Replicate Deployments

Calling a deployed Replicate LLM: add the replicate/deployments/ prefix to your model so that litellm calls the deployments endpoint. This example calls the ishaan-jaff/ishaan-mistral deployment on Replicate:

from litellm import completion

response = completion(
    model="replicate/deployments/ishaan-jaff/ishaan-mistral",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
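To see how the deployment string relates to Replicate's HTTP API, a hypothetical sketch that maps `replicate/deployments/<owner>/<name>` to a deployment predictions URL. The URL shape `https://api.replicate.com/v1/deployments/<owner>/<name>/predictions` is an assumption based on Replicate's public API, not taken from LiteLLM's code:

```python
DEPLOYMENT_PREFIX = "replicate/deployments/"
REPLICATE_API_BASE = "https://api.replicate.com/v1"  # assumption: Replicate's public API base


def deployment_predictions_url(model: str) -> str:
    """Map 'replicate/deployments/<owner>/<name>' to a deployment predictions URL.

    Hypothetical helper; the URL shape is an assumption, not LiteLLM's implementation.
    """
    if not model.startswith(DEPLOYMENT_PREFIX):
        raise ValueError(f"not a Replicate deployment model string: {model!r}")
    owner, sep, name = model[len(DEPLOYMENT_PREFIX):].partition("/")
    if not owner or not sep or not name or "/" in name:
        raise ValueError(f"expected replicate/deployments/<owner>/<name>, got {model!r}")
    return f"{REPLICATE_API_BASE}/deployments/{owner}/{name}/predictions"


if __name__ == "__main__":
    print(deployment_predictions_url("replicate/deployments/ishaan-jaff/ishaan-mistral"))
```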

Replicate Cold Boots

Replicate responses can take 3-5 minutes because of Replicate cold boots. If you are trying to debug, make the request with litellm.set_verbose=True. For more information, see Replicate's documentation on cold boots.
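Because a cold boot can keep a prediction in a non-terminal state for minutes, any client-side wait needs a generous timeout. A generic polling sketch with exponential backoff, assuming you supply your own `fetch_status` function (for example, one that fetches a Replicate prediction and returns its JSON); `wait_for_prediction` is hypothetical and is not how LiteLLM polls internally:

```python
import time
from typing import Callable, Dict

# Replicate prediction statuses after which no further change is expected
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch_status: Callable[[], Dict],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the prediction reaches a terminal status or timeout_s would be exceeded.

    fetch_status returns a dict with a "status" key. The delay doubles after each
    poll, capped at max_delay_s. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        prediction = fetch_status()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if clock() + delay > deadline:
            raise TimeoutError(f"prediction still {prediction.get('status')!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


if __name__ == "__main__":
    fake = iter([{"status": "starting"}, {"status": "processing"}, {"status": "succeeded"}])
    print(wait_for_prediction(lambda: next(fake), sleep=lambda _: None))
```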

Replicate Models

liteLLM supports all Replicate LLMs.

For Replicate models, make sure to add the replicate/ prefix to the model argument; liteLLM uses this prefix to detect the provider.

Below are examples of how to call Replicate LLMs using liteLLM:

Model Name	Function Call	Required OS Variables
replicate/llama-2-70b-chat	`completion(model='replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf', messages)`	`os.environ['REPLICATE_API_KEY']`
a16z-infra/llama-2-13b-chat	`completion(model='replicate/a16z-infra/llama-2-13b-chat:2a7f981751ec7fdf87b5b91ad4db53683a98082e9ff7bfd12c8cd5ea85980a52', messages)`	`os.environ['REPLICATE_API_KEY']`
replicate/vicuna-13b	`completion(model='replicate/vicuna-13b:6282abe6a492de4145d7bb601023762212f9ddbbe78278bd6771c8b3b2f2a13b', messages)`	`os.environ['REPLICATE_API_KEY']`
daanelson/flan-t5-large	`completion(model='replicate/daanelson/flan-t5-large:ce962b3f6792a57074a601d3979db5839697add2e4e02696b3ced4c022d4767f', messages)`	`os.environ['REPLICATE_API_KEY']`
custom-llm	`completion(model='replicate/custom-llm-version-id', messages)`	`os.environ['REPLICATE_API_KEY']`
replicate deployment	`completion(model='replicate/deployments/ishaan-jaff/ishaan-mistral', messages)`	`os.environ['REPLICATE_API_KEY']`
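The model strings in the table come in three shapes: a path with a `:version` hash, a path without one, and a `deployments/` path. A hypothetical helper that classifies them purely by syntax, for illustration only (it cannot tell that a bare string such as `custom-llm-version-id` is a version id):

```python
from typing import Dict, Optional


def describe_replicate_model(model: str) -> Dict[str, Optional[str]]:
    """Classify a 'replicate/...' model string by its syntax.

    Returns the kind ('deployment', 'versioned' or 'unversioned'), the model path,
    and the version id after ':' if present. Hypothetical helper for illustration.
    """
    if not model.startswith("replicate/"):
        raise ValueError(f"missing 'replicate/' prefix: {model!r}")
    rest = model[len("replicate/"):]
    if rest.startswith("deployments/"):
        return {"kind": "deployment", "path": rest[len("deployments/"):], "version": None}
    path, sep, version = rest.partition(":")
    if sep:
        return {"kind": "versioned", "path": path, "version": version}
    return {"kind": "unversioned", "path": path, "version": None}


if __name__ == "__main__":
    for m in (
        "replicate/a16z-infra/llama-2-13b-chat:2a7f981751ec7fdf87b5b91ad4db53683a98082e9ff7bfd12c8cd5ea85980a52",
        "replicate/meta/meta-llama-3-8b-instruct",
        "replicate/deployments/ishaan-jaff/ishaan-mistral",
    ):
        print(describe_replicate_model(m))
```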

Passing additional params - max_tokens, temperature

See the LiteLLM documentation for all params supported by litellm.completion.

# !pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-2 call
response = completion(
    model="replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf", 
    messages = [{ "content": "Hello, how are you?","role": "user"}],
    max_tokens=20,
    temperature=0.5
)

Proxy

  model_list:
    - model_name: llama-3
      litellm_params:
        model: replicate/meta/meta-llama-3-8b-instruct
        api_key: os.environ/REPLICATE_API_KEY
        max_tokens: 20
        temperature: 0.5
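Clients address the proxy by `model_name`; the matching entry's `litellm_params` (here including `max_tokens` and `temperature`) supply the parameters for the underlying Replicate call. A minimal sketch of that alias lookup, with the config written as the Python structure a YAML loader would produce; `params_for_alias` is hypothetical:

```python
from typing import Any, Dict, List

# The proxy config, as the structure a YAML loader would produce
MODEL_LIST: List[Dict[str, Any]] = [
    {
        "model_name": "llama-3",
        "litellm_params": {
            "model": "replicate/meta/meta-llama-3-8b-instruct",
            "api_key": "os.environ/REPLICATE_API_KEY",
            "max_tokens": 20,
            "temperature": 0.5,
        },
    }
]


def params_for_alias(model_list: List[Dict[str, Any]], alias: str) -> Dict[str, Any]:
    """Return a copy of the litellm_params of the first entry whose model_name equals alias.

    Hypothetical sketch; the real proxy can also load-balance across several entries
    that share a model_name.
    """
    for entry in model_list:
        if entry["model_name"] == alias:
            return dict(entry["litellm_params"])
    raise KeyError(f"no model_name {alias!r} in model_list")


if __name__ == "__main__":
    print(params_for_alias(MODEL_LIST, "llama-3"))
```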

Passing Replicate-specific params

Send params that litellm.completion() does not support but Replicate does by passing them directly to litellm.completion.

Example: seed and min_tokens are Replicate-specific params.

# !pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["REPLICATE_API_KEY"] = "replicate key"

# replicate llama-2 call
response = completion(
    model="replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf", 
    messages = [{ "content": "Hello, how are you?","role": "user"}],
    seed=-1,
    min_tokens=2,
    top_k=20,
)

Proxy

  model_list:
    - model_name: llama-3
      litellm_params:
        model: replicate/meta/meta-llama-3-8b-instruct
        api_key: os.environ/REPLICATE_API_KEY
        min_tokens: 2
        top_k: 20
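Conceptually, such a call mixes standard OpenAI-style arguments with provider-specific extras such as `seed` and `min_tokens` that are forwarded to Replicate. A hypothetical sketch of that split; the `OPENAI_PARAMS` set below is a small illustrative subset chosen for the example, not LiteLLM's actual list of supported params:

```python
from typing import Any, Dict, Iterable, Tuple

# Assumption: a small illustrative subset of OpenAI-style params, not LiteLLM's full list
OPENAI_PARAMS = frozenset({"max_tokens", "temperature", "top_p", "stream", "stop", "n"})


def split_params(
    kwargs: Dict[str, Any], known: Iterable[str] = OPENAI_PARAMS
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate known OpenAI-style params from provider-specific extras to forward as-is."""
    known_set = set(known)
    standard = {k: v for k, v in kwargs.items() if k in known_set}
    extras = {k: v for k, v in kwargs.items() if k not in known_set}
    return standard, extras


if __name__ == "__main__":
    standard, extras = split_params({"max_tokens": 20, "seed": -1, "min_tokens": 2, "top_k": 20})
    print("standard:", standard)
    print("forwarded to Replicate:", extras)
```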
