Modify / Reject Incoming Requests

Modify data before making LLM API calls on the proxy
Reject data before making LLM API calls / before returning the response
Enforce the 'user' param for all OpenAI endpoint calls

See a complete example with our parallel request rate limiter

Quick Start

In your custom handler, add a new async_pre_call_hook function

This function is called just before a LiteLLM completion call is made and lets you modify the data that goes into that call.

from litellm.integrations.custom_logger import CustomLogger
import litellm
from litellm.proxy.proxy_server import UserAPIKeyAuth, DualCache
from litellm.types.utils import ModelResponseStream
from typing import Any, AsyncGenerator, Optional, Literal

# This file includes the custom callbacks for LiteLLM Proxy
# Once defined, these can be passed in proxy_config.yaml
class MyCustomHandler(CustomLogger): # https://docs.litellm.de/docs/observability/custom_callback#callback-class
    # Class variables or attributes
    def __init__(self):
        pass

    #### CALL HOOKS - proxy only #### 

    async def async_pre_call_hook(self, user_api_key_dict: UserAPIKeyAuth, cache: DualCache, data: dict, call_type: Literal[
            "completion",
            "text_completion",
            "embeddings",
            "image_generation",
            "moderation",
            "audio_transcription",
        ]): 
        data["model"] = "my-new-model"
        return data 

    async def async_post_call_failure_hook(
        self, 
        request_data: dict,
        original_exception: Exception, 
        user_api_key_dict: UserAPIKeyAuth
    ):
        pass

    async def async_post_call_success_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
        pass

    async def async_moderation_hook( # call made in parallel to llm api call
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        call_type: Literal["completion", "embeddings", "image_generation", "moderation", "audio_transcription"],
    ):
        pass

    async def async_post_call_streaming_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
        response: str,
    ):
        pass

    async def async_post_call_streaming_iterator_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
        response: Any,
        request_data: dict,
    ) -> AsyncGenerator[ModelResponseStream, None]:
        """
        Passes the entire stream to the guardrail

        This is useful for plugins that need to see the entire stream.
        """
        async for item in response:
            yield item

proxy_handler_instance = MyCustomHandler()
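The pre-call contract (receive the request body as a dict, return the dict that should be used) can be tried without running the proxy. The sketch below is plain Python and does not import LiteLLM; `fake_pre_call_hook` and `simulate_proxy_request` are hypothetical names, and it only assumes that the proxy continues with whatever dict the hook returns.

```python
import asyncio
from typing import Any, Dict


# Hypothetical stand-in for MyCustomHandler.async_pre_call_hook.
# It mirrors the contract only: take the request body, return the modified body.
async def fake_pre_call_hook(data: Dict[str, Any], call_type: str) -> Dict[str, Any]:
    data["model"] = "my-new-model"  # same change as in the handler above
    return data


# Hypothetical driver: runs the hook and returns what would be forwarded.
async def simulate_proxy_request(body: Dict[str, Any]) -> Dict[str, Any]:
    data = dict(body)  # shallow copy, so the caller's dict stays unchanged
    return await fake_pre_call_hook(data, call_type="completion")


incoming = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "good morning good sir"}],
}
forwarded = asyncio.run(simulate_proxy_request(incoming))
print(forwarded["model"])  # my-new-model
```

The shallow copy is a choice of this sketch, not a statement about how the proxy handles the body.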

Add this file to your proxy config

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo

litellm_settings:
  callbacks: custom_callbacks.proxy_handler_instance # sets litellm.callbacks = [proxy_handler_instance]
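The `callbacks` value is a `module.attribute` path. A minimal sketch of resolving such a path with the standard library follows, assuming `custom_callbacks.py` sits in the same directory as `config.yaml`. `resolve_instance` is a hypothetical helper, not LiteLLM's loader, which may differ in detail; the demo writes a tiny stand-in module so it runs without LiteLLM installed.

```python
import importlib.util
import os
import tempfile


def resolve_instance(dotted_path: str, config_dir: str):
    """Load `<module>.<attribute>` from `<config_dir>/<module>.py` (hypothetical helper)."""
    module_name, attr_name = dotted_path.rsplit(".", 1)
    file_path = os.path.join(config_dir, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, creating the instance
    return getattr(module, attr_name)


# Demo: a stand-in custom_callbacks.py (no litellm import) in a temporary directory.
with tempfile.TemporaryDirectory() as config_dir:
    with open(os.path.join(config_dir, "custom_callbacks.py"), "w") as f:
        f.write(
            "class MyCustomHandler:\n"
            "    pass\n"
            "proxy_handler_instance = MyCustomHandler()\n"
        )
    handler = resolve_instance("custom_callbacks.proxy_handler_instance", config_dir)
    callbacks = [handler]  # mirrors: litellm.callbacks = [proxy_handler_instance]
```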

Start the server + test the request

$ litellm --config /path/to/config.yaml

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "good morning good sir"
        }
    ],
    "user": "ishaan-app",
    "temperature": 0.2
    }'
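The same test request can be sent from Python. This sketch uses only the standard library; `build_request` and `send` are hypothetical helper names, and `send` needs a proxy listening on port 4000.

```python
import json
import urllib.request

PROXY_URL = "http://0.0.0.0:4000/chat/completions"


def build_request(content: str, user: str = "ishaan-app") -> urllib.request.Request:
    """Build the same POST body as the curl example above."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": content}],
        "user": user,
        "temperature": 0.2,
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request to a running proxy and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, `send(build_request("good morning good sir"))` returns the decoded response.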

[BETA] NEW async_moderation_hook

Run a moderation check in parallel to the actual LLM API call.

In your custom handler, add a new async_moderation_hook function

This is currently only supported for /chat/completions calls.
This function runs in parallel to the actual LLM API call.
If your async_moderation_hook raises an exception, we return it to the user.
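The "runs in parallel, exception goes back to the user" behavior can be modeled with `asyncio.gather`. This is a sketch of the described behavior, not how LiteLLM schedules the calls: `fake_llm_call`, `fake_moderation_hook`, `handle_chat_request` and `ContentPolicyViolation` are hypothetical names, and the sleeps only simulate latency.

```python
import asyncio
from typing import Any, Dict


class ContentPolicyViolation(Exception):
    """Hypothetical error standing in for the HTTPException raised by the hook."""


async def fake_llm_call(data: Dict[str, Any]) -> Dict[str, str]:
    await asyncio.sleep(0.05)  # pretend the upstream LLM takes 50 ms
    return {"role": "assistant", "content": "Good morning!"}


async def fake_moderation_hook(data: Dict[str, Any]) -> None:
    await asyncio.sleep(0.01)  # pretend the moderation check takes 10 ms
    content = str(data["messages"][0]["content"])
    if content.lower() == "hello world":
        raise ContentPolicyViolation("Violated content safety policy")


async def handle_chat_request(data: Dict[str, Any]) -> Dict[str, str]:
    # Both coroutines run concurrently; gather() re-raises the first exception,
    # so a failed moderation check replaces the LLM answer for the caller.
    _, response = await asyncio.gather(
        fake_moderation_hook(data), fake_llm_call(data)
    )
    return response
```

Because the checks run concurrently, a passing request costs roughly the slower of the two calls rather than their sum.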

Info

We might need to update the function schema in the future to support multiple endpoints (e.g. accept a call_type). Please keep this in mind while trying out this feature.

See a complete example with our Llama Guard content moderation hook

from litellm.integrations.custom_logger import CustomLogger
import litellm
from litellm.proxy.proxy_server import UserAPIKeyAuth, DualCache
from fastapi import HTTPException
from typing import Literal

# This file includes the custom callbacks for LiteLLM Proxy
# Once defined, these can be passed in proxy_config.yaml
class MyCustomHandler(CustomLogger): # https://docs.litellm.de/docs/observability/custom_callback#callback-class
    # Class variables or attributes
    def __init__(self):
        pass

    #### ASYNC #### 
    
    async def async_log_pre_api_call(self, model, messages, kwargs):
        pass

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        pass

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        pass

    #### CALL HOOKS - proxy only #### 

    async def async_pre_call_hook(self, user_api_key_dict: UserAPIKeyAuth, cache: DualCache, data: dict, call_type: Literal["completion", "embeddings"]):
        data["model"] = "my-new-model"
        return data 
    
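    async def async_moderation_hook( ### 👈 KEY CHANGE ###
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        call_type: Literal["completion", "embeddings", "image_generation", "moderation", "audio_transcription"],
    ):
        messages = data["messages"]
        print(messages)
        # compare case-insensitively, so "Hello world" is rejected as well
        if str(messages[0]["content"]).lower() == "hello world":
            raise HTTPException(
                status_code=400, detail={"error": "Violated content safety policy"}
            )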

proxy_handler_instance = MyCustomHandler()

Add this file to your proxy config

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo

litellm_settings:
  callbacks: custom_callbacks.proxy_handler_instance # sets litellm.callbacks = [proxy_handler_instance]

Start the server + test the request

$ litellm --config /path/to/config.yaml

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "Hello world"
        }
    ]
    }'

Advanced - Enforce 'user' Param

Set enforce_user_param to true to require that all calls to the OpenAI endpoints include the 'user' param.


general_settings:
  enforce_user_param: True
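A minimal sketch of the rule this setting describes: with enforcement on, a request to an OpenAI-compatible route that has no `user` field is refused. The route set, `check_user_param` and `MissingUserParamError` are hypothetical; the proxy's real route list and error message may differ.

```python
from typing import Any, Dict

# Hypothetical set of OpenAI-compatible routes; LiteLLM's own list may differ.
OPENAI_ROUTES = {
    "/chat/completions", "/v1/chat/completions",
    "/completions", "/v1/completions",
    "/embeddings", "/v1/embeddings",
}


class MissingUserParamError(ValueError):
    """Hypothetical error type for a request without a 'user' field."""


def check_user_param(route: str, body: Dict[str, Any], enforce_user_param: bool) -> None:
    """Raise if enforcement is on, the route is an OpenAI route and 'user' is absent."""
    if not enforce_user_param:
        return
    if route in OPENAI_ROUTES and "user" not in body:
        raise MissingUserParamError(f"'user' param not passed in request body on route {route}")
```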


Advanced - Return Rejected Message as Response

For chat completion and text completion calls, you can return a rejected message as the user's response.

Do this by returning a string. LiteLLM takes care of returning the response in the correct format, depending on the endpoint and on whether the call is streaming or non-streaming.

For endpoints other than chat/text completion, this response is returned as an exception with a 400 status code.
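To make the three cases concrete, here is a sketch of how a rejection string could be wrapped per endpoint. It is not LiteLLM's implementation: `build_rejection` and `RejectedRequestError` are hypothetical, the chat branch mirrors the fields of the expected response at the end of this section, the text_completion shape is an assumption based on OpenAI's legacy completions format, and streaming is not covered.

```python
import time
import uuid
from typing import Any, Dict


class RejectedRequestError(Exception):
    """Hypothetical exception carrying an HTTP status code."""

    def __init__(self, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.status_code = status_code


def build_rejection(message: str, call_type: str) -> Dict[str, Any]:
    """Wrap a rejection string according to the endpoint (hypothetical helper)."""
    if call_type == "completion":  # chat completions
        return {
            "id": f"chatcmpl-{uuid.uuid4()}",
            "choices": [{
                "finish_reason": "stop",
                "index": 0,
                "message": {"content": message, "role": "assistant"},
            }],
            "created": int(time.time()),
            "model": None,
            "object": "chat.completion",
            "system_fingerprint": None,
            "usage": {},
        }
    if call_type == "text_completion":
        return {
            "id": f"cmpl-{uuid.uuid4()}",
            "choices": [{"finish_reason": "stop", "index": 0, "text": message}],
            "created": int(time.time()),
            "model": None,
            "object": "text_completion",
            "usage": {},
        }
    # Any other endpoint: surface the string as a 400 error instead.
    raise RejectedRequestError(message, status_code=400)
```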

1. Create a custom handler

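from litellm.integrations.custom_logger import CustomLogger
import litellm
from litellm.proxy.proxy_server import UserAPIKeyAuth, DualCache
from litellm.utils import get_formatted_prompt
from typing import Literal, Optional, Union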

# This file includes the custom callbacks for LiteLLM Proxy
# Once defined, these can be passed in proxy_config.yaml
class MyCustomHandler(CustomLogger):
    def __init__(self):
        pass

    #### CALL HOOKS - proxy only #### 

    async def async_pre_call_hook(self, user_api_key_dict: UserAPIKeyAuth, cache: DualCache, data: dict, call_type: Literal[
            "completion",
            "text_completion",
            "embeddings",
            "image_generation",
            "moderation",
            "audio_transcription",
        ]) -> Optional[Union[dict, str, Exception]]: 
        formatted_prompt = get_formatted_prompt(data=data, call_type=call_type)

        if "Hello world" in formatted_prompt:
            return "This is an invalid response"

        return data 

proxy_handler_instance = MyCustomHandler()
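The decision logic of this handler can be exercised on its own. In the sketch below, `format_prompt` is a hypothetical stand-in for `get_formatted_prompt` that simply joins the input text; the real helper may format prompts differently. The hook follows the rule above: return a string to reject, return the dict to let the call through.

```python
import asyncio
from typing import Any, Dict, Union


def format_prompt(data: Dict[str, Any], call_type: str) -> str:
    """Hypothetical stand-in for get_formatted_prompt: flatten the input text."""
    if call_type == "completion":
        return " ".join(
            m["content"] for m in data.get("messages", [])
            if isinstance(m.get("content"), str)
        )
    if call_type == "text_completion":
        prompt = data.get("prompt", "")
        return prompt if isinstance(prompt, str) else " ".join(str(p) for p in prompt)
    return ""


async def pre_call_hook(data: Dict[str, Any], call_type: str) -> Union[Dict[str, Any], str]:
    # A str means "reject with this message"; the dict lets the call through.
    if "Hello world" in format_prompt(data, call_type):
        return "This is an invalid response"
    return data
```

The substring check is case-sensitive, so "hello world" passes through; lower-casing both sides would change that.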

2. Update config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo

litellm_settings:
  callbacks: custom_callbacks.proxy_handler_instance # sets litellm.callbacks = [proxy_handler_instance]

3. Test it!

$ litellm --config /path/to/config.yaml

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "Hello world"
        }
    ]
    }'

Expected Response

{
    "id": "chatcmpl-d00bbede-2d90-4618-bf7b-11a1c23cf360",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "This is an invalid response", # 👈 REJECTED RESPONSE
                "role": "assistant"
            }
        }
    ],
    "created": 1716234198,
    "model": null,
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {}
}
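Because the rejected message comes back in the normal chat.completion shape (here with "model": null and empty usage), a client reads it like any other answer. A minimal sketch; `extract_reply` is a hypothetical helper, and the JSON copies the response above without the annotation comment, since strict JSON parsers reject `#` comments.

```python
import json
from typing import Any, Dict, Optional


def extract_reply(response: Dict[str, Any]) -> Optional[str]:
    """Return the assistant text of a chat.completion response, or None if absent."""
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


raw = """
{
  "id": "chatcmpl-d00bbede-2d90-4618-bf7b-11a1c23cf360",
  "choices": [{"finish_reason": "stop", "index": 0,
               "message": {"content": "This is an invalid response", "role": "assistant"}}],
  "created": 1716234198,
  "model": null,
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {}
}
"""
reply = extract_reply(json.loads(raw))
print(reply)  # This is an invalid response
```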
