Infinity

Property	Details
Description	Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP.
Provider route on LiteLLM	`infinity/`
Supported operations	`/rerank`, `/embeddings`
Link to provider doc	Infinity ↗

Usage - LiteLLM Python SDK

from litellm import rerank, embedding
import os

os.environ["INFINITY_API_BASE"] = "https://:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
)

Usage - LiteLLM Proxy

LiteLLM provides a Cohere API-compatible `/rerank` endpoint for rerank calls.

Setup

Add this to your LiteLLM Proxy config.yaml:

model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://:8080
      api_key: os.environ/INFINITY_API_KEY

Start LiteLLM

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Test request:

Rerank

curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'

Supported Cohere Rerank API Parameters

Parameter	Type	Description
`query`	`str`	The query to rerank the documents against
`documents`	`list[str]`	The documents to rerank
`top_n`	`int`	The number of documents to return
`return_documents`	`bool`	Whether to return the documents in the response
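
As a rough illustration of how these parameters fit together, the sketch below assembles a request body for the proxy's `/rerank` endpoint. The helper name `build_rerank_payload` and its validation rules are assumptions made for this example and are not part of LiteLLM.

```python
import json
from typing import Any, Optional


def build_rerank_payload(
    model: str,
    query: str,
    documents: list[str],
    top_n: Optional[int] = None,
    return_documents: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble a JSON-serializable body for a /rerank request (hypothetical helper)."""
    # These checks are assumptions for the sketch, not documented server behavior.
    if not documents:
        raise ValueError("documents must contain at least one entry")
    if top_n is not None and top_n < 1:
        raise ValueError("top_n must be a positive integer")

    payload: dict[str, Any] = {"model": model, "query": query, "documents": documents}
    # Optional parameters are only included when set, so server defaults apply otherwise.
    if top_n is not None:
        payload["top_n"] = top_n
    if return_documents is not None:
        payload["return_documents"] = return_documents
    return payload


payload = build_rerank_payload(
    model="custom-infinity-rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    top_n=2,
    return_documents=True,
)

# json.dumps turns Python's True into JSON's lowercase true.
body = json.dumps(payload)
print(body)
```

Serializing with `json.dumps` rather than writing the body by hand avoids mixing Python literals such as `True` into the JSON.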

Usage - Return Documents

SDK

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    return_documents=True,
)

PROXY

curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of France?",
    "documents": [
        "Paris",
        "London",
        "Berlin",
        "Madrid"
    ],
    "return_documents": true
  }'
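
A minimal sketch of reading the result follows. It assumes the response body follows the Cohere rerank shape (a `results` list whose entries carry `index`, `relevance_score`, and, when `return_documents` is set, a `document` object with a `text` field). The helper `ranked_texts` and the sample scores are invented for illustration.

```python
from typing import Any


def ranked_texts(response: dict[str, Any], documents: list[str]) -> list[tuple[str, float]]:
    """Return (text, relevance_score) pairs in the order given by the response."""
    pairs = []
    for result in response["results"]:
        doc = result.get("document")
        # Prefer the returned document text; otherwise look it up by index.
        text = doc["text"] if doc else documents[result["index"]]
        pairs.append((text, result["relevance_score"]))
    return pairs


documents = ["Paris", "London", "Berlin", "Madrid"]

# Hypothetical response body; the scores are made up for this example.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.98, "document": {"text": "Paris"}},
        {"index": 2, "relevance_score": 0.12},
    ]
}

pairs = ranked_texts(sample_response, documents)
print(pairs)
```

Falling back to `index` keeps the helper usable when the documents are not returned in the response.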

Pass Provider-specific Parameters

Any unmapped parameters are passed to the provider as-is.

SDK

from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "https://:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    raw_scores=True, # 👈 PROVIDER-SPECIFIC PARAM
)

PROXY

Configure config.yaml

model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://:8080
      raw_scores: True # 👈 EITHER SET PROVIDER-SPECIFIC PARAMS HERE OR IN REQUEST BODY

Start LiteLLM

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Test it!

curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "raw_scores": true
  }'
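
The pass-through rule for unmapped parameters can be pictured with the conceptual sketch below. It is not LiteLLM's actual implementation: `MAPPED_PARAMS` and `split_params` are hypothetical names, and the mapped set simply mirrors the rerank parameters listed earlier on this page plus `model`.

```python
from typing import Any

# Assumed set of mapped parameters for this illustration.
MAPPED_PARAMS = {"model", "query", "documents", "top_n", "return_documents"}


def split_params(request: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate mapped parameters from those that would be forwarded unchanged."""
    mapped = {k: v for k, v in request.items() if k in MAPPED_PARAMS}
    passthrough = {k: v for k, v in request.items() if k not in MAPPED_PARAMS}
    return mapped, passthrough


mapped, passthrough = split_params(
    {
        "model": "custom-infinity-rerank",
        "query": "What is the capital of France?",
        "documents": ["Paris", "London"],
        "raw_scores": True,  # provider-specific parameter
    }
)
print(passthrough)
```

Here `raw_scores` lands in `passthrough`, which corresponds to the parameter being sent on to the provider as-is.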

Embeddings

LiteLLM provides an OpenAI API-compatible `/embeddings` endpoint for embedding calls.

Setup

Add this to your LiteLLM Proxy config.yaml:

model_list:
  - model_name: custom-infinity-embedding
    litellm_params:
      model: infinity/provider/custom-embedding-v1
      api_base: http://:8080
      api_key: os.environ/INFINITY_API_KEY

Test request:

curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'

Supported Embedding API Parameters

Parameter	Type	Description
`model`	`str`	The embedding model to use
`input`	`list[str]`	The text inputs to generate embeddings for
`encoding_format`	`str`	The format in which to return the embeddings (e.g. "float", "base64")
`modality`	`str`	The type of input (e.g. "text", "image", "audio")
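
When `encoding_format` is "base64", the embedding arrives as a string rather than a list of numbers. The sketch below decodes such a string under the assumption that the server follows the OpenAI convention of base64-encoded little-endian float32 values; check this against your Infinity deployment. The helper name `decode_base64_embedding` is hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4 (float32 size)")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with values that float32 represents exactly.
original = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack("<3f", *original)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)
```

The base64 form is more compact on the wire than a JSON list of floats, at the cost of this extra decoding step on the client.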

Usage - Basic Examples

SDK

from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "https://:8080"

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"]
)

print(response.data[0]['embedding'])

PROXY

curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'

Usage - OpenAI Client

SDK

from openai import OpenAI

client = OpenAI(
  api_key="<LITELLM_MASTER_KEY>",
  base_url="<LITELLM_URL>"
)

response = client.embeddings.create(
  model="bge-small",
  input=["The food was delicious and the waiter..."],
  encoding_format="float"
)

print(response.data[0].embedding)

PROXY

curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-small",
    "input": ["The food was delicious and the waiter..."],
    "encoding_format": "float"
  }'
