Infinity
| Property | Details |
|---|---|
| Description | Infinity is a high-performance, low-latency REST API for serving text embeddings, re-ranking models, and CLIP. |
| Provider Route on LiteLLM | `infinity/` |
| Supported Operations | `/rerank`, `/embeddings` |
| Link to Provider Doc | Infinity ↗ |
Usage - LiteLLM Python SDK
from litellm import rerank, embedding
import os

# Point LiteLLM at your running Infinity server
os.environ["INFINITY_API_BASE"] = "https://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
)
print(response)
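The response follows the Cohere rerank shape; a minimal sketch for reading it, assuming each entry in response.results is a dict carrying the index of the original document and its relevance score:

# Assumed Cohere-style result shape: {"index": ..., "relevance_score": ...}
for result in response.results:
    print(result["index"], result["relevance_score"])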
Usage - LiteLLM Proxy
LiteLLM provides a Cohere API compatible /rerank endpoint for rerank calls.
Setup
Add this to your LiteLLM Proxy config.yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
Start LiteLLM
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Test request:
Rerank
curl http://0.0.0.0:4000/rerank \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "custom-infinity-rerank",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country."
],
"top_n": 3
}'
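Since the endpoint is Cohere-API compatible, the same request can be sent from any HTTP client; a minimal Python sketch using requests, assuming the proxy above is running locally and sk-1234 is a valid key:

import requests

resp = requests.post(
    "http://0.0.0.0:4000/rerank",
    headers={
        "Authorization": "Bearer sk-1234",
        "Content-Type": "application/json",
    },
    json={
        "model": "custom-infinity-rerank",
        "query": "What is the capital of the United States?",
        "documents": [
            "Carson City is the capital city of the American state of Nevada.",
            "Washington, D.C. is the capital of the United States.",
        ],
        "top_n": 1,
    },
)
resp.raise_for_status()
print(resp.json())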
Supported Cohere Rerank API Params
| Param | Type | Description |
|---|---|---|
| query | str | The query to rerank the documents against |
| documents | list[str] | The documents to rerank |
| top_n | int | The number of documents to return |
| return_documents | bool | Whether to return the documents in the response |
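These params map directly onto the SDK call; a short sketch combining top_n with the rerank example above (same Infinity server and model assumptions as before):

from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "https://localhost:8080"

# Score all four documents but only return the top 2
response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    top_n=2,
)
print(response.results)  # at most 2 results, ordered by relevance_score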
Usage - Return Documents
- SDK
response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    return_documents=True,
)
- PROXY
curl http://0.0.0.0:4000/rerank \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "custom-infinity-rerank",
"query": "What is the capital of France?",
"documents": [
"Paris",
"London",
"Berlin",
"Madrid"
],
"return_documents": True,
}'
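A sketch for reading the documents back out of the SDK response when return_documents=True; the exact shape depends on the backend, but in the Cohere schema each result carries a document object with a text field alongside its relevance_score:

for result in response.results:
    # "document" is only present when return_documents=True (assumed Cohere-style dict)
    doc = result.get("document") or {}
    text = doc.get("text") if isinstance(doc, dict) else doc
    print(result["relevance_score"], text)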
Pass Provider-specific Params
Unmapped parameters are passed through to the provider unchanged.
- SDK
from litellm import rerank
import os

os.environ["INFINITY_API_BASE"] = "https://localhost:8080"

response = rerank(
    model="infinity/rerank",
    query="What is the capital of France?",
    documents=["Paris", "London", "Berlin", "Madrid"],
    raw_scores=True,  # 👈 PROVIDER-SPECIFIC PARAM
)
- PROXY: Configure config.yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://localhost:8080
      raw_scores: True # 👈 EITHER SET PROVIDER-SPECIFIC PARAMS HERE OR IN REQUEST BODY
- Start LiteLLM
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
- Test it! (raw_scores is the provider-specific param, sent here in the request body)
curl http://0.0.0.0:4000/rerank \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "custom-infinity-rerank",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country."
],
"raw_scores": True # 👈 PROVIDER-SPECIFIC PARAM
}'
Embeddings
LiteLLM provides an OpenAI API compatible /embeddings endpoint for embedding calls.
Setup
Add this to your LiteLLM Proxy config.yaml
model_list:
  - model_name: custom-infinity-embedding
    litellm_params:
      model: infinity/provider/custom-embedding-v1
      api_base: http://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
Test request:
curl http://0.0.0.0:4000/embeddings \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "custom-infinity-embedding",
"input": ["hello"]
}'
Supported Embedding API Params
| Param | Type | Description |
|---|---|---|
| model | str | The embedding model to use |
| input | list[str] | The text inputs to generate embeddings for |
| encoding_format | str | The format to return the embeddings in (e.g. "float", "base64") |
| modality | str | The type of input (e.g. "text", "image", "audio") |
Usage - Basic Examples
- SDK
from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "https://localhost:8080"

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"]
)

print(response.data[0]['embedding'])
- PROXY
curl http://0.0.0.0:4000/embeddings \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "custom-infinity-embedding",
"input": ["hello"]
}'
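As a quick sanity check on the embeddings, you can compare two inputs with cosine similarity; a short sketch building on the SDK example above (numpy is an extra dependency here, not something LiteLLM requires):

import numpy as np
from litellm import embedding

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm", "good evening from litellm"],
)

# Cosine similarity between the two returned vectors
a = np.array(response.data[0]["embedding"])
b = np.array(response.data[1]["embedding"])
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))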
Usage - OpenAI Client
- SDK
from openai import OpenAI

client = OpenAI(
    api_key="<LITELLM_MASTER_KEY>",
    base_url="<LITELLM_URL>"
)

response = client.embeddings.create(
    model="bge-small",
    input=["The food was delicious and the waiter..."],
    encoding_format="float"
)

print(response.data[0].embedding)
- PROXY
curl http://0.0.0.0:4000/embeddings \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-small",
"input": ["The food was delicious and the waiter..."],
"encoding_format": "float"
}'
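encoding_format can also be set to "base64" (see the embedding params table above) to shrink the response payload. A hedged sketch for decoding it, assuming the value comes back as a base64 string of little-endian float32 values, which is the common convention for OpenAI-compatible APIs; if your client already decodes it to floats, the decode step is unnecessary:

import base64
import numpy as np
from litellm import embedding

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"],
    encoding_format="base64",
)

value = response.data[0]["embedding"]
if isinstance(value, str):
    # Assumption: base64-encoded flat array of little-endian float32 values
    value = np.frombuffer(base64.b64decode(value), dtype="<f4")
print(len(value))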