LLM

Welcome to the LLM documentation. This section provides detailed information about the intelligent AI gateway for language models.

Overview

LLM is a gateway that routes AI requests to the language model best suited to each task, based on capability, cost, and performance requirements. It lets you use the best model for each specific task without being locked into a single provider.

Intelligent AI Gateway

The LLM gateway intelligently routes requests to the most appropriate model based on the task requirements and model capabilities. This allows you to:

  1. Access multiple AI models through a single, unified interface
  2. Automatically select the best model for each specific task
  3. Optimize for cost, performance, or capabilities based on your requirements
  4. Avoid vendor lock-in by abstracting away model-specific implementation details

Model Selection

The gateway can select models based on various criteria:

  • Capabilities: Choose models that support specific features like code generation, reasoning, or vision
  • Cost: Optimize for the lowest cost model that meets your requirements
  • Performance: Select models based on latency, throughput, or other performance metrics
  • Quality: Use models that provide the highest quality results for your specific use case

This intelligent routing happens behind the scenes, allowing you to focus on your application logic rather than model selection and integration.
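
To make these criteria concrete, here is a minimal TypeScript sketch of capability- and cost-based selection, using the model shape returned by the /models endpoint documented below. The gateway performs this routing server-side; this sketch is illustrative only, and the pickModel helper is hypothetical.

// Hypothetical sketch of capability/cost-based model selection.
// ModelInfo mirrors the /models response shape documented below.
interface ModelInfo {
  id: string;
  capabilities: string[];
  maxTokens?: number;
  pricing?: { input: number; output: number };
}

function pickModel(
  models: ModelInfo[],
  required: string[],
  preferCheap = true
): ModelInfo | undefined {
  // Keep only models that support every required capability.
  const eligible = models.filter((m) =>
    required.every((c) => m.capabilities.includes(c))
  );
  if (!preferCheap) return eligible[0];
  // Rank by combined per-token price when pricing is known.
  const cost = (m: ModelInfo) =>
    m.pricing ? m.pricing.input + m.pricing.output : Number.POSITIVE_INFINITY;
  return [...eligible].sort((a, b) => cost(a) - cost(b))[0];
}

For example, pickModel(models, ["chat", "vision"]) would return the cheapest vision-capable chat model in the catalog.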

Endpoints

Generate Text

POST /generate

Generates text based on the provided prompt.

Request Body

{ "prompt": "Write a short story about a robot learning to paint.", "model": "openai/gpt-4", "maxTokens": 500, "temperature": 0.7, "topP": 0.9, "frequencyPenalty": 0, "presencePenalty": 0 }

Response

{ "id": "generation_123", "text": "In a small studio apartment overlooking the city, Robot Unit 7 stood motionless before a blank canvas...", "model": "openai/gpt-4", "promptTokens": 10, "completionTokens": 487, "totalTokens": 497, "createdAt": "2023-01-01T00:00:00Z" }

Chat Completion

POST /chat

Generates a response to a conversation.

Request Body

{ "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the capital of France?" } ], "model": "anthropic/claude-3-opus", "maxTokens": 100, "temperature": 0.5, "topP": 0.9, "frequencyPenalty": 0, "presencePenalty": 0 }

Response

{ "id": "chat_123", "message": { "role": "assistant", "content": "The capital of France is Paris. It's known as the 'City of Light' and is famous for landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral." }, "model": "anthropic/claude-3-opus", "promptTokens": 25, "completionTokens": 35, "totalTokens": 60, "createdAt": "2023-01-01T00:00:00Z" }

Embeddings

POST /embeddings

Generates embeddings for the provided text.

Request Body

{ "text": "The quick brown fox jumps over the lazy dog.", "model": "openai/text-embedding-3-large" }

Response

{ "id": "embedding_123", "embedding": [0.1, 0.2, 0.3, ...], "model": "openai/text-embedding-3-large", "tokens": 10, "createdAt": "2023-01-01T00:00:00Z" }

List Models

GET /models

Returns a list of available language models.

Response

{ "data": [ { "id": "openai/gpt-4", "provider": "openai", "name": "gpt-4", "capabilities": ["chat", "code", "reasoning"], "maxTokens": 8192 }, { "id": "anthropic/claude-3-opus", "provider": "anthropic", "name": "claude-3-opus", "capabilities": ["chat", "reasoning", "vision"], "maxTokens": 100000 }, { "id": "openai/text-embedding-3-large", "provider": "openai", "name": "text-embedding-3-large", "capabilities": ["embeddings"], "dimensions": 3072 } ] }

Get Model

GET /models/:id

Returns details for a specific model.

Response

{ "id": "openai/gpt-4", "provider": "openai", "name": "gpt-4", "capabilities": ["chat", "code", "reasoning"], "maxTokens": 8192, "pricing": { "input": 0.00003, "output": 0.00006 }, "createdAt": "2023-01-01T00:00:00Z" }

Function Calling

POST /function-calling

Asks a language model to select a function and generate structured arguments for it. The model does not execute the function; the response contains the call for your application to run.

Request Body

{ "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What's the weather like in New York?" } ], "functions": [ { "name": "getWeather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "The temperature unit to use" } }, "required": ["location"] } } ], "model": "openai/gpt-4", "temperature": 0.5 }

Response

{ "id": "function_call_123", "functionCall": { "name": "getWeather", "arguments": { "location": "New York, NY", "unit": "fahrenheit" } }, "model": "openai/gpt-4", "promptTokens": 50, "completionTokens": 20, "totalTokens": 70, "createdAt": "2023-01-01T00:00:00Z" }

Error Handling

The API uses standard HTTP status codes to indicate the success or failure of requests. For more information about error codes and how to handle errors, see the Error Handling documentation.

Rate Limiting

API requests are subject to rate limiting to ensure fair usage and system stability. For more information about rate limits and how to handle rate-limiting responses, see the Rate Limiting documentation.
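
A common client-side pattern is retrying rate-limited requests with exponential backoff. The sketch below assumes the API signals rate limiting with HTTP 429; consult the Rate Limiting documentation for the exact status codes and headers.

// Retry on HTTP 429 with exponential backoff (1s, 2s, 4s, ...).
// The 429 status is an assumption; see the Rate Limiting documentation.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
  }
}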

Webhooks

You can configure webhooks to receive notifications about LLM events. For more information about webhooks, see the Webhooks documentation.

SDKs

We provide SDKs for popular programming languages to make it easier to integrate with the LLM API.
