API Documentation
LLM API provides high-availability Claude API service using the same API format as Anthropic. Change your Base URL and start building — no other code changes required.
Base URL
https://llmapi.pro
One-Click Setup (Recommended)
The setup script automatically detects issues, configures your environment, and launches Claude Code. It works with the Claude Code CLI and with IDEs including VS Code, Cursor, Windsurf, and JetBrains.
Windows (PowerShell):
irm llmapi.pro/setup.ps1 | iex
CMD users: run powershell first, then run the command above.
macOS / Linux:
curl -fsSL llmapi.pro/setup.sh | bash
The script guides you to enter your API Key, automatically completes all setup, and launches Claude Code.
Get Started in 3 Steps
Create an account and get your API key
Register for free, then copy your API key from the Dashboard.
Set your environment variables
export ANTHROPIC_BASE_URL=https://llmapi.pro
export ANTHROPIC_API_KEY=your-api-key
Launch Claude Code
Open your terminal and type claude. That's it!
claude
Claude Code Integration
Configure the Claude Code CLI to route all requests through LLM API. Set two environment variables and launch Claude Code as usual — no plugins or patches needed.
Temporary (current shell session only)
export ANTHROPIC_BASE_URL=https://llmapi.pro
export ANTHROPIC_API_KEY=your-api-key
Permanent (add to ~/.bashrc or ~/.zshrc)
echo 'export ANTHROPIC_BASE_URL=https://llmapi.pro' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY=your-api-key' >> ~/.zshrc
source ~/.zshrc
Then simply launch Claude Code
claude
Environment Variables
| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_BASE_URL | Yes | Set to https://llmapi.pro to proxy requests through LLM API. |
| ANTHROPIC_API_KEY | Yes | Your LLM API key (starts with sk-llmapi-). Get it from the Dashboard. |
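The two variables above can be sanity-checked before launching Claude Code. The following is a minimal sketch, not part of LLM API or Claude Code; the function name `check_env` is hypothetical, and the key-prefix check relies only on the `sk-llmapi-` prefix stated in the table.

```python
import os

EXPECTED_BASE_URL = "https://llmapi.pro"
KEY_PREFIX = "sk-llmapi-"  # prefix stated in the table above


def check_env(env):
    """Return a list of human-readable problems; an empty list means the setup looks right."""
    problems = []
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"ANTHROPIC_BASE_URL should be {EXPECTED_BASE_URL}, got {base_url!r}")
    if not api_key:
        problems.append("ANTHROPIC_API_KEY is not set")
    elif not api_key.startswith(KEY_PREFIX):
        problems.append(f"ANTHROPIC_API_KEY does not start with {KEY_PREFIX}")
    return problems


if __name__ == "__main__":
    # Check the real process environment and print any warnings.
    for problem in check_env(os.environ):
        print("warning:", problem)
```

Passing the environment as a plain mapping keeps the check easy to run against a test dictionary as well as `os.environ`.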
Supported Models
LLM API supports the full Claude model family. Use the same model identifiers as the official API.
| Model | Description | Context |
|---|---|---|
| claude-opus-4-8 | Anthropic's most powerful model. World-class coding, complex reasoning, advanced analysis. | 1M tokens |
| claude-sonnet-4-7 | Best balance of intelligence and speed. Default for Claude Code. | 1M tokens |
| claude-haiku-4-5 | Fastest and most cost-effective. Ideal for lightweight tasks and high throughput. | 200K tokens |
Hermes Agent Integration
LLM API is a Claude-compatible relay that pools multiple upstream coding plans behind one Anthropic-shape endpoint (x-api-key, /v1/messages, SSE, tool_use). It plugs into Hermes Agent's anthropic_messages protocol with no upstream patch required.
3-step setup
Create an account and get your API key
Register for free, then copy your API key from the Dashboard.
Edit ~/.hermes/config.yaml
custom_providers:
  - name: llmapi
    base_url: https://llmapi.pro
    api_key: ${LLMAPI_KEY}
    api_mode: anthropic_messages
Set the key and pick a model
Put the key in ~/.hermes/.env, then in Hermes run the model command:
LLMAPI_KEY=sk-cp-xxxxxxxx
/model custom:llmapi:claude-sonnet-4-7
Why route Hermes through LLM API
- Anthropic-compatible (Hermes's native anthropic_messages protocol, including SSE and tool_use).
- One key, multiple upstreams: Sonnet/Opus class plus M2-class compatible routes for cost-sensitive sessions.
- Subscription option in addition to per-token billing.
- China-region egress for users where direct upstreams are slow.
Notes
- Belt-and-suspenders: after the first named-custom switch, run hermes config set model.api_mode anthropic_messages to set api_mode explicitly. This is a no-op on Hermes ≥ v0.5 and a safety net for older versions.
- LLM API is a Claude-compatible relay, not Anthropic. Models named "Claude *" on LLM API route to Claude-compatible upstreams; check the model list for the current routes.
OpenAI / Codex CLI Integration
In addition to the Anthropic /v1/messages endpoint, LLM API also exposes OpenAI-compatible endpoints at the same base URL. Point any OpenAI SDK, LiteLLM, or compatible CLI at https://llmapi.pro/v1 and it works out of the box.
Endpoints:
- POST /v1/chat/completions: OpenAI Chat Completions API (OpenAI SDK, LiteLLM, older Codex versions)
- POST /v1/responses: OpenAI Responses API (Codex CLI 0.130+)
Codex CLI
Codex CLI 0.130+ uses the OpenAI Responses API. Add an entry to ~/.codex/config.toml:
model = "claude-sonnet-4-7"
model_provider = "llmapi"
[model_providers.llmapi]
name = "llmapi"
base_url = "https://llmapi.pro/v1"
wire_api = "responses"
env_key = "OPENAI_API_KEY"
Then export your LLM API key and run Codex:
export OPENAI_API_KEY=sk-relay-your-key-here
codex exec "list the files in this folder"
OpenAI SDK (Python)
Set base_url and api_key. No other code changes.
from openai import OpenAI

# Point the standard OpenAI client at LLM API's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://llmapi.pro/v1",
    api_key="sk-relay-your-key-here",
)
resp = client.chat.completions.create(
    model="claude-sonnet-4-7",
    messages=[{"role": "user", "content": "Say hello in 3 words"}],
)
print(resp.choices[0].message.content)
OpenAI SDK (Node.js)
import OpenAI from 'openai';

// Point the standard OpenAI client at LLM API's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: 'https://llmapi.pro/v1',
  apiKey: 'sk-relay-your-key-here',
});
const resp = await client.chat.completions.create({
  model: 'claude-sonnet-4-7',
  messages: [{ role: 'user', content: 'Say hello in 3 words' }],
});
console.log(resp.choices[0].message.content);
LiteLLM
LiteLLM works with any OpenAI-compatible endpoint. Use the openai/<model> prefix:
import litellm

# The openai/ prefix tells LiteLLM to use its OpenAI-compatible client.
resp = litellm.completion(
    model="openai/claude-sonnet-4-7",
    api_base="https://llmapi.pro/v1",
    api_key="sk-relay-your-key-here",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
curl
curl https://llmapi.pro/v1/chat/completions \
  -H "Authorization: Bearer sk-relay-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-7",
    "messages": [{"role":"user","content":"hello"}],
    "stream": false
  }'
Notes
- Streaming is fully supported. Pass stream: true for SSE (chat.completion.chunk events) or use Codex CLI streaming out of the box.
- Tool calling works in both directions. Pass OpenAI-style tools and we translate to Anthropic tool_use internally; responses come back as tool_calls.
- Same API key works on both /v1/messages and /v1/chat/completions; pick whichever protocol your client speaks.
- Available models are the same as the Anthropic endpoint. The OpenAI-shape response simply echoes back the model name you requested.
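To illustrate the tool-calling note above, here is a rough sketch of what a translation between the two public tool formats can look like. LLM API's internal translation is not documented here, so the function names are hypothetical and the mapping reflects only the published OpenAI and Anthropic request/response shapes.

```python
import json


def openai_tool_to_anthropic(tool):
    """Convert one OpenAI-style function tool into an Anthropic-style tool definition."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI names the JSON Schema "parameters"; Anthropic names it "input_schema".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


def anthropic_tool_use_to_openai(block):
    """Convert an Anthropic tool_use content block into an OpenAI-style tool_calls entry."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI carries arguments as a JSON string; Anthropic as a parsed object.
            "arguments": json.dumps(block["input"]),
        },
    }
```

The main structural differences are the schema key name and the fact that OpenAI-style arguments arrive as a JSON string that the client must parse.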
Authentication
Every API request must include a valid API key. You can pass your key using either of the two headers below. Create and manage keys in your Dashboard.
Supported Auth Headers
| Header | Format | Description |
|---|---|---|
| x-api-key | sk-llmapi-xxxx | Primary method. Compatible with the Anthropic SDK. |
| Authorization | Bearer sk-llmapi-xxxx | Alternative Bearer-token method. |
Example: cURL with x-api-key
curl https://llmapi.pro/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk-llmapi-xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-7",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
Security note: Never expose your API key in client-side code or public repositories. If a key is compromised, delete it immediately from the Dashboard and create a new one.
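One way to follow the security note is to read the key from the environment instead of hard-coding it. The sketch below uses only the Python standard library; `load_key`, `auth_headers`, and `build_request` are hypothetical helper names, and the request object is built but not sent.

```python
import urllib.request

BASE_URL = "https://llmapi.pro"


def load_key(env):
    """Read the API key from an environment mapping rather than from source code."""
    key = env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key


def auth_headers(api_key, scheme="x-api-key"):
    """Return the auth header for either of the two supported schemes."""
    if scheme == "x-api-key":
        return {"x-api-key": api_key}
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown auth scheme: {scheme!r}")


def build_request(api_key, body_bytes, scheme="x-api-key"):
    """Build (but do not send) a POST request to /v1/messages."""
    headers = {"Content-Type": "application/json", "anthropic-version": "2023-06-01"}
    headers.update(auth_headers(api_key, scheme))
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages", data=body_bytes, headers=headers, method="POST"
    )
```

In a real script, `load_key(os.environ)` would supply the key and `urllib.request.urlopen` would send the request.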
API Reference
/v1/messages
Create a message. Fully compatible with the Anthropic Messages API.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier, e.g. claude-sonnet-4-7 |
| messages | array | Yes | Array of message objects with role (user or assistant) and content. |
| max_tokens | integer | Yes | Maximum number of tokens to generate in the response. |
| system | string | No | System prompt that sets behavior and context for the model. |
| temperature | number | No | Sampling temperature between 0 and 1. Lower values are more deterministic. |
| tools | array | No | List of tool definitions the model may use (function calling). |
| tool_choice | object | No | Controls tool use: auto, any, or tool with a specific name. |
| stream | boolean | No | Enable Server-Sent Events streaming. Default: false. |
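The constraints in the parameter table can be checked on the client before a request is sent. This is a minimal sketch with a hypothetical helper name, `build_messages_payload`; it covers only a subset of the parameters and mirrors the rules listed above rather than the server's actual validation.

```python
def build_messages_payload(model, messages, max_tokens, *, system=None,
                           temperature=None, stream=False):
    """Assemble a /v1/messages request body, enforcing the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(f"invalid role: {message.get('role')!r}")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    # Optional fields are included only when the caller sets them.
    if system is not None:
        payload["system"] = system
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload
```

Serializing the result with `json.dumps` gives a body suitable for the cURL or standard-library request patterns shown elsewhere on this page.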
Non-Streaming Response
When stream is false (default), the API returns a single JSON object:
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I can help you with a wide range of tasks..."
    }
  ],
  "model": "claude-sonnet-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 58
  }
}
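A response of this shape can be reduced to its text and token counts with a few lines of Python. The helper names `extract_text` and `summarize` are hypothetical; the sketch assumes only the fields shown in the sample object above.

```python
def extract_text(response):
    """Join the text of all text-type content blocks, skipping other block types."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def summarize(response):
    """Return the reply text, stop reason, and total token count for one response."""
    usage = response.get("usage", {})
    return {
        "text": extract_text(response),
        "stop_reason": response.get("stop_reason"),
        "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    }
```

Filtering on the block type matters because content may also contain tool_use blocks, which have no text field.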
Streaming Response (SSE)
When stream is true, the response is delivered as Server-Sent Events. Each event has an event field and a JSON data field:
event: message_start
data: {"type":"message_start","message":{"id":"msg_01...","type":"message","role":"assistant","content":[],"model":"claude-sonnet-4-7","usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"! I can"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":58}}

event: message_stop
data: {"type":"message_stop"}
SSE Event Types
| Event | Description |
|---|---|
| message_start | Sent once at the beginning. Contains the message object with metadata. |
| content_block_start | Marks the start of a new content block (text or tool_use). |
| content_block_delta | Incremental content update. Concatenate delta.text to build the full response. |
| content_block_stop | The content block is complete. |
| message_delta | Final message metadata including stop_reason and total usage. |
| message_stop | End of stream. Close the connection. |
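The event sequence described in the table can be consumed with a small line-based parser. The sketch below is one possible approach, with hypothetical function names; it works on an iterable of already-decoded text lines and treats either a blank line or the next event: line as the end of an event.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "" or line.startswith("event:"):
            # A blank line (or the next event header) ends the pending event.
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
                event_name, data_parts = None, []
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
    if data_parts:
        yield event_name, json.loads("\n".join(data_parts))


def collect_text(lines):
    """Concatenate text deltas and return (full_text, stop_reason)."""
    parts, stop_reason = [], None
    for event, data in iter_sse_events(lines):
        if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
        elif event == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
        elif event == "message_stop":
            break
    return "".join(parts), stop_reason
```

Checking the delta type before reading its text keeps the collector from failing on tool_use streams, whose deltas carry different fields.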
Models
The latest Claude model family from Anthropic.
| Model | Description | Context | Max Output |
|---|---|---|---|
| claude-opus-4-8 | Anthropic's most powerful model. World-class coding, complex reasoning, advanced analysis. | 1M tokens | 128K tokens |
| claude-sonnet-4-7 | Best balance of intelligence and speed. Default for Claude Code. | 1M tokens | 128K tokens |
| claude-haiku-4-5 | Fastest and most cost-effective. Ideal for lightweight tasks and high throughput. | 200K tokens | 64K tokens |
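The limits in this table can be encoded as a lookup for client-side checks. This sketch makes two assumptions that the table does not state: that "K" and "M" mean 1,000 and 1,000,000 tokens, and that prompt tokens plus requested output tokens must fit inside the context window. The helper names are hypothetical.

```python
# Limits transcribed from the table above (assuming K = 1,000 and M = 1,000,000).
MODEL_LIMITS = {
    "claude-opus-4-8": {"context": 1_000_000, "max_output": 128_000},
    "claude-sonnet-4-7": {"context": 1_000_000, "max_output": 128_000},
    "claude-haiku-4-5": {"context": 200_000, "max_output": 64_000},
}


def clamp_max_tokens(model, requested):
    """Cap a requested max_tokens value at the model's listed maximum output."""
    return min(requested, MODEL_LIMITS[model]["max_output"])


def fits_context(model, prompt_tokens, max_tokens):
    """Assumption: prompt plus requested output must fit within the context window."""
    return prompt_tokens + max_tokens <= MODEL_LIMITS[model]["context"]
```

For example, a 150,000-token prompt with 64,000 output tokens passes the check for claude-sonnet-4-7 but not for claude-haiku-4-5.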
Error Handling
LLM API uses standard HTTP status codes. Error responses always return a JSON body with a machine-readable type and a human-readable message.
Error Response Format
{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key provided."
  }
}
HTTP Status Codes
| Status | Error Type | Description | Recommended Action |
|---|---|---|---|
| 400 | invalid_request_error | Malformed request body or missing required parameters. | Verify your JSON payload and required fields. |
| 401 | authentication_error | Invalid or missing API key. | Check that x-api-key is set correctly. |
| 403 | permission_error | Insufficient permissions or account suspended. | Verify your account status and plan entitlements. |
| 429 | rate_limit_error | Too many requests. Rate limit exceeded. | Back off and retry. See Rate Limits. |
| 500 | api_error | Internal server error. | Retry after a brief delay. Contact support if it persists. |
| 503 | overloaded_error | Upstream provider is temporarily overloaded. | Wait a moment and retry with exponential backoff. |
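The recommended actions above split the status codes into retryable (429, 500, 503) and non-retryable (400, 401, 403). A minimal sketch of that policy follows; `send` is a hypothetical callable returning a `(status, body)` pair, and the base delay, cap, and jitter formula are illustrative choices rather than values documented by LLM API.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}  # per the Recommended Action column


def is_retryable(status):
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=None):
    """Seconds to wait before retrying after 0-based `attempt`: exponential, capped, jittered."""
    delay = min(cap, base * (2 ** attempt))
    if jitter is None:
        jitter = random.random()
    # Scale into [delay / 2, delay] so concurrent clients do not retry in lockstep.
    return delay * (0.5 + 0.5 * jitter)


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            break
        sleep(backoff_delay(attempt))
    return status, body
```

Passing `sleep` as a parameter makes the loop testable without real waiting; client errors such as 401 are returned immediately because retrying cannot fix them.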
Rate Limits
Rate limits vary by plan and are enforced per API key. When you exceed a limit, the API returns a 429 status code. Upgrade your plan for higher throughput.
Per-Plan Limits
| Plan | Request Quota | Monthly Token Quota |
|---|---|---|
| Free | 40 / 5h, 200 / week | Unlimited |
| Pro | 400 / 5h, 2,000 / week | Unlimited |
| Max 5x | 1,200 / 5h, 6,000 / week | Unlimited |
| Max 20x | 3,000 / 5h, 15,000 / week | Unlimited |
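A client can track its own usage against these quotas to avoid hitting 429 responses. The sketch below assumes both quotas are rolling windows measured back from the current time, which the table does not specify; the class name `QuotaTracker` is hypothetical, and the server remains the source of truth.

```python
from collections import deque

# Request quotas transcribed from the table above.
PLAN_QUOTAS = {
    "Free": {"5h": 40, "week": 200},
    "Pro": {"5h": 400, "week": 2000},
    "Max 5x": {"5h": 1200, "week": 6000},
    "Max 20x": {"5h": 3000, "week": 15000},
}
WINDOW_SECONDS = {"5h": 5 * 3600, "week": 7 * 24 * 3600}


class QuotaTracker:
    """Client-side estimate of remaining quota, assuming rolling windows."""

    def __init__(self, plan):
        self.quotas = PLAN_QUOTAS[plan]
        self.sent = deque()  # timestamps (seconds) of requests counted so far

    def allow(self, now):
        """Record and allow a request at time `now` if every window has room."""
        # Forget requests older than the longest window.
        while self.sent and now - self.sent[0] >= WINDOW_SECONDS["week"]:
            self.sent.popleft()
        for window, limit in self.quotas.items():
            recent = sum(1 for t in self.sent if now - t < WINDOW_SECONDS[window])
            if recent >= limit:
                return False
        self.sent.append(now)
        return True
```

In practice `now` would come from `time.time()`; taking it as an argument keeps the tracker deterministic in tests.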
Rate Limit Headers
Every API response includes headers to help you track your usage in real time:
anthropic-ratelimit-requests-limit: 60
anthropic-ratelimit-requests-remaining: 58
anthropic-ratelimit-requests-reset: 2026-04-08T12:01:00.000Z
anthropic-ratelimit-tokens-limit: 800000
anthropic-ratelimit-tokens-remaining: 800000
| Header | Description |
|---|---|
| x-ratelimit-limit | Maximum number of requests allowed per 5-hour window for your plan. |
| x-ratelimit-remaining | Number of requests remaining in the current rate-limit window. |
| x-ratelimit-reset | ISO 8601 timestamp when the rate-limit window resets. |
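These headers can be turned into a wait time on the client. Because this section shows both anthropic-ratelimit-requests-* and x-ratelimit-* names, the sketch below accepts either spelling; the function name `parse_rate_limit` is hypothetical, and header keys are assumed to be lowercase strings.

```python
from datetime import datetime, timezone

# Candidate header names for each field, in lookup order.
HEADER_NAMES = {
    "limit": ("anthropic-ratelimit-requests-limit", "x-ratelimit-limit"),
    "remaining": ("anthropic-ratelimit-requests-remaining", "x-ratelimit-remaining"),
    "reset": ("anthropic-ratelimit-requests-reset", "x-ratelimit-reset"),
}


def _first(headers, names):
    """Return the first header value present under any of the given names."""
    for name in names:
        if name in headers:
            return headers[name]
    raise KeyError(names[0])


def parse_rate_limit(headers, now=None):
    """Return limit, remaining, and seconds until the window resets."""
    reset_text = _first(headers, HEADER_NAMES["reset"])
    # Older Python versions do not parse a trailing "Z", so normalize it to an offset.
    reset = datetime.fromisoformat(reset_text.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "limit": int(_first(headers, HEADER_NAMES["limit"])),
        "remaining": int(_first(headers, HEADER_NAMES["remaining"])),
        "seconds_until_reset": max(0.0, (reset - now).total_seconds()),
    }
```

When `remaining` reaches zero, sleeping for `seconds_until_reset` before the next request avoids an immediate 429.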