API Documentation

LLM API provides a high-availability Claude API service that uses the same API format as Anthropic. Change your Base URL and API key and start building; no other code changes are required.

Base URL

https://llmapi.pro

One-Click Setup (Recommended)

Automatically detects issues, configures your environment, and launches Claude Code. Works with the Claude Code CLI and with IDEs including VS Code, Cursor, Windsurf, and JetBrains.

Windows (PowerShell)
irm llmapi.pro/setup.ps1 | iex

CMD users: run powershell first, then run the command above.

macOS / Linux
curl -fsSL llmapi.pro/setup.sh | bash

The script guides you to enter your API Key, automatically completes all setup, and launches Claude Code.

Get Started in 3 Steps

1. Create an account and get your API key

Register for free, then copy your API key from the Dashboard.

2. Set your environment variables

export ANTHROPIC_BASE_URL=https://llmapi.pro
export ANTHROPIC_API_KEY=your-api-key

3. Launch Claude Code

Open your terminal and type claude. That's it!

claude

Claude Code Integration

Configure the Claude Code CLI to route all requests through LLM API. Set two environment variables and launch Claude Code as usual — no plugins or patches needed.

Temporary (current shell session only)

export ANTHROPIC_BASE_URL=https://llmapi.pro
export ANTHROPIC_API_KEY=your-api-key

Permanent (add to ~/.bashrc or ~/.zshrc)

echo 'export ANTHROPIC_BASE_URL=https://llmapi.pro' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY=your-api-key' >> ~/.zshrc
source ~/.zshrc

Then simply launch Claude Code

claude

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_BASE_URL | Yes | Set to https://llmapi.pro to proxy requests through LLM API. |
| ANTHROPIC_API_KEY | Yes | Your LLM API key (starts with sk-llmapi-). Get it from the Dashboard. |
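
Before launching Claude Code, it can help to confirm that both variables are set as described above. The following is a minimal Python sketch, not part of LLM API itself: the function name check_env is hypothetical, and the prefix check simply mirrors the sk-llmapi- format stated in the table.

```python
import os

EXPECTED_BASE_URL = "https://llmapi.pro"
KEY_PREFIX = "sk-llmapi-"  # prefix stated in the table above


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if not base_url:
        problems.append("ANTHROPIC_BASE_URL is not set")
    elif base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(
            f"ANTHROPIC_BASE_URL is {base_url!r}, expected {EXPECTED_BASE_URL!r}"
        )

    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        problems.append("ANTHROPIC_API_KEY is not set")
    elif not api_key.startswith(KEY_PREFIX):
        problems.append(f"ANTHROPIC_API_KEY does not start with {KEY_PREFIX!r}")

    return problems


if __name__ == "__main__":
    # Print one warning per problem found in the current environment.
    for problem in check_env():
        print("warning:", problem)
```

Passing a plain dictionary instead of the real environment makes the check easy to try without touching your shell configuration.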

Supported Models

LLM API supports the full Claude model family. Use the same model identifiers as the official API.

| Model | Description | Context |
| --- | --- | --- |
| claude-opus-4-8 | Anthropic's most powerful model. World-class coding, complex reasoning, advanced analysis. | 1M tokens |
| claude-sonnet-4-7 | Best balance of intelligence and speed. Default for Claude Code. | 1M tokens |
| claude-haiku-4-5 | Fastest and most cost-effective. Ideal for lightweight tasks and high throughput. | 200K tokens |

Hermes Agent Integration

LLM API is a Claude-compatible relay that pools multiple upstream coding plans behind one Anthropic-shape endpoint (x-api-key, /v1/messages, SSE, tool_use). It plugs into Hermes Agent's anthropic_messages protocol with no upstream patch required.

3-step setup

1. Create an account and get your API key

Register for free, then copy your API key from the Dashboard.

2. Edit ~/.hermes/config.yaml

YAML
custom_providers:
  - name: llmapi
    base_url: https://llmapi.pro
    api_key: ${LLMAPI_KEY}
    api_mode: anthropic_messages

3. Set the key and pick a model

Put the key in ~/.hermes/.env, then in Hermes run the model command:

LLMAPI_KEY=sk-llmapi-xxxxxxxx
/model custom:llmapi:claude-sonnet-4-7

Why route Hermes through LLM API

  • Anthropic-compatible (Hermes's native anthropic_messages protocol, including SSE and tool_use).
  • One key, multiple upstreams: Sonnet/Opus class plus M2-class compatible routes for cost-sensitive sessions.
  • Subscription option in addition to per-token billing.
  • China-region egress for users whose direct connections to upstream providers are slow.

Notes

  • Belt-and-suspenders: after the first switch to the named custom provider, run hermes config set model.api_mode anthropic_messages to set api_mode explicitly. This is a no-op on Hermes ≥ v0.5 and a safety net for older versions.
  • LLM API is a Claude-compatible relay, not Anthropic. Models named “Claude *” on LLM API route to Claude-compatible upstreams; check the model list for the current routes.

OpenAI / Codex CLI Integration

In addition to the Anthropic /v1/messages endpoint, LLM API also exposes OpenAI-compatible endpoints at the same base URL. Point any OpenAI SDK, LiteLLM, or compatible CLI at https://llmapi.pro/v1 and it works out of the box.

Endpoints:

  • POST /v1/chat/completions — OpenAI Chat Completions API (OpenAI SDK, LiteLLM, older Codex versions)
  • POST /v1/responses — OpenAI Responses API (Codex CLI 0.130+)

Codex CLI

Codex CLI 0.130+ uses the OpenAI Responses API. Add an entry to ~/.codex/config.toml:

TOML
model = "claude-sonnet-4-7"
model_provider = "llmapi"

[model_providers.llmapi]
name = "llmapi"
base_url = "https://llmapi.pro/v1"
wire_api = "responses"
env_key = "OPENAI_API_KEY"

Then export your LLM API key and run Codex:

Shell
export OPENAI_API_KEY=sk-llmapi-your-key-here
codex exec "list the files in this folder"

OpenAI SDK (Python)

Set base_url and api_key. No other code changes.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://llmapi.pro/v1",
    api_key="sk-llmapi-your-key-here",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-7",
    messages=[{"role": "user", "content": "Say hello in 3 words"}],
)
print(resp.choices[0].message.content)

OpenAI SDK (Node.js)

JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://llmapi.pro/v1',
  apiKey: 'sk-llmapi-your-key-here',
});

const resp = await client.chat.completions.create({
  model: 'claude-sonnet-4-7',
  messages: [{ role: 'user', content: 'Say hello in 3 words' }],
});
console.log(resp.choices[0].message.content);

LiteLLM

LiteLLM works with any OpenAI-compatible endpoint. Use the openai/<model> prefix:

Python
import litellm

resp = litellm.completion(
    model="openai/claude-sonnet-4-7",
    api_base="https://llmapi.pro/v1",
    api_key="sk-llmapi-your-key-here",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)

curl

Shell
curl https://llmapi.pro/v1/chat/completions \
  -H "Authorization: Bearer sk-llmapi-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-7",
    "messages": [{"role":"user","content":"hello"}],
    "stream": false
  }'

Notes

  • Streaming is fully supported. Pass stream: true for SSE (chat.completion.chunk events) or use Codex CLI streaming out of the box.
  • Tool calling works in both directions. Pass OpenAI-style tools and we translate to Anthropic tool_use internally; responses come back as tool_calls.
  • Same API key works on both /v1/messages and /v1/chat/completions — pick whichever protocol your client speaks.
  • Available models are the same as the Anthropic endpoint. The OpenAI-shape response simply echoes back the model name you requested.
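
To make the tool-calling note concrete, here is a rough Python sketch of how an OpenAI-style tool definition lines up with an Anthropic-style one, and how a tool_use block maps back to a tool_calls entry. It only illustrates the two public message shapes; it is not LLM API's internal translation code, and both function names are hypothetical.

```python
import json


def openai_tool_to_anthropic(tool):
    """Convert one OpenAI-style tool definition to the Anthropic tool shape."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    converted = {
        "name": fn["name"],
        # Anthropic names the JSON Schema "input_schema"; OpenAI names it "parameters".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in fn:
        converted["description"] = fn["description"]
    return converted


def anthropic_tool_use_to_openai(block):
    """Convert an Anthropic tool_use content block to an OpenAI tool_calls entry."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI carries the arguments as a JSON-encoded string.
            "arguments": json.dumps(block["input"]),
        },
    }
```

In practice you only send the OpenAI-style definition; the sketch is useful mainly for understanding what to expect in the tool_calls you get back.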

Authentication

Every API request must include a valid API key. You can pass your key using either of the two headers below. Create and manage keys in your Dashboard.

Supported Auth Headers

| Header | Format | Description |
| --- | --- | --- |
| x-api-key | sk-llmapi-xxxx | Primary method. Compatible with the Anthropic SDK. |
| Authorization | Bearer sk-llmapi-xxxx | Alternative Bearer-token method. |

Example: cURL with x-api-key

curl https://llmapi.pro/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: sk-llmapi-xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-7",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Security note: Never expose your API key in client-side code or public repositories. If a key is compromised, delete it immediately from the Dashboard and create a new one.
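
The two header options can be sketched in Python using only the standard library. This is an illustrative example rather than an official client: auth_headers and build_messages_request are hypothetical names, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://llmapi.pro"


def auth_headers(api_key, use_bearer=False):
    """Return the auth header for either supported scheme."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}


def build_messages_request(api_key, payload, use_bearer=False):
    """Build (but do not send) a POST /v1/messages request."""
    headers = {
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
        **auth_headers(api_key, use_bearer),
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass the result to urllib.request.urlopen. In line with the security note, read the key from an environment variable such as ANTHROPIC_API_KEY rather than hard-coding it.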

API Reference

POST /v1/messages

Create a message. Fully compatible with the Anthropic Messages API.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier, e.g. claude-sonnet-4-7 |
| messages | array | Yes | Array of message objects with role (user \| assistant) and content. |
| max_tokens | integer | Yes | Maximum number of tokens to generate in the response. |
| system | string | No | System prompt that sets behavior and context for the model. |
| temperature | number | No | Sampling temperature between 0 and 1. Lower values are more deterministic. |
| tools | array | No | List of tool definitions the model may use (function calling). |
| tool_choice | object | No | Controls tool use: auto, any, or tool with a specific name. |
| stream | boolean | No | Enable Server-Sent Events streaming. Default: false. |
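
As a sketch of how these parameters fit together, the hypothetical Python helper below builds a request body and rejects the most common mistakes before anything is sent. The validation rules are taken from the table above; the function name and error messages are illustrative only.

```python
def build_message_payload(model, messages, max_tokens, system=None,
                          temperature=None, stream=False):
    """Assemble a /v1/messages request body, validating required fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(f"invalid role: {message.get('role')!r}")
    if not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    # Optional fields are only included when the caller sets them.
    if system is not None:
        payload["system"] = system
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload
```

Validating locally is optional, since the API returns a 400 invalid_request_error for malformed bodies, but it gives faster feedback during development.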

Non-Streaming Response

When stream is false (default), the API returns a single JSON object:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I can help you with a wide range of tasks..."
    }
  ],
  "model": "claude-sonnet-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 58
  }
}
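
A response like the one above can be reduced to plain text by joining its text blocks. The Python sketch below assumes the JSON has already been parsed into a dictionary; response_text and total_tokens are hypothetical helper names.

```python
def response_text(message):
    """Join the text of every "text" content block in a parsed response."""
    return "".join(
        block["text"]
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(message):
    """Return input_tokens + output_tokens from the usage object."""
    usage = message.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
```

Non-text blocks, such as tool_use, are skipped by response_text and need to be handled separately.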

Streaming Response (SSE)

When stream is true, the response is delivered as Server-Sent Events. Each event has an event field and a JSON data field:

event: message_start
data: {"type":"message_start","message":{"id":"msg_01...","type":"message","role":"assistant","content":[],"model":"claude-sonnet-4-7","usage":{"input_tokens":12,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"! I can"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":58}}

event: message_stop
data: {"type":"message_stop"}

SSE Event Types

| Event | Description |
| --- | --- |
| message_start | Sent once at the beginning. Contains the message object with metadata. |
| content_block_start | Marks the start of a new content block (text or tool_use). |
| content_block_delta | Incremental content update. Concatenate delta.text to build the full response. |
| content_block_stop | The content block is complete. |
| message_delta | Final message metadata including stop_reason and total usage. |
| message_stop | End of stream. Close the connection. |
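
The event sequence above can be consumed with a small parser. The following Python sketch works on an iterable of already-decoded text lines (for example, lines read from a streaming HTTP response); iter_sse_events and collect_text are hypothetical names, and the parser handles only the event and data fields shown in this section.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream ended without a trailing blank line
        yield event_name, json.loads("\n".join(data_lines))


def collect_text(lines):
    """Concatenate text deltas and return (text, stop_reason)."""
    parts, stop_reason = [], None
    for event_name, data in iter_sse_events(lines):
        if event_name == "content_block_delta":
            delta = data["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif event_name == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
        elif event_name == "message_stop":
            break
    return "".join(parts), stop_reason
```

For interactive output, print each text delta as it arrives instead of collecting the parts into a list.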

Models

The latest Claude model family from Anthropic.

| Model | Description | Context | Max Output |
| --- | --- | --- | --- |
| claude-opus-4-8 | Anthropic's most powerful model. World-class coding, complex reasoning, advanced analysis. | 1M tokens | 128K tokens |
| claude-sonnet-4-7 | Best balance of intelligence and speed. Default for Claude Code. | 1M tokens | 128K tokens |
| claude-haiku-4-5 | Fastest and most cost-effective. Ideal for lightweight tasks and high throughput. | 200K tokens | 64K tokens |
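
One way to use these limits in client code is to cap max_tokens at the model's maximum output before sending a request. The Python sketch below copies the figures from the table and assumes that "K" means 1,000 and "M" means 1,000,000 tokens, which the table does not state explicitly; MODEL_LIMITS and clamp_max_tokens are hypothetical names.

```python
# Figures copied from the table above; K = 1,000 and M = 1,000,000 are assumptions.
MODEL_LIMITS = {
    "claude-opus-4-8": {"context": 1_000_000, "max_output": 128_000},
    "claude-sonnet-4-7": {"context": 1_000_000, "max_output": 128_000},
    "claude-haiku-4-5": {"context": 200_000, "max_output": 64_000},
}


def clamp_max_tokens(model, requested):
    """Cap a requested max_tokens value at the model's maximum output."""
    if model not in MODEL_LIMITS:
        raise KeyError(f"unknown model: {model!r}")
    return min(requested, MODEL_LIMITS[model]["max_output"])
```

Keep the dictionary in sync with the current model list, since routes and limits may change.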

Error Handling

LLM API uses standard HTTP status codes. Error responses always return a JSON body with a machine-readable type and a human-readable message.

Error Response Format

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key provided."
  }
}
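
Because every error shares this shape, a client can turn it into a typed exception. The Python sketch below is one possible approach; LLMAPIError and parse_error are hypothetical names, and the fallback for non-JSON bodies is a defensive assumption rather than documented behavior.

```python
import json


class LLMAPIError(Exception):
    """Carries the HTTP status plus the error type and message from the body."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def parse_error(status, body):
    """Build an LLMAPIError from an HTTP status and a response body string."""
    try:
        error = json.loads(body)["error"]
        return LLMAPIError(status, error["type"], error["message"])
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON shape; keep a short excerpt instead.
        return LLMAPIError(status, "unknown_error", str(body)[:200])
```

Callers can then branch on error_type (for example, authentication_error) instead of matching on message text.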

HTTP Status Codes

| Status | Error Type | Description | Recommended Action |
| --- | --- | --- | --- |
| 400 | invalid_request_error | Malformed request body or missing required parameters. | Verify your JSON payload and required fields. |
| 401 | authentication_error | Invalid or missing API key. | Check that x-api-key is set correctly. |
| 403 | permission_error | Insufficient permissions or account suspended. | Verify your account status and plan entitlements. |
| 429 | rate_limit_error | Too many requests. Rate limit exceeded. | Back off and retry. See Rate Limits. |
| 500 | api_error | Internal server error. | Retry after a brief delay. Contact support if it persists. |
| 503 | overloaded_error | Upstream provider is temporarily overloaded. | Wait a moment and retry with exponential backoff. |
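
The retry advice for 429, 500, and 503 can be sketched as exponential backoff with jitter. In the Python example below, send is a hypothetical callable that performs one request and returns a (status, body) tuple; the base delay, cap, and attempt count are illustrative defaults, not values recommended by LLM API.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=random.random):
    """Exponential backoff with full jitter: up to base * 2**attempt, capped."""
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * jitter()


def call_with_retries(send, max_attempts=5, sleep=time.sleep, jitter=random.random):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last:
            return status, body
        sleep(backoff_delay(attempt, jitter=jitter))
```

Errors such as 400, 401, and 403 are returned immediately, since retrying them without changing the request will not help.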

Rate Limits

Rate limits vary by plan and are enforced per API key. When you exceed a limit, the API returns a 429 status code. Upgrade your plan for higher throughput.

Per-Plan Limits

| Plan | Request Quota | Monthly Token Quota |
| --- | --- | --- |
| Free | 40 / 5h, 200 / week | Unlimited |
| Pro | 400 / 5h, 2,000 / week | Unlimited |
| Max 5x | 1,200 / 5h, 6,000 / week | Unlimited |
| Max 20x | 3,000 / 5h, 15,000 / week | Unlimited |

Rate Limit Headers

Every API response includes headers to help you track your usage in real time:

anthropic-ratelimit-requests-limit: 60
anthropic-ratelimit-requests-remaining: 58
anthropic-ratelimit-requests-reset: 2026-04-08T12:01:00.000Z
anthropic-ratelimit-tokens-limit: 800000
anthropic-ratelimit-tokens-remaining: 800000

| Header | Description |
| --- | --- |
| x-ratelimit-limit | Maximum number of requests allowed per 5-hour window for your plan. |
| x-ratelimit-remaining | Number of requests remaining in the current rate-limit window. |
| x-ratelimit-reset | ISO 8601 timestamp when the rate-limit window resets. |
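
A client can read these headers to decide when to slow down. The Python sketch below accepts both header spellings that appear in this section (anthropic-ratelimit-requests-* and x-ratelimit-*), since it is an assumption which one a given response will carry; read_rate_limit is a hypothetical name.

```python
from datetime import datetime, timezone

# Both spellings appear in this section; try each in turn.
PREFIXES = ("anthropic-ratelimit-requests-", "x-ratelimit-")


def read_rate_limit(headers, now=None):
    """Extract limit, remaining, and seconds until reset from response headers."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def pick(suffix):
        for prefix in PREFIXES:
            value = lowered.get(prefix + suffix)
            if value is not None:
                return value
        return None

    limit, remaining, reset = pick("limit"), pick("remaining"), pick("reset")
    seconds_until_reset = None
    if reset is not None:
        # Normalize a trailing "Z" so older Python versions can parse it.
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        now = now or datetime.now(timezone.utc)
        seconds_until_reset = max(0.0, (reset_at - now).total_seconds())
    return {
        "limit": int(limit) if limit is not None else None,
        "remaining": int(remaining) if remaining is not None else None,
        "seconds_until_reset": seconds_until_reset,
    }
```

When remaining reaches 0, waiting for seconds_until_reset before the next request avoids an unnecessary 429.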
