Switching from OpenAI, Anthropic, or OpenRouter?

XALEN is OpenAI-compatible. Change one line of code:

# Before: base_url = "https://api.openai.com/v1"
# After:
base_url = "https://api.xalen.io/v1"

Your existing SDK code keeps working once you also swap in your XALEN API key: same request format, same streaming, same function calling. You also get 200+ models, 13 pre-built agents, the Studio website builder, and a custom agent builder.

Authentication

All API requests require a Bearer token. Include your API key in the Authorization header of every request.

Header Authorization: Bearer xln_live_YOUR_API_KEY

API keys start with xln_live_. Generate one from your Dashboard after signing up.

Keep your key secret

Never expose API keys in client-side code or public repositories. Use environment variables or a backend proxy.

Base URL

All endpoints are served from a single base URL. Append the endpoint path to this URL for every request.

Base URL https://api.xalen.io

For example, to call Chat Completions: POST https://api.xalen.io/v1/chat/completions
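
Putting the Authorization header and the base URL together, here is a minimal sketch that builds (but does not send) a Chat Completions request using only the Python standard library. The environment variable name XALEN_API_KEY and the helper name build_chat_request are assumptions made for illustration, not part of the XALEN SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.xalen.io"


def build_chat_request(messages, model="vedika-standard"):
    """Build (but do not send) an authenticated Chat Completions request."""
    # XALEN_API_KEY is an assumed variable name; use whatever your deployment defines.
    api_key = os.environ["XALEN_API_KEY"]
    if not api_key.startswith("xln_live_"):
        raise ValueError("XALEN API keys are expected to start with 'xln_live_'")
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + "/v1/chat/completions",  # base URL + endpoint path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request. Reading the key from the environment keeps it out of source control, in line with the advice above.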

Quick Start

Make your first API call in under 60 seconds. Install an SDK or use cURL directly.

Python

pip install xalen

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.chat.completions.create(
    model="vedika-standard",
    messages=[{"role": "user", "content": "Hello, world!"}]
)

print(response.choices[0].message.content)

JavaScript

npm install xalen-sdk

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

const response = await client.chat.completions.create({
  model: "vedika-standard",
  messages: [{ role: "user", content: "Hello, world!" }],
});

console.log(response.choices[0].message.content);

cURL

curl https://api.xalen.io/v1/chat/completions \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vedika-standard",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

Chat Completions

POST /v1/chat/completions

Generate a model response for a conversation. Compatible with the OpenAI Chat Completions API format, so existing OpenAI SDK code works by changing only the base URL and API key.

Anthropic Claude models are now available. Access Claude Opus 4.7, Sonnet 4.6, Opus 4.6, Haiku 4.5, Sonnet 4.5, and Claude 3.5 Haiku through the same /v1/chat/completions endpoint. No code changes are needed: just set model to claude-opus-4.7, claude-sonnet-4.6, and so on.

Request Body

Parameter | Type | Required | Description
model | string | Required | Model ID. e.g. vedika-standard, claude-sonnet-4.6, claude-opus-4.7, llama-4-maverick
messages | array | Required | Array of message objects with role (system, user, assistant) and content.
temperature | number | Optional | Sampling temperature between 0 and 2. Default: 1.
max_tokens | integer | Optional | Maximum tokens to generate. Default: model-specific.
stream | boolean | Optional | Stream partial responses as Server-Sent Events. Default: false.
top_p | number | Optional | Nucleus sampling threshold. Default: 1.
stop | string or array | Optional | Up to 4 sequences where the model will stop generating.
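
When stream is true, the response arrives as Server-Sent Events. The sketch below assembles the text from such a stream, assuming XALEN emits the same chunk layout as the OpenAI API (lines of the form "data: {json}" carrying choices[].delta.content, terminated by "data: [DONE]"). That layout, the helper name, and the sample lines are assumptions; check them against a real XALEN stream before relying on this.

```python
import json


def collect_stream_text(lines):
    """Join the content deltas from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": ", world!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints: Hello, world!
```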

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.chat.completions.create(
    model="vedika-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is my birth chart?"}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)

JavaScript

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

const response = await client.chat.completions.create({
  model: "vedika-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is my birth chart?" },
  ],
  temperature: 0.7,
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

cURL

curl https://api.xalen.io/v1/chat/completions \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vedika-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is my birth chart?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Response

JSON Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717200000,
  "model": "vedika-pro",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "To generate your birth chart, I need your date, time, and place of birth..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 52,
    "total_tokens": 80
  }
}

List Models

GET /v1/models

Returns a list of all available models. Use this to discover model IDs, capabilities, and pricing.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

models = client.models.list()
for m in models.data:
    print(m.id, m.owned_by)

JavaScript

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

const models = await client.models.list();
models.data.forEach(m => console.log(m.id, m.owned_by));

cURL

curl https://api.xalen.io/v1/models \
  -H "Authorization: Bearer xln_live_YOUR_KEY"

Response

JSON Response
{
  "object": "list",
  "data": [
    {
      "id": "vedika-standard",
      "object": "model",
      "owned_by": "xalen",
      "permission": []
    },
    {
      "id": "vedika-pro",
      "object": "model",
      "owned_by": "xalen",
      "permission": []
    },
    {
      "id": "claude-opus-4.7",
      "object": "model",
      "owned_by": "anthropic",
      "permission": []
    },
    {
      "id": "claude-sonnet-4.6",
      "object": "model",
      "owned_by": "anthropic",
      "permission": []
    }
  ]
}

Embeddings

POST /v1/embeddings

Generate vector embeddings for text input. Use for semantic search, clustering, or recommendation systems.

Request Body

Parameter | Type | Required | Description
model | string | Required | Embedding model ID. e.g. text-embedding-3-small
input | string or array | Required | Text to embed. Can be a single string or array of strings.
encoding_format | string | Optional | float (default) or base64.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Vedic astrology birth chart analysis"
)

print(len(response.data[0].embedding))  # 1536

JavaScript

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "Vedic astrology birth chart analysis",
});

console.log(response.data[0].embedding.length); // 1536

cURL

curl https://api.xalen.io/v1/embeddings \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Vedic astrology birth chart analysis"}'

Response

JSON Response
{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.0023, -0.0091, 0.0152, ...]
  }],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 6, "total_tokens": 6 }
}
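
For the semantic search use case mentioned above, embeddings are usually compared with cosine similarity. The following is a small standard-library sketch of that comparison; the function names and the three-dimensional toy vectors are illustrative assumptions, and real vectors would come from the embedding arrays in the response.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
docs = {"dasha-guide": [0.9, 0.1, 0.0], "recipe": [0.0, 1.0, 0.0]}
ranking = rank_documents([1.0, 0.0, 0.0], docs)
print(ranking[0][0])  # prints: dasha-guide
```

For large collections a vector database or a NumPy-based implementation would be faster; this version only shows the calculation.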

Image Generation

POST /v1/images/generations

Generate images from text prompts. Returns one or more image URLs or base64-encoded data.

Request Body

Parameter | Type | Required | Description
prompt | string | Required | Text description of the image to generate.
model | string | Optional | Image model ID. Default: platform default.
n | integer | Optional | Number of images. Default: 1. Max: 4.
size | string | Optional | 256x256, 512x512, or 1024x1024. Default: 1024x1024.
response_format | string | Optional | url (default) or b64_json.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.images.generate(
    prompt="A serene Hindu temple at sunrise, watercolor style",
    size="1024x1024"
)

print(response.data[0].url)

cURL

curl https://api.xalen.io/v1/images/generations \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A serene Hindu temple at sunrise, watercolor style", "size": "1024x1024"}'

Response

JSON Response
{
  "created": 1717200000,
  "data": [{
    "url": "https://api.xalen.io/files/img-abc123.png"
  }]
}

Text to Speech

POST /v1/audio/speech

Convert text to natural-sounding speech. Supports multiple voices and output formats.

Request Body

Parameter | Type | Required | Description
model | string | Required | TTS model ID. e.g. tts-1, tts-1-hd
input | string | Required | Text to convert. Max 4096 characters.
voice | string | Required | Voice ID. Options: alloy, echo, fable, onyx, nova, shimmer
response_format | string | Optional | mp3 (default), opus, aac, flac, wav
speed | number | Optional | 0.25 to 4.0. Default: 1.0.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome to your daily horoscope reading."
)

with open("output.mp3", "wb") as f:
    f.write(response.content)

cURL

curl https://api.xalen.io/v1/audio/speech \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "nova", "input": "Welcome to your daily horoscope reading."}' \
  --output output.mp3

Returns raw audio bytes in the requested format.
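
Because input is limited to 4096 characters, longer texts have to be split across several requests. Here is one possible way to do that, breaking at spaces where it can. The helper name and the splitting strategy are assumptions for illustration; how you stitch the resulting audio files together is outside this sketch.

```python
MAX_INPUT_CHARS = 4096  # limit stated for the `input` parameter


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into pieces of at most `limit` characters, preferring to break at a space."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space inside the window; fall back to a hard cut if there is none.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


long_text = "word " * 2000  # about 10,000 characters
pieces = split_for_tts(long_text)
print(len(pieces), max(len(p) for p in pieces) <= MAX_INPUT_CHARS)
```

Each piece can then be sent as its own /v1/audio/speech request.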

Speech to Text

POST /v1/audio/transcriptions

Transcribe audio to text. Supports multiple languages including 14 Indian languages.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
file | file | Required | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm). Max 25 MB.
model | string | Required | Transcription model. e.g. whisper-1
language | string | Optional | ISO-639-1 code. e.g. hi, ta, te, en
response_format | string | Optional | json (default), text, verbose_json

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        language="hi"
    )

print(transcript.text)

cURL

curl https://api.xalen.io/v1/audio/transcriptions \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -F file=@audio.mp3 \
  -F model=whisper-1 \
  -F language=hi

Response

JSON Response
{
  "text": "Transcribed text content here..."
}

Voice AI

POST /v1/voice/binary

End-to-end voice conversation: send audio, get audio back. Combines speech recognition, AI reasoning, and text-to-speech in a single call. Supports 31 languages with sub-200ms latency.

Request Body (multipart/form-data)

Parameter | Type | Required | Description
audio | file | Required | Audio input file (wav, mp3, webm, ogg).
language | string | Optional | ISO-639-1 code. Auto-detected if omitted.
voice | string | Optional | Response voice ID. Default: nova.
context | string | Optional | System prompt for the AI reasoning layer.
birth_details | object | Optional | For astrology queries: { "date": "1990-01-15", "time": "14:30", "place": "Mumbai" }

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

with open("question.wav", "rb") as f:
    response = client.voice.binary(
        audio=f,
        language="hi",
        voice="nova"
    )

with open("answer.mp3", "wb") as f:
    f.write(response.audio)

cURL

curl https://api.xalen.io/v1/voice/binary \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -F audio=@question.wav \
  -F language=hi \
  -F voice=nova \
  --output answer.mp3

Returns binary audio in mp3 format by default. The response includes an X-Transcript header with the text transcription of your audio and an X-Response-Text header with the AI's text reply.

Astrology AI Query

POST /v1/chat/completions

Ask any astrology question in natural language. Use model: "vedika-standard" or model: "vedika-fast" in the standard Chat Completions endpoint. The Vedika engine handles birth chart computation, classical text grounding, RAG retrieval, and multi-language response generation automatically.

Example

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

response = client.chat.completions.create(
    model="vedika-standard",
    messages=[
        {"role": "user", "content": "I was born on 15 Jan 1990 at 2:30 PM in Pune. What is my current Mahadasha and its effects?"}
    ]
)

print(response.choices[0].message.content)
# Includes: grounded answer, classical citations, follow-up suggestions

cURL

curl -X POST "https://api.xalen.io/v1/chat/completions" \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vedika-standard",
    "messages": [{"role": "user", "content": "What is Rahu in the 7th house according to BPHS?"}]
  }'

The Vedika AI engine supports: birth chart analysis, dasha predictions, transit effects, compatibility matching, muhurta selection, panchang queries, yoga identification, and remedial suggestions, all through natural language conversation.

Structured Astrology Data

For structured JSON endpoints, use the Vedika API directly

If you need raw structured data (birth charts, planetary positions, panchang, dasha timelines, divisional charts D1-D60, yoga calculations, compatibility scores), the Vedika API provides 130+ computation endpoints with structured JSON responses.

✓ Birth chart (Kundali) generation
✓ Vimshottari Dasha timelines
✓ Panchang (Tithi, Nakshatra, Yoga)
✓ Ashtakoot compatibility (36 gunas)
✓ Transit & progression analysis
✓ Divisional charts (D1-D60)
✓ Yoga identification (131 yogas)
✓ Western synastry & composite

When to use XALEN vs Vedika: Use XALEN's /v1/chat/completions with vedika-standard for natural language AI queries with grounding and citations. Use Vedika's structured API directly when you need raw JSON computation data (chart objects, planetary degrees, dasha trees) for building custom UIs.

Kundali Generation

GET /v1/astrology/kundali

Generate a complete Vedic birth chart (Kundali) with planetary positions, house placements, nakshatras, yogas, and dashas. Powered by the Vedika Ephemeris engine with arc-second precision.

Query Parameters

Parameter | Type | Required | Description
date | string | Required | Birth date in YYYY-MM-DD format.
time | string | Required | Birth time in HH:MM 24-hour format.
lat | number | Required | Latitude of birth place. e.g. 18.5204
lon | number | Required | Longitude of birth place. e.g. 73.8567
tz | number | Optional | Timezone offset in hours. e.g. 5.5 for IST. Auto-detected from coordinates if omitted.
system | string | Optional | vedic (default), western, or kp.
language | string | Optional | Response language. ISO-639-1 code. Default: en.
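
As a sketch of how these query parameters combine into a request URL, the helper below checks the date, time, and coordinate values and then encodes them with the standard library. The function name kundali_url is hypothetical. Note that urlencode percent-encodes the colon in the time value, which servers normally decode transparently.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "https://api.xalen.io"


def kundali_url(date, time, lat, lon, tz=None, system=None, language=None):
    """Build the GET URL for /v1/astrology/kundali after basic input checks."""
    datetime.strptime(date, "%Y-%m-%d")  # raises ValueError if not YYYY-MM-DD
    datetime.strptime(time, "%H:%M")     # raises ValueError if not a clock time
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    params = {"date": date, "time": time, "lat": lat, "lon": lon}
    optional = {"tz": tz, "system": system, "language": language}
    # Only send optional parameters that were actually provided.
    params.update({key: value for key, value in optional.items() if value is not None})
    return f"{BASE_URL}/v1/astrology/kundali?{urlencode(params)}"


url = kundali_url("1990-01-15", "14:30", 18.5204, 73.8567, tz=5.5)
print(url)
```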

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

chart = client.astrology.kundali(
    date="1990-01-15",
    time="14:30",
    lat=18.5204,
    lon=73.8567
)

print(chart.ascendant)       # Ascendant sign, degree, and nakshatra
print(chart.moon_sign)       # "Scorpio"
print(chart.planets)         # Detailed planetary positions
print(chart.yogas)           # Active yogas

cURL

curl "https://api.xalen.io/v1/astrology/kundali?date=1990-01-15&time=14:30&lat=18.5204&lon=73.8567" \
  -H "Authorization: Bearer xln_live_YOUR_KEY"

Response

JSON Response (abbreviated)
{
  "ascendant": { "sign": "Taurus", "degree": 14.32, "nakshatra": "Rohini" },
  "moon_sign": "Scorpio",
  "sun_sign": "Capricorn",
  "planets": [
    { "name": "Sun", "sign": "Capricorn", "house": 9, "degree": 0.85, "nakshatra": "Uttara Ashadha", "retrograde": false },
    { "name": "Moon", "sign": "Scorpio", "house": 7, "degree": 22.41, "nakshatra": "Jyeshtha", "retrograde": false }
  ],
  "houses": [ ... ],
  "yogas": [
    { "name": "Gaja Kesari Yoga", "description": "Jupiter in kendra from Moon", "strength": "strong" }
  ],
  "dasha": { "current": "Venus", "sub": "Mercury", "start": "2024-03-12", "end": "2026-08-04" },
  "engine": "vedika-ephemeris"
}

Run Agent

POST /v1/agents/run

Execute a pre-built AI agent with a single API call. Agents are purpose-built for specific tasks like temple management, devotional content, and spiritual guidance.

Request Body

Parameter | Type | Required | Description
agent_id | string | Required | Agent identifier. e.g. temple-assistant, kundali-reader, puja-planner
input | string | Required | User message or task description.
context | object | Optional | Additional data for the agent (birth details, location, preferences).
stream | boolean | Optional | Stream the response. Default: false.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

result = client.agents.run(
    agent_id="kundali-reader",
    input="What career paths suit my chart?",
    context={
        "birth_date": "1990-01-15",
        "birth_time": "14:30",
        "birth_place": "Pune, India"
    }
)

print(result.output)

cURL

curl https://api.xalen.io/v1/agents/run \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "kundali-reader",
    "input": "What career paths suit my chart?",
    "context": {"birth_date": "1990-01-15", "birth_time": "14:30", "birth_place": "Pune, India"}
  }'

Response

JSON Response
{
  "agent_id": "kundali-reader",
  "output": "Based on your chart with Taurus ascendant and strong 10th house...",
  "usage": { "prompt_tokens": 450, "completion_tokens": 320, "total_tokens": 770 }
}

Check Balance

GET /v1/billing/balance

Retrieve your current wallet balance and usage summary.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

balance = client.billing.balance()
print(f"${balance.available / 100:.2f}")  # Wallet in USD

cURL

curl https://api.xalen.io/v1/billing/balance \
  -H "Authorization: Bearer xln_live_YOUR_KEY"

Response

JSON Response
{
  "available": 4250,
  "currency": "usd_cents",
  "total_spent": 1750,
  "plan": "pay-as-you-go"
}

Add Funds

POST /v1/billing/deposit

Add funds to your wallet. Returns a Razorpay payment link. Minimum deposit: $10.

Request Body

Parameter | Type | Required | Description
amount | integer | Required | Amount in USD cents. Minimum: 1000 ($10).
currency | string | Optional | usd (default) or inr.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

payment = client.billing.deposit(amount=5000)  # $50
print(payment.payment_url)  # Razorpay checkout link

cURL

curl https://api.xalen.io/v1/billing/deposit \
  -H "Authorization: Bearer xln_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount": 5000}'

Response

JSON Response
{
  "payment_id": "pay_abc123",
  "payment_url": "https://rzp.io/l/xalen-deposit",
  "amount": 5000,
  "currency": "usd_cents",
  "status": "pending"
}

Billing Mechanics

XALEN uses a prepaid wallet model. Deposit funds, then pay per API call. No surprise invoices, no credit card holds, no overages unless you opt in.

Wallet Model

Feature | Details
Deposit | Add funds via card or UPI. Minimum deposit: $10 (Pay-as-you-go).
First deposit bonus | $10 bonus on your first $100+ deposit. Applied automatically.
Per-call billing | Each API call deducts from your wallet based on model, tokens, and endpoint type.
Currency | Wallet stored in USD cents. Display values divide by 100.

Balance & Usage Tracking

Endpoint | Method | Description
/v1/billing/balance | GET | Real-time wallet balance, total spent, and current plan.
/v1/billing/usage?period=current | GET | Detailed usage breakdown for the current billing period.

Plan Switching

Upgrades take effect immediately. Your new rate limits and features are available within seconds. Downgrades are deferred to the end of the current billing cycle to avoid mid-cycle disruption.

Spending Controls

Control | Details
Spending alerts | Configurable notifications at 50%, 80%, 90%, and 100% of your budget threshold.
Hard spending cap | Set via dashboard. API calls return 402 Payment Required when the cap is exceeded.
Auto-pause | Optional. Stops API calls entirely when your budget is reached instead of allowing overage.

No Free Tier

Every API call costs money. XALEN does not offer free credits, trials, or complimentary usage. Use the Playground to test models before committing to a deposit.
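
To make the spending alert levels listed under Spending Controls concrete, here is a minimal sketch that reports which of the 50%, 80%, 90%, and 100% thresholds a given spend has reached. The function name and the $20 budget are illustrative assumptions; the 1750-cent figure mirrors total_spent in the balance response example. XALEN evaluates alerts on its side, so this only illustrates the arithmetic.

```python
ALERT_LEVELS = (50, 80, 90, 100)  # percentages listed for spending alerts


def alerts_reached(spent_cents, budget_cents):
    """Return the alert percentages that the current spend has met or passed."""
    if budget_cents <= 0:
        raise ValueError("budget must be positive")
    # Integer arithmetic avoids float rounding exactly at a threshold.
    return [level for level in ALERT_LEVELS if spent_cents * 100 >= level * budget_cents]


# 1750 cents spent against a hypothetical 2000-cent ($20) budget is 87.5%.
print(alerts_reached(1750, 2000))  # prints: [50, 80]
```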

Fine-Tuning

Fine-tune any supported model on your own data. Upload JSONL training files, configure hyperparameters, and deploy a custom model tailored to your use case. XALEN supports standard full-parameter fine-tuning and lightweight LoRA adapters depending on the base model.

Supported Base Models

Model | Type | Min Training Examples
Llama 3.3 70B | Standard | 100
DeepSeek V3 | Standard | 100
Qwen 2.5 72B | Standard | 100
Llama 4 Scout 17B | LoRA | 50
Qwen3 235B | LoRA | 50

Training Data Format

Upload training data as a JSONL file. Each line must be a valid JSON object containing a messages array with system, user, and assistant turns.

JSONL Format (one line per example)
{"messages": [{"role": "system", "content": "You are a Vedic astrology expert."}, {"role": "user", "content": "What does Saturn in the 7th house mean?"}, {"role": "assistant", "content": "Saturn in the 7th house indicates..."}]}
{"messages": [{"role": "system", "content": "You are a Vedic astrology expert."}, {"role": "user", "content": "Explain Rahu Mahadasha effects."}, {"role": "assistant", "content": "During Rahu Mahadasha..."}]}
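
Before uploading, it can help to check the file locally against the format described above. The sketch below is one possible pre-flight check; the function name is hypothetical, and the rule that every example needs at least one user and one assistant turn is an assumption drawn from the description rather than a documented server-side check.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_jsonl(lines):
    """Return a list of (line_number, problem) tuples; an empty list means every line passed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' array"))
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
        if any(role not in ALLOWED_ROLES for role in roles):
            problems.append((number, "unexpected role"))
        elif "user" not in roles or "assistant" not in roles:
            # Assumed rule: an example without both turns gives the model nothing to learn.
            problems.append((number, "needs at least one user and one assistant turn"))
    return problems
```

A typical call is validate_jsonl(open("training_data.jsonl", encoding="utf-8")), after which the count of valid lines can be compared with the minimum training examples for the chosen base model.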

Upload Training Data

POST /v1/files

Upload your JSONL training file. The response includes a file_id used when creating a fine-tuning job.

Parameter | Type | Required | Description
file | file | Required | The JSONL file to upload.
purpose | string | Required | Must be fine-tune.

Create Fine-Tuning Job

POST /v1/fine-tuning/jobs

Create a new fine-tuning job. The job trains asynchronously and you can poll its status or list all jobs.

Parameter | Type | Required | Description
model | string | Required | Base model ID. e.g. llama-3.3-70b, deepseek-v3, qwen-2.5-72b
training_file | string | Required | File ID returned from the upload endpoint.
hyperparameters | object | Optional | Training config: epochs (default 3), learning_rate (default auto), batch_size (default auto).
suffix | string | Optional | Custom suffix for the fine-tuned model name. Max 18 characters.

List Fine-Tuning Jobs

GET /v1/fine-tuning/jobs

Returns a list of all fine-tuning jobs for your account, including status, progress, and result model IDs.

Get Job Status

GET /v1/fine-tuning/jobs/{id}

Retrieve the current status and details of a specific fine-tuning job.

Cancel Job

POST /v1/fine-tuning/jobs/{id}/cancel

Cancel a running fine-tuning job. Jobs that have already completed cannot be cancelled.

Code Examples

Python

import time

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

# 1. Upload training data (the with-block closes the file after the upload)
with open("training_data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

# 2. Create fine-tuning job
job = client.fine_tuning.jobs.create(
    model="llama-3.3-70b",
    training_file=training_file.id,
    hyperparameters={
        "epochs": 3,
        "learning_rate": 1e-5,
        "batch_size": 4
    },
    suffix="astro-expert"
)

print(f"Job created: {job.id}, status: {job.status}")

# 3. Poll for completion
while job.status not in ["succeeded", "failed", "cancelled"]:
    time.sleep(30)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {job.status}")

# 4. Use the fine-tuned model
if job.status == "succeeded":
    response = client.chat.completions.create(
        model=job.fine_tuned_model,
        messages=[{"role": "user", "content": "Analyze my chart"}]
    )
    print(response.choices[0].message.content)

JavaScript

import Xalen from "xalen-sdk";
import fs from "fs";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// 1. Upload training data
const trainingFile = await client.files.create({
  file: fs.createReadStream("training_data.jsonl"),
  purpose: "fine-tune",
});

// 2. Create fine-tuning job
let job = await client.fineTuning.jobs.create({
  model: "llama-3.3-70b",
  training_file: trainingFile.id,
  hyperparameters: {
    epochs: 3,
    learning_rate: 1e-5,
    batch_size: 4,
  },
  suffix: "astro-expert",
});

console.log(`Job created: ${job.id}, status: ${job.status}`);

// 3. Poll for completion, keeping the latest job object
while (!["succeeded", "failed", "cancelled"].includes(job.status)) {
  await new Promise((r) => setTimeout(r, 30000));
  job = await client.fineTuning.jobs.retrieve(job.id);
  console.log(`Status: ${job.status}`);
}

// 4. Use the fine-tuned model
if (job.status === "succeeded") {
  const response = await client.chat.completions.create({
    model: job.fine_tuned_model,
    messages: [{ role: "user", content: "Analyze my chart" }],
  });
  console.log(response.choices[0].message.content);
}

Response

JSON Response
{
  "id": "ftjob-abc123",
  "object": "fine_tuning.job",
  "model": "llama-3.3-70b",
  "status": "running",
  "training_file": "file-xyz789",
  "hyperparameters": {
    "epochs": 3,
    "learning_rate": 1e-5,
    "batch_size": 4
  },
  "fine_tuned_model": null,
  "created_at": 1717200000,
  "finished_at": null,
  "trained_tokens": 45200
}

Fine-Tuning Pricing

Fine-tuning is billed per training token. Pricing varies by base model. Check the Pricing page for current per-token training rates. Inference on fine-tuned models is billed at the base model rate.

Dedicated Endpoints

Reserved GPU capacity for predictable performance. Dedicated endpoints give you isolated compute with no shared rate limits, built-in prompt caching, and support for hot-swapping LoRA adapters. Ideal for production workloads that demand consistent latency and throughput.

Hardware Options

Hardware | VRAM | Price | Best For
NVIDIA H100 80GB | 80 GB | $3.49/GPU-hr | Large models, high throughput
NVIDIA A100 80GB | 80 GB | $2.09/GPU-hr | Mid-range models
NVIDIA L40S 48GB | 48 GB | $1.19/GPU-hr | Smaller models, cost-optimized
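
As a worked example of the per-GPU-hour pricing, the sketch below estimates spend for a fixed number of replicas. The function name is hypothetical, and the default of one GPU per replica is an assumption: large models may need several GPUs per replica, in which case gpus_per_replica should be raised. Actual billing may also differ when autoscaling changes the replica count.

```python
from decimal import Decimal

# Hourly rates from the hardware table, keyed by the hardware IDs used by the API.
HOURLY_RATE_USD = {
    "h100-80gb": Decimal("3.49"),
    "a100-80gb": Decimal("2.09"),
    "l40s-48gb": Decimal("1.19"),
}


def estimate_cost(hardware, replicas, hours, gpus_per_replica=1):
    """Estimate spend in USD for a fixed replica count over a number of hours."""
    rate = HOURLY_RATE_USD[hardware]
    # Decimal keeps the cents exact; str() avoids binary float artifacts in `hours`.
    return rate * gpus_per_replica * replicas * Decimal(str(hours))


# Four H100 replicas for one day: 3.49 * 4 * 24
print(estimate_cost("h100-80gb", replicas=4, hours=24))  # prints: 335.04
```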

Features

Feature | Details
Prompt Caching | Up to 90% cost reduction on repeated prompt prefixes. Cached automatically on dedicated infrastructure.
Autoscaling | Scale from 1 to 16 replicas based on traffic. Configure min/max replicas and scaling thresholds.
LoRA Adapters | Hot-swap fine-tuned LoRA weights without redeploying. Attach multiple adapters to a single base model.
Custom Scaling Policies | Define scaling rules based on request queue depth, latency percentiles, or GPU utilization.
Isolated Rate Limits | Your endpoint is not affected by other tenants. Full throughput capacity is reserved for your traffic.

Create Dedicated Endpoint

POST /v1/endpoints

Parameter | Type | Required | Description
model | string | Required | Model to deploy. e.g. llama-3.3-70b, deepseek-v3
hardware | string | Required | GPU type: h100-80gb, a100-80gb, or l40s-48gb
min_replicas | integer | Optional | Minimum replicas. Default: 1.
max_replicas | integer | Optional | Maximum replicas for autoscaling. Default: 1.

List Endpoints

GET /v1/endpoints

Returns all dedicated endpoints for your account, including status, hardware, model, and current replica count.

Delete Endpoint

POST /v1/endpoints/{id}/delete

Shut down a dedicated endpoint. Billing stops when the endpoint is fully deprovisioned.

Code Examples

Python

from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

# Create a dedicated endpoint
endpoint = client.endpoints.create(
    model="llama-3.3-70b",
    hardware="h100-80gb",
    min_replicas=1,
    max_replicas=4
)

print(f"Endpoint: {endpoint.id}, status: {endpoint.status}")

# Use the dedicated endpoint for inference
response = client.chat.completions.create(
    model=endpoint.model,
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"X-Endpoint-Id": endpoint.id}
)

print(response.choices[0].message.content)

# List all endpoints
endpoints = client.endpoints.list()
for ep in endpoints.data:
    print(f"{ep.id}: {ep.model} on {ep.hardware} ({ep.status})")

JavaScript

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// Create a dedicated endpoint
const endpoint = await client.endpoints.create({
  model: "llama-3.3-70b",
  hardware: "h100-80gb",
  min_replicas: 1,
  max_replicas: 4,
});

console.log(`Endpoint: ${endpoint.id}, status: ${endpoint.status}`);

// Use the dedicated endpoint for inference
const response = await client.chat.completions.create({
  model: endpoint.model,
  messages: [{ role: "user", content: "Hello!" }],
}, {
  headers: { "X-Endpoint-Id": endpoint.id },
});

console.log(response.choices[0].message.content);

// List all endpoints
const endpoints = await client.endpoints.list();
for (const ep of endpoints.data) {
  console.log(`${ep.id}: ${ep.model} on ${ep.hardware} (${ep.status})`);
}

Available on Scale Tier and Above

Dedicated endpoints require a Scale ($2,499/mo), Dedicated ($5,000+/mo), or Enterprise plan. Pay-as-you-go and Growth plans use shared serverless infrastructure.

Batch Processing

Process large workloads asynchronously at 50% lower cost. Submit up to 1,000 requests per batch, and XALEN processes them in the background with optimized throughput. Ideal for data labeling, content generation, bulk analysis, and any workload that does not require real-time responses.

Create Batch

POST /v1/batch

Parameter | Type | Required | Description
requests | array | Required | Array of up to 1,000 request objects. Each object has the same schema as a /v1/chat/completions request body.
model | string | Required | Model to use for all requests in the batch. e.g. deepseek-v3, llama-3.3-70b
metadata | object | Optional | Key-value metadata to attach to the batch for tracking.
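
Since a single batch accepts at most 1,000 requests, larger workloads need to be split before submission. A minimal sketch of that split follows; the helper name is hypothetical, and each yielded slice would be passed as the requests array of its own POST /v1/batch call.

```python
MAX_BATCH_SIZE = 1000  # per-batch request limit stated above


def chunk_requests(requests, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of `requests`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


prompts = [f"Summarize article {i}" for i in range(2500)]
requests = [{"messages": [{"role": "user", "content": p}]} for p in prompts]
print([len(chunk) for chunk in chunk_requests(requests)])  # prints: [1000, 1000, 500]
```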

Get Batch Status

GET /v1/batch/{id}

Retrieve the current status and results of a batch job. When the batch completes, the response includes an array of all outputs.

Code Examples

Python

from xalen import Xalen
import time

client = Xalen(api_key="xln_live_YOUR_KEY")

# Create a batch of requests
batch = client.batch.create(
    model="deepseek-v3",
    requests=[
        {"messages": [{"role": "user", "content": f"Summarize article {i}"}]}
        for i in range(100)
    ],
    metadata={"project": "content-pipeline"}
)

print(f"Batch {batch.id} created, status: {batch.status}")

# Poll for completion
while batch.status not in ["completed", "failed"]:
    time.sleep(10)
    batch = client.batch.retrieve(batch.id)
    print(f"Progress: {batch.completed_requests}/{batch.total_requests}")

# Process results
if batch.status == "completed":
    for result in batch.results:
        print(result.choices[0].message.content[:100])

JavaScript

import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// Create a batch of requests
const requests = Array.from({ length: 100 }, (_, i) => ({
  messages: [{ role: "user", content: `Summarize article ${i}` }],
}));

let batch = await client.batch.create({
  model: "deepseek-v3",
  requests,
  metadata: { project: "content-pipeline" },
});

console.log(`Batch ${batch.id} created, status: ${batch.status}`);

// Poll for completion, keeping the latest batch object
while (!["completed", "failed"].includes(batch.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  batch = await client.batch.retrieve(batch.id);
  console.log(`Progress: ${batch.completed_requests}/${batch.total_requests}`);
}

// Process results
if (batch.status === "completed") {
  for (const result of batch.results) {
    console.log(result.choices[0].message.content.slice(0, 100));
  }
}

Batch Pricing

All batch requests are billed at 50% of standard per-token rates. The same model, the same quality, half the cost. Batches typically complete within 1-6 hours depending on size and model.

Evaluations

Automated model evaluation using LLM-as-a-Judge. Compare models head-to-head, score outputs against custom criteria, and track quality over time. Evaluations run server-side so you can benchmark without writing scoring infrastructure.

Evaluation Types

Type | Description | Output
Classification | Judge assigns a discrete label to each response (e.g. "accurate", "inaccurate", "partial"). | Label + reasoning
Scoring | Judge rates each response on a numeric scale (e.g. 1-5 for relevance, accuracy, helpfulness). | Score + reasoning
Comparison | Judge selects the better response from two model outputs for the same prompt. | Winner + reasoning

Supported Judge Models

Any chat model available on XALEN can serve as a judge. Recommended judges for high-quality evaluations:

Model | Strengths
Llama 3.3 70B | Strong reasoning, good at nuanced scoring. Excellent general-purpose judge.
DeepSeek V3 | High accuracy on technical and domain-specific evaluations.
Qwen 2.5 72B | Multilingual evaluation strength. Good for non-English content scoring.

Create Evaluation

POST /v1/evaluations
Parameter | Type | Required | Description
type | string | Required | Evaluation type: classification, scoring, or comparison.
judge_model | string | Required | Model ID for the judge, e.g. llama-3.3-70b.
test_cases | array | Required | Array of test case objects. Each has input (messages array) and output (string, or array of strings for comparison).
criteria | string | Optional | Custom evaluation criteria for the judge, e.g. "Rate accuracy of astrological predictions on a 1-5 scale."
scale | object | Optional | For scoring type: min and max values. Default: {"min": 1, "max": 5}.
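
The parameters above can be assembled into a request body before it is sent. The sketch below is one way to do that with client-side checks; the function name build_evaluation_payload and the checks themselves are assumptions for illustration, not part of the XALEN SDK, and the server may validate differently.

```python
VALID_TYPES = {"classification", "scoring", "comparison"}


def build_evaluation_payload(eval_type, judge_model, test_cases,
                             criteria=None, scale=None):
    """Assemble and sanity-check a request body for POST /v1/evaluations."""
    if eval_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    for case in test_cases:
        if "input" not in case or "output" not in case:
            raise ValueError("each test case needs 'input' and 'output'")
        # Comparison cases carry a list of outputs; other types carry one string.
        is_list = isinstance(case["output"], list)
        if (eval_type == "comparison") != is_list:
            raise ValueError("'output' must be a list for comparison, a string otherwise")
    payload = {"type": eval_type, "judge_model": judge_model, "test_cases": test_cases}
    if criteria is not None:
        payload["criteria"] = criteria
    if eval_type == "scoring":
        # Send the documented default explicitly when no scale is given.
        payload["scale"] = scale or {"min": 1, "max": 5}
    return payload


payload = build_evaluation_payload(
    "scoring",
    "llama-3.3-70b",
    [{"input": [{"role": "user", "content": "What is Ketu in astrology?"}],
      "output": "Ketu is the south node of the Moon..."}],
)
print(payload["scale"])
```

The resulting dictionary can be serialized with json.dumps and posted with any HTTP client, using the Authorization header described under Authentication.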

Get Evaluation Results

GET /v1/evaluations/{id}

Retrieve the results of a completed evaluation, including per-test-case scores and aggregate metrics.

Code Examples

Python
JavaScript
from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

# Score model outputs on a 1-5 scale
evaluation = client.evaluations.create(
    type="scoring",
    judge_model="llama-3.3-70b",
    criteria="Rate the accuracy and helpfulness of each response on a 1-5 scale.",
    scale={"min": 1, "max": 5},
    test_cases=[
        {
            "input": [{"role": "user", "content": "What is Ketu in astrology?"}],
            "output": "Ketu is the south node of the Moon..."
        },
        {
            "input": [{"role": "user", "content": "Explain Venus Mahadasha."}],
            "output": "Venus Mahadasha lasts 20 years..."
        }
    ]
)

# Retrieve results
results = client.evaluations.retrieve(evaluation.id)
print(f"Average score: {results.aggregate.mean_score}")
for case in results.results:
    print(f"Score: {case.score}, Reasoning: {case.reasoning[:80]}")
import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// Compare two model outputs head-to-head
const evaluation = await client.evaluations.create({
  type: "comparison",
  judge_model: "llama-3.3-70b",
  criteria: "Which response is more accurate and detailed?",
  test_cases: [
    {
      input: [{ role: "user", content: "What is Ketu in astrology?" }],
      output: [
        "Ketu is the south node of the Moon in Vedic astrology...",
        "Ketu represents past karma and spiritual liberation..."
      ],
    },
  ],
});

// Retrieve results
const results = await client.evaluations.retrieve(evaluation.id);
for (const r of results.results) {
  console.log(`Winner: Output ${r.winner}, Reasoning: ${r.reasoning}`);
}

GPU Clusters

On-demand GPU clusters for training, large batch jobs, and custom workloads. Self-service provisioning via API or dashboard. Spin up multi-node clusters with high-bandwidth interconnects, run your training jobs, and tear down when finished.

Available GPUs

GPU | VRAM | Interconnect | Price
NVIDIA H100 SXM | 80 GB | NVLink 900 GB/s | $3.49/GPU-hr
NVIDIA B200 | 192 GB | NVLink 1800 GB/s | $5.99/GPU-hr
NVIDIA A100 SXM | 80 GB | NVLink 600 GB/s | $2.09/GPU-hr
NVIDIA L40S | 48 GB | PCIe Gen4 | $1.19/GPU-hr
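
To budget a run, the per-GPU-hour prices above multiply out by GPU count and duration. The sketch below keys the prices by the gpu_type identifiers from the Create Cluster parameters; the function name cluster_cost is hypothetical, and the estimate ignores billing granularity, storage, and any reserved-capacity discounts.

```python
# Hourly prices per GPU in dollars, copied from the table above.
PRICE_PER_GPU_HOUR = {
    "h100-sxm": 3.49,
    "b200": 5.99,
    "a100-sxm": 2.09,
    "l40s": 1.19,
}


def cluster_cost(gpu_type: str, gpu_count: int, hours: float) -> float:
    """Estimated cost in dollars: price per GPU-hour x GPUs x hours."""
    return round(PRICE_PER_GPU_HOUR[gpu_type] * gpu_count * hours, 2)


# An 8 x H100 cluster running for 24 hours.
print(cluster_cost("h100-sxm", 8, 24))
```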

Features

Feature | Details
Cluster Management | Slurm and Kubernetes orchestration. Choose your preferred scheduler at provisioning time.
Persistent Storage | NVMe SSD storage attached to every node. Data persists across job restarts within the cluster lifetime.
Multi-Region | Clusters available in US, Europe, and Asia Pacific. Select region at creation time for data residency compliance.
Health Monitoring | Real-time GPU utilization, memory, temperature, and error metrics via API and dashboard.
SSH Access | Direct SSH access to cluster nodes for custom setup, debugging, and environment configuration.

Create Cluster

POST /v1/clusters
Parameter | Type | Required | Description
gpu_type | string | Required | GPU model: h100-sxm, b200, a100-sxm, or l40s.
gpu_count | integer | Required | Number of GPUs. Must be a multiple of 8 for H100/B200/A100.
region | string | Optional | Deployment region: us-east, us-west, eu-west, ap-south. Default: us-east.
scheduler | string | Optional | Orchestration: slurm or kubernetes. Default: kubernetes.
max_runtime_hours | integer | Optional | Auto-terminate after this many hours. Default: no limit.
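
The constraints in the parameter table, in particular the multiple-of-8 rule, can be checked locally before a request is sent. The sketch below is a hypothetical helper (build_cluster_request is not an SDK function) that mirrors the documented values and defaults; the API remains the source of truth for validation.

```python
MULTIPLE_OF_8 = {"h100-sxm", "b200", "a100-sxm"}  # GPU types provisioned in groups of 8
GPU_TYPES = MULTIPLE_OF_8 | {"l40s"}
REGIONS = {"us-east", "us-west", "eu-west", "ap-south"}
SCHEDULERS = {"slurm", "kubernetes"}


def build_cluster_request(gpu_type, gpu_count, region="us-east",
                          scheduler="kubernetes", max_runtime_hours=None):
    """Return a request body for POST /v1/clusters, or raise ValueError."""
    if gpu_type not in GPU_TYPES:
        raise ValueError(f"unknown gpu_type: {gpu_type}")
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if gpu_type in MULTIPLE_OF_8 and gpu_count % 8 != 0:
        raise ValueError(f"gpu_count must be a multiple of 8 for {gpu_type}")
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    if scheduler not in SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler}")
    body = {"gpu_type": gpu_type, "gpu_count": gpu_count,
            "region": region, "scheduler": scheduler}
    if max_runtime_hours is not None:
        # Omitting the field keeps the documented default of no limit.
        body["max_runtime_hours"] = max_runtime_hours
    return body


print(build_cluster_request("h100-sxm", 8, max_runtime_hours=24))
```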

Get Cluster Status

GET /v1/clusters/{id}

Retrieve cluster status, node health, GPU utilization, and connection details (SSH host, Kubernetes config).

Delete Cluster

POST /v1/clusters/{id}/delete

Terminate a cluster and release all resources. Billing stops when deprovisioning completes.

Code Examples

Python
JavaScript
from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

# Provision an 8xH100 cluster
cluster = client.clusters.create(
    gpu_type="h100-sxm",
    gpu_count=8,
    region="us-east",
    scheduler="kubernetes",
    max_runtime_hours=24
)

print(f"Cluster {cluster.id}: {cluster.status}")
print(f"SSH: {cluster.ssh_host}")
print(f"K8s config: {cluster.kubeconfig_url}")

# Monitor GPU utilization
status = client.clusters.retrieve(cluster.id)
for node in status.nodes:
    print(f"Node {node.id}: GPU util {node.gpu_utilization}%, "
          f"memory {node.gpu_memory_used_gb}/{node.gpu_memory_total_gb} GB")

# Tear down when done
client.clusters.delete(cluster.id)
import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// Provision an 8xH100 cluster
const cluster = await client.clusters.create({
  gpu_type: "h100-sxm",
  gpu_count: 8,
  region: "us-east",
  scheduler: "kubernetes",
  max_runtime_hours: 24,
});

console.log(`Cluster ${cluster.id}: ${cluster.status}`);
console.log(`SSH: ${cluster.ssh_host}`);
console.log(`K8s config: ${cluster.kubeconfig_url}`);

// Monitor GPU utilization
const status = await client.clusters.retrieve(cluster.id);
for (const node of status.nodes) {
  console.log(
    `Node ${node.id}: GPU util ${node.gpu_utilization}%, ` +
    `memory ${node.gpu_memory_used_gb}/${node.gpu_memory_total_gb} GB`
  );
}

// Tear down when done
await client.clusters.delete(cluster.id);
Available on Dedicated and Enterprise Tiers

GPU clusters require a Dedicated ($5,000+/mo) or Enterprise plan. Contact sales for custom cluster configurations and reserved capacity pricing.

Prompt Caching

Reduce costs and latency by caching repeated prompt prefixes. When you send the same system prompt, few-shot examples, or long context prefix across multiple requests, XALEN caches the prefix and reuses it. Cache reads cost up to 90% less than processing the same tokens from scratch.

How It Works

Step | Details
1. Cache Creation | On the first request, the prompt prefix is processed and cached. Cache creation costs the same as standard input tokens.
2. Cache Hit | Subsequent requests with an identical prefix hit the cache. Cached tokens are billed at 90% less than standard input token rates.
3. Cache Eviction | Caches expire after a period of inactivity (typically 5-10 minutes). Frequently used prefixes stay cached indefinitely.

Per-Model Caching Support

Model | Serverless Cache | Dedicated Cache | Cache Discount
Claude Opus 4.7 | Yes | N/A | 90% on reads
Claude Sonnet 4.6 | Yes | N/A | 90% on reads
Claude Haiku 4.5 | Yes | N/A | 90% on reads
DeepSeek V3 | No | Yes | 90% on reads
Llama 3.3 70B | No | Yes | 90% on reads
Qwen 2.5 72B | No | Yes | 90% on reads

Claude models receive automatic caching through the inference provider. Other models require a dedicated endpoint for caching support.

Usage Tracking

When prompt caching is active, the response usage object includes two additional fields:

Field | Type | Description
cache_creation_tokens | integer | Number of tokens written to cache on this request. Billed at standard input rate.
cache_read_tokens | integer | Number of tokens read from cache. Billed at 90% discount.
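
The two fields above are enough to estimate the input cost of a request. The sketch below applies the documented billing rules; the function name input_cost, the separate uncached_tokens argument, and the $3.00 per million rate are assumptions for illustration only, so check how your responses report non-cached input tokens and use your model's actual rate.

```python
CACHE_READ_DISCOUNT = 0.90  # cache reads are billed at a 90% discount


def input_cost(uncached_tokens, cache_creation_tokens, cache_read_tokens,
               rate_per_million):
    """Estimated input cost in dollars for one request."""
    per_token = rate_per_million / 1_000_000
    # Uncached input and cache writes are billed at the standard input rate.
    standard = (uncached_tokens + cache_creation_tokens) * per_token
    # Cache reads are billed at the discounted rate.
    cached = cache_read_tokens * per_token * (1 - CACHE_READ_DISCOUNT)
    return standard + cached


RATE = 3.00  # hypothetical dollars per million input tokens
first = input_cost(50, 2000, 0, RATE)    # first request writes the cache
second = input_cost(50, 0, 2000, RATE)   # second request reads it
print(f"first: ${first:.5f}, second: ${second:.5f}")
```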

Code Examples

Python
JavaScript
from xalen import Xalen

client = Xalen(api_key="xln_live_YOUR_KEY")

# Long system prompt that stays the same across requests
system_prompt = """You are a Vedic astrology expert trained on classical texts
including BPHS, Phaladeepika, and Brihat Jataka. Provide detailed analysis
based on planetary positions, houses, aspects, and dasha periods.
Always cite the relevant classical reference for each interpretation.
...(2000+ tokens of instructions)..."""

# First request: creates cache (billed at standard rate)
r1 = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Analyze Saturn in the 10th house."}
    ]
)
print(f"Cache created: {r1.usage.cache_creation_tokens} tokens")

# Second request: hits cache (90% cheaper on cached prefix)
r2 = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What does Jupiter in the 5th house mean?"}
    ]
)
print(f"Cache read: {r2.usage.cache_read_tokens} tokens (90% discount)")
import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: "xln_live_YOUR_KEY" });

// Long system prompt that stays the same across requests
const systemPrompt = `You are a Vedic astrology expert trained on classical texts
including BPHS, Phaladeepika, and Brihat Jataka. Provide detailed analysis
based on planetary positions, houses, aspects, and dasha periods.
Always cite the relevant classical reference for each interpretation.
...(2000+ tokens of instructions)...`;

// First request: creates cache
const r1 = await client.chat.completions.create({
  model: "claude-sonnet-4.6",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: "Analyze Saturn in the 10th house." },
  ],
});
console.log(`Cache created: ${r1.usage.cache_creation_tokens} tokens`);

// Second request: hits cache (90% cheaper on cached prefix)
const r2 = await client.chat.completions.create({
  model: "claude-sonnet-4.6",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: "What does Jupiter in the 5th house mean?" },
  ],
});
console.log(`Cache read: ${r2.usage.cache_read_tokens} tokens (90% discount)`);
Maximize Cache Hits

Structure your prompts with the static content first (system instructions, few-shot examples, reference data) and dynamic content last (user query). The longer the shared prefix, the greater the cost savings. A 2,000-token cached prefix saves roughly $0.027 per request on Claude Sonnet 4.6.
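
The per-request saving follows from prefix length, input rate, and the 90% read discount. The sketch below works through that arithmetic; the function names are hypothetical. Note that the $0.027 figure above corresponds to a standard input rate of $15 per million tokens. That rate is inferred from the arithmetic, not quoted from a price list, so confirm it against current pricing.

```python
def cache_saving(prefix_tokens, rate_per_million, discount=0.90):
    """Dollars saved per request when `prefix_tokens` are read from cache."""
    return prefix_tokens / 1_000_000 * rate_per_million * discount


def implied_rate(saving, prefix_tokens, discount=0.90):
    """Input rate (dollars per million tokens) that yields a given saving."""
    return saving / (prefix_tokens * discount) * 1_000_000


# A 2,000-token prefix at an assumed $15 per million input tokens.
print(f"${cache_saving(2000, 15.0):.3f} saved per request")

# Working backwards from the quoted $0.027 saving.
print(f"${implied_rate(0.027, 2000):.2f} per million tokens")
```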

Model Feature Matrix

Not every model supports every feature. Use this matrix to find the right model for your use case. Features marked "Dedicated" are available only when using a dedicated endpoint.

Model | Chat | Vision | Streaming | Fine-tune | Batch | Dedicated | Caching | Functions
Claude Opus 4.7 | Yes | Yes | Yes | No | No | No | Yes | Yes
Claude Sonnet 4.6 | Yes | Yes | Yes | No | No | No | Yes | Yes
Claude Haiku 4.5 | Yes | No | Yes | No | No | No | Yes | Yes
DeepSeek V3 | Yes | No | Yes | Yes | Yes | Yes | Dedicated | Yes
DeepSeek R1 | Yes | No | Yes | Yes | Yes | Yes | Dedicated | No
Llama 3.3 70B | Yes | No | Yes | Yes | Yes | Yes | Dedicated | Yes
Llama 4 Scout | Yes | Yes | Yes | LoRA | Yes | Yes | Dedicated | Yes
Qwen 2.5 72B | Yes | No | Yes | Yes | Yes | Yes | Dedicated | Yes
Qwen3 235B | Yes | No | Yes | LoRA | Yes | Yes | Dedicated | Yes
GPT-OSS 120B | Yes | No | Yes | No | Yes | Yes | Dedicated | Yes
Vedika Standard | Yes | No | Yes | No | No | No | No | No
Vedika Fast | Yes | No | Yes | No | No | No | No | No
Legend

Yes = fully supported on serverless. Dedicated = available only on dedicated endpoints. LoRA = LoRA adapter fine-tuning only (not full-parameter). No = not available for this model. Use GET /v1/models for the latest capabilities.
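
If you select models programmatically, the matrix can be encoded as data and filtered by feature. The sketch below transcribes the rows above into a dictionary; the helper models_supporting is hypothetical, and it treats LoRA as supported for fine-tuning. A static copy like this will drift, so prefer GET /v1/models for live capabilities.

```python
FEATURES = ("chat", "vision", "streaming", "fine_tune", "batch",
            "dedicated", "caching", "functions")

# Rows transcribed from the matrix above, one value per feature column.
MATRIX = {
    "Claude Opus 4.7":   "Yes Yes Yes No No No Yes Yes",
    "Claude Sonnet 4.6": "Yes Yes Yes No No No Yes Yes",
    "Claude Haiku 4.5":  "Yes No Yes No No No Yes Yes",
    "DeepSeek V3":       "Yes No Yes Yes Yes Yes Dedicated Yes",
    "DeepSeek R1":       "Yes No Yes Yes Yes Yes Dedicated No",
    "Llama 3.3 70B":     "Yes No Yes Yes Yes Yes Dedicated Yes",
    "Llama 4 Scout":     "Yes Yes Yes LoRA Yes Yes Dedicated Yes",
    "Qwen 2.5 72B":      "Yes No Yes Yes Yes Yes Dedicated Yes",
    "Qwen3 235B":        "Yes No Yes LoRA Yes Yes Dedicated Yes",
    "GPT-OSS 120B":      "Yes No Yes No Yes Yes Dedicated Yes",
    "Vedika Standard":   "Yes No Yes No No No No No",
    "Vedika Fast":       "Yes No Yes No No No No No",
}


def models_supporting(*features, allow_dedicated=False):
    """Return models that support every named feature, in table order."""
    accepted = {"Yes", "LoRA"}
    if allow_dedicated:
        accepted.add("Dedicated")  # also accept dedicated-endpoint-only features
    matches = []
    for model, row in MATRIX.items():
        values = dict(zip(FEATURES, row.split()))
        if all(values[f] in accepted for f in features):
            matches.append(model)
    return matches


print(models_supporting("vision", "functions"))
print(models_supporting("fine_tune", "caching", allow_dedicated=True))
```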

SDKs

Official client libraries for Python and JavaScript, plus an MCP server for AI coding assistants.

Python

pip install xalen
from xalen import Xalen

# Uses XALEN_API_KEY env var by default
client = Xalen()

# Or pass explicitly
client = Xalen(api_key="xln_live_YOUR_KEY")

# All OpenAI-compatible methods work
response = client.chat.completions.create(
    model="vedika-standard",
    messages=[{"role": "user", "content": "Hello"}]
)

JavaScript / TypeScript

npm install xalen-sdk
import Xalen from "xalen-sdk";

const client = new Xalen({ apiKey: process.env.XALEN_API_KEY });

const response = await client.chat.completions.create({
  model: "vedika-standard",
  messages: [{ role: "user", content: "Hello" }],
});

MCP Server

Connect XALEN to AI coding assistants like Claude Code, Cursor, and GitHub Copilot using the Model Context Protocol.

npx xalen-mcp

The MCP server exposes 15 tools covering chat, embeddings, image generation, voice, astrology, and billing. See the npm package for configuration details.

Mobile

The XALEN REST API works from any platform, including React Native, Flutter, Swift, and Kotlin. Since the API is OpenAI-compatible, any existing OpenAI mobile wrapper or HTTP client works out of the box.

React Native
Flutter
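// Standard fetch from React Native; no SDK required.
// In production, route requests through a backend proxy instead of embedding the key in the app.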
const response = await fetch('https://api.xalen.io/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer xln_live_YOUR_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'vedika-standard',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});
const data = await response.json();
// Flutter SDK — pub.dev: xalen ^0.1.0
import 'package:xalen/xalen.dart';

final client = XALEN(apiKey: 'xln_live_YOUR_KEY');

final response = await client.chatCompletion(
  model: 'vedika-standard',
  messages: [ChatMessage(role: 'user', content: 'Hello')],
);
print(response.choices.first.message.content);
Native SDKs Available

React hooks: npm install xalen-react xalen-sdk. Includes useChat, useCompletion, useVoice, and useAstrology hooks with XALENProvider context.
Flutter/Dart: add xalen: ^0.1.0 to pubspec.yaml. Typed client with chat completions, astrology endpoints, and error handling.
Offline caching and an Expo module are coming soon.

Errors

XALEN uses standard HTTP status codes. All error responses include a JSON body with error.type and error.message.

Status | Type | Description
400 | invalid_request | Missing or invalid parameters.
401 | authentication_error | Invalid or missing API key.
403 | permission_denied | Key lacks permission for this endpoint.
404 | not_found | Endpoint or resource not found.
429 | rate_limit_exceeded | Too many requests. Retry after the time in Retry-After header.
500 | server_error | Internal error. Contact support if persistent.
503 | service_unavailable | Temporary overload. Retry with exponential backoff.

Error Response Format

JSON Error
{
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key. Keys must start with xln_live_.",
    "status": 401
  }
}
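
If you call the API without an SDK, the error body above can be turned into a typed exception. The sketch below is one possible shape; the names ApiError and parse_error are hypothetical and do not describe the SDK's own exception classes. It marks 429 and 503 as retryable, following the status table.

```python
import json

RETRYABLE_STATUSES = {429, 503}  # the status table advises retrying these


class ApiError(Exception):
    """Carries the status code plus error.type and error.message."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Build an ApiError from an HTTP status and a raw response body."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return ApiError(status, err.get("type", "unknown"), err.get("message", body))


body = '{"error": {"type": "authentication_error", "message": "Invalid API key.", "status": 401}}'
error = parse_error(401, body)
print(error, "| retryable:", error.retryable)
```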
Retry Strategy

For 429 and 503 errors, use exponential backoff starting at 1 second. The SDKs handle this automatically with up to 3 retries.

Production Retry Examples

For production workloads, implement retry logic with exponential backoff and proper error classification. These examples handle rate limits, transient server errors, and hard failures differently.

Python
JavaScript
import time

import xalen
from xalen import Xalen

# Reads XALEN_API_KEY from the environment
client = Xalen()

def query_with_retry(prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="vedika-pro-ultra",
                messages=[{"role": "user", "content": prompt}]
            )
        except xalen.RateLimitError:
            # Rate limited: back off exponentially, capped at 60 seconds
            time.sleep(min(2 ** attempt, 60))
        except xalen.APIError as e:
            # Retry transient server errors; re-raise client errors immediately
            if e.status_code < 500:
                raise
            time.sleep(min(2 ** attempt, 60))
    raise RuntimeError("Max retries exceeded")
import Xalen from 'xalen-sdk';

const client = new Xalen({ apiKey: process.env.XALEN_API_KEY });

async function queryWithRetry(prompt, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'vedika-pro-ultra',
        messages: [{ role: 'user', content: prompt }]
      });
    } catch (err) {
      if (err.status === 429 || err.status >= 500) {
        const wait = Math.min(Math.pow(2, attempt) * 1000, 60000);
        await new Promise(r => setTimeout(r, wait));
        continue;
      }
      throw err;
    }
  }
  throw new Error('Max retries exceeded');
}
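
The examples above use a fixed backoff schedule. A 429 response may also carry a Retry-After header, and the sketch below prefers it when present. The helper name retry_delay is hypothetical, it assumes the header holds a number of seconds (HTTP also permits a date, which falls back to backoff here), and the optional jitter is a common addition that this documentation does not prescribe.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=False):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Trust the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not numeric (for example an HTTP date): use backoff instead
    delay = min(base * 2 ** attempt, cap)  # 1s, 2s, 4s, ... capped at `cap`
    return random.uniform(0, delay) if jitter else delay


print([retry_delay(n) for n in range(4)])
print(retry_delay(2, retry_after="5"))
```

Pass the header value from the failed response, for example retry_delay(attempt, headers.get("Retry-After")), and sleep for the returned number of seconds before the next attempt.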

Rate Limits

Rate limits are applied per API key and vary by plan. Limits are returned in response headers.

Plan | RPM | TPM
Pay-as-you-go | 60 RPM | 100K TPM
Growth | 300 RPM | 500K TPM
Scale | 1,000 RPM | 2M TPM
Dedicated | 5,000 RPM | 10M TPM
Enterprise | Custom | Custom

Rate Limit Headers

Header | Description
x-ratelimit-limit-requests | Maximum requests per minute for your plan.
x-ratelimit-remaining-requests | Remaining requests in the current window.
x-ratelimit-reset-requests | Seconds until the request limit resets.
x-ratelimit-limit-tokens | Maximum tokens per minute for your plan.
x-ratelimit-remaining-tokens | Remaining tokens in the current window.
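
These headers let a client pause before it hits the limit instead of reacting to a 429. The sketch below reads them from a plain dictionary of response headers; the helper name seconds_to_wait is hypothetical, and it assumes x-ratelimit-reset-requests is a plain number of seconds as described above.

```python
def seconds_to_wait(headers):
    """Return how long to pause before the next request, in seconds."""
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    remaining = int(h.get("x-ratelimit-remaining-requests", 1))
    if remaining > 0:
        return 0.0  # budget left in the current window
    return float(h.get("x-ratelimit-reset-requests", 0))


print(seconds_to_wait({
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-reset-requests": "12",
}))
```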

API Versioning & Deprecation

XALEN uses URL-path versioning to guarantee backward compatibility. The current stable version is v1, and all endpoints are prefixed with /v1/.

Versioning Scheme

Aspect | Policy
Current version | v1 (stable)
Version in URL | Major version in path: /v1/, /v2/
Backward compatibility | v1 will be supported for a minimum of 24 months after v2 launches
Breaking changes | No breaking changes to existing endpoints without a version bump

Deprecation Notice Periods

When an API version or model is deprecated, you will receive advance notice based on your plan tier.

Plan | API Deprecation | Model Deprecation
Enterprise | 90 days | 60 days + migration guide
Scale | 60 days | 60 days + migration guide
Growth | 30 days | 60 days + migration guide
Pay-as-you-go | 30 days | 60 days + migration guide

Sunset Headers

Deprecated endpoints include a Sunset header indicating the final date of availability, plus a Deprecation header with the date the deprecation was announced.

Deprecation Headers
Sunset: Mon, 01 Nov 2027 00:00:00 GMT
Deprecation: Sun, 01 Aug 2027 00:00:00 GMT
Link: <https://xalen.io/docs#versioning>; rel="sunset"
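
A client can watch for the Sunset header and warn before an endpoint goes away. The sketch below parses the HTTP date with the Python standard library; the helper name days_until_sunset is hypothetical, and it assumes the header uses the HTTP-date format shown above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def days_until_sunset(headers, now=None):
    """Return whole days until the Sunset date, or None if not deprecated."""
    value = headers.get("Sunset")
    if value is None:
        return None
    sunset = parsedate_to_datetime(value)  # timezone-aware for GMT dates
    now = now or datetime.now(timezone.utc)
    return (sunset - now).days


headers = {"Sunset": "Mon, 01 Nov 2027 00:00:00 GMT"}
announced = datetime(2027, 8, 1, tzinfo=timezone.utc)
print(days_until_sunset(headers, now=announced))
```

A negative result means the sunset date has already passed.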
Model Deprecation

When a model is deprecated, all notices include an equivalent model recommendation so you can migrate with minimal code changes. The deprecated model continues to function for the full 60-day notice period.

Data Portability & Export

XALEN is built on open standards. Your data, your code, your choice of provider. No lock-in, no proprietary formats, no exit fees.

Open Standards

Aspect | Details
Response format | All API responses use standard JSON. No proprietary serialization or binary formats.
OpenAI compatibility | Drop-in compatible with any OpenAI-compatible provider. Migrate to or from XALEN by changing the base URL and API key.
Query language | Standard REST API. No proprietary query language or DSL to learn.
SDKs | Open-source Python and JavaScript SDKs. Portable, no vendor-specific runtime dependencies.

Data Export

GET /v1/account/export

Export all account data as a single JSON archive. Includes usage history, billing records, API key metadata, and account settings.

JSON Response
{
  "account": { "email": "[email protected]", "plan": "growth", "created": "2026-01-15" },
  "usage": [ { "date": "2026-05-14", "requests": 1420, "tokens": 892300, "cost_cents": 178 } ],
  "billing": [ { "id": "pay_abc123", "amount": 5000, "date": "2026-05-01", "status": "completed" } ],
  "api_keys": [ { "id": "key_1", "prefix": "xln_live_a3f...", "created": "2026-01-15", "last_used": "2026-05-14" } ]
}
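
Because the archive is plain JSON, it can be processed with any standard tooling. The sketch below totals the usage entries from an export shaped like the response above; the helper name summarize_export is hypothetical, and the field names are taken from that sample, so adjust them if your archive differs.

```python
import json


def summarize_export(archive_json):
    """Total requests, tokens, and cost across all usage entries."""
    data = json.loads(archive_json)
    usage = data.get("usage", [])
    return {
        "days": len(usage),
        "requests": sum(day["requests"] for day in usage),
        "tokens": sum(day["tokens"] for day in usage),
        # cost_cents is converted to dollars for readability
        "cost_dollars": sum(day["cost_cents"] for day in usage) / 100,
    }


sample = json.dumps({
    "usage": [
        {"date": "2026-05-14", "requests": 1420, "tokens": 892300, "cost_cents": 178}
    ]
})
print(summarize_export(sample))
```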

Code Ownership

Code generated through XALEN Studio is 100% customer-owned. Download your projects as a ZIP archive at any time. No licensing restrictions, no attribution requirements, no runtime dependency on XALEN infrastructure.

Regulatory Compliance

XALEN supports data portability rights under GDPR Article 20 and India's DPDPA. Submit export requests via the dashboard or email [email protected] for assisted exports.