Serverless Inference
Access production-ready models through a clean API. Start in minutes with usage-based pricing and global low-latency routing.
Designed for teams shipping AI agents, copilots, and workflow automation.
API Usage
It takes only two steps to call the Altus API.
Obtain API Key
Create a key in your console.
Chat API Call
Send your first inference request.
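The same request can be built from code. Below is a minimal Python sketch using only the standard library; the endpoint, model name, and message shape come from the curl example, and `your-api-key` is a placeholder for a key from your console.

```python
import json
import urllib.request

API_URL = "https://api.altuscloud.ai/v1/chat/completions"

def build_chat_request(api_key: str, content: str,
                       model: str = "altus-chat-v3") -> urllib.request.Request:
    """Build the POST request shown in the curl example below."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it is one call; the exact response schema is not documented
# here, so parsing is left as an exercise:
# with urllib.request.urlopen(build_chat_request("your-api-key",
#                                                "Hello, Altuscloud!")) as resp:
#     reply = json.load(resp)
```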
Request Example
curl --location 'https://api.altuscloud.ai/v1/chat/completions' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{
"model": "altus-chat-v3",
"messages": [
{
"role": "user",
"content": "Hello, Altuscloud!"
}
]
}'
Advantages
Enterprise-grade AI inference with predictable performance
Full-Fledged API
Access chat, image, and embedding endpoints through one consistent API.
One-Click Access
Deploy models in minutes with defaults tuned for production reliability.
Low Latency
A low-latency global network serves requests near your users.
API Pricing
Transparent pricing with no hidden infrastructure fees.
Momentum
High-throughput serverless inference for growing teams.
Input: $0.45 / 1M tokens
Output: $1.20 / 1M tokens
- Autoscale to zero
- Batch + streaming support
- Regional failover
- Usage analytics
- Community support
Pinnacle
Max performance tier with dedicated capacity and SLAs.
Input: $0.75 / 1M tokens
Output: $2.10 / 1M tokens
- Priority GPU pools
- Dedicated routing lanes
- Enterprise SLAs
- Advanced observability
- Designated success team
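Because both tiers bill per token with no infrastructure fees, per-request cost is simple arithmetic: tokens times the per-million-token rate. A quick sketch using the prices listed above:

```python
# Per-million-token prices (USD) from the pricing tiers above.
PRICES = {
    "momentum": {"input": 0.45, "output": 1.20},
    "pinnacle": {"input": 0.75, "output": 2.10},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request on the given tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 10,000 input + 2,000 output tokens on Momentum:
# 10,000 * $0.45/1M + 2,000 * $1.20/1M = $0.0045 + $0.0024 = $0.0069
```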
