Serverless Inference
Access production-ready models through a clean API. Start in minutes with usage-based pricing and global low-latency routing.
Designed for teams shipping AI agents, copilots, and workflow automation.
API Usage
It takes only two steps to call the Altus API.
Obtain API Key
Create a key in your console.
Chat API Call
Send your first inference request.
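The same request can be built from code. Below is a minimal Python sketch using only the standard library; the endpoint, model name, and message shape come from the curl example, and `your-api-key` is a placeholder for a key from your console.

```python
import json
import urllib.request

API_URL = "https://api.altuscloud.ai/v1/chat/completions"

def build_chat_request(api_key: str, content: str,
                       model: str = "altus-chat-v3") -> urllib.request.Request:
    """Build the POST request shown in the curl example below."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it is one call; the exact response schema is not documented
# here, so parsing is left as an exercise:
# with urllib.request.urlopen(build_chat_request("your-api-key",
#                                                "Hello, Altuscloud!")) as resp:
#     reply = json.load(resp)
```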
Request Example
curl --location 'https://api.altuscloud.ai/v1/chat/completions' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data '{
"model": "altus-chat-v3",
"messages": [
{
"role": "user",
"content": "Hello, Altuscloud!"
}
]
}'
Advantages
Enterprise-grade AI inference with predictable performance
Full-Fledged API
Access chat, image, and embedding endpoints through one consistent API.
One-Click Access
Deploy models in minutes with defaults tuned for production reliability.
Low Latency
A low-latency global network serves requests near your users.
API Pricing
Transparent pricing with no hidden infrastructure fees.
Momentum
High-throughput serverless inference for growing teams.
Input: $0.45 / 1M tokens
Output: $1.20 / 1M tokens
- Autoscale to zero
- Batch + streaming support
- Regional failover
- Usage analytics
- Community support
Pinnacle
Max performance tier with dedicated capacity and SLAs.
Input: $0.75 / 1M tokens
Output: $2.10 / 1M tokens
- Priority GPU pools
- Dedicated routing lanes
- Enterprise SLAs
- Advanced observability
- Designated success team
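Because both tiers bill per token with no infrastructure fees, per-request cost is simple arithmetic: tokens times the per-million-token rate. A quick sketch using the prices listed above:

```python
# Per-million-token prices (USD) from the pricing tiers above.
PRICES = {
    "momentum": {"input": 0.45, "output": 1.20},
    "pinnacle": {"input": 0.75, "output": 2.10},
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request on the given tier."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 10,000 input + 2,000 output tokens on Momentum:
# 10,000 * $0.45/1M + 2,000 * $1.20/1M = $0.0045 + $0.0024 = $0.0069
```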
