Integrating AI via API: A Practical Guide
You have an app. You want it to talk to AI. Maybe you want chat completions, maybe image generation, maybe both. The good news: if you have ever used the OpenAI API, you already know how to use ours. The better news: you get access to 367 models from 61 providers through a single endpoint.
No new SDK to learn. No proprietary protocol. Just change the base URL and your API key.
Step 1: Get Your API Key
Sign up at zubnet.com and navigate to your workspace settings. Your API key lives there. It starts with a recognizable prefix:
zub_live_a1b2c3d4e5f6...
Keep it secret. Treat it like a password. If it leaks, rotate it immediately from your workspace settings — the old key stops working the moment you generate a new one.
Step 2: Set Your Base URL
Every request goes to:
https://api.zubnet.ai/v1
That is it. The /v1 prefix matches the OpenAI convention, so any library, tool, or framework that supports OpenAI works as soon as you swap in this URL.
Step 3: Make Your First Request
Using curl
The simplest possible test — a chat completion with one message:
curl https://api.zubnet.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer zub_live_YOUR_KEY" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
You will get back a JSON response with the model’s answer in choices[0].message.content. Exactly the same shape as OpenAI’s response.
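If you are calling the API without an SDK, pulling the answer out of that JSON is a one-liner. A minimal sketch, using an abbreviated sample body in the OpenAI shape (real responses also carry fields like id, created, and usage; the answer text here is illustrative):

```python
import json

# Abbreviated response body in the OpenAI chat-completions shape
raw = '''
{
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
'''

body = json.loads(raw)
# The answer lives at choices[0].message.content
answer = body["choices"][0]["message"]["content"]
print(answer)  # The capital of France is Paris.
```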
Using Python with the OpenAI SDK
Install the SDK if you have not already:
pip install openai
Then use it with our base URL:
from openai import OpenAI

client = OpenAI(
    api_key="zub_live_YOUR_KEY",
    base_url="https://api.zubnet.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in three sentences."}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)
That is a complete, working example. Copy it, paste it, replace the API key, run it.
The Model Parameter
The model field is where it gets interesting. Instead of being locked to one provider, you pick from 367 models across 61 providers. Some examples:
• claude-sonnet-4-20250514 — Anthropic Claude Sonnet 4
• gpt-4.1 — OpenAI GPT-4.1
• gemini-2.5-pro-preview-05-06 — Google Gemini 2.5 Pro
• deepseek-r1 — DeepSeek R1 reasoning model
• flux-1.1-pro — Black Forest Labs FLUX for images
• kling-v2.5-master — Kling video generation
The full list is on our models page, filterable by type (LLM, image, video, audio, 3D, code) and provider. The model ID you see there is exactly what goes in the model field.
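You can also fetch the catalog programmatically. The OpenAI convention exposes a GET /models endpoint; assuming it is supported here too, a raw-HTTP sketch with only the standard library looks like this (with the SDK, `client.models.list()` should do the same job):

```python
import json
import urllib.request

API_KEY = "zub_live_YOUR_KEY"  # replace with your real key

def list_models(base_url="https://api.zubnet.ai/v1"):
    """Fetch model IDs; assumes the OpenAI-style GET /models endpoint."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# for model_id in list_models():
#     print(model_id)
```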
Image Generation
Image models use the same endpoint pattern. Here is FLUX via Python:
response = client.images.generate(
    model="flux-1.1-pro",
    prompt="A serene Japanese garden at dawn, soft morning light",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url
print(image_url)
The response includes a URL to the generated image hosted on our CDN. Download it, display it, embed it — the URL is valid for 24 hours.
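Since the URL expires after 24 hours, save the file as soon as you get it. A minimal sketch with the standard library (`download_image` is a hypothetical helper name, not part of any SDK):

```python
import urllib.request

def download_image(url, path="output.png"):
    """Save a generated image locally before the 24-hour URL expires."""
    urllib.request.urlretrieve(url, path)
    return path

# download_image(image_url)  # image_url from the generation response above
```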
Streaming Responses
For chat models, streaming sends tokens as they are generated instead of waiting for the full response. This is essential for any user-facing chat interface — nobody wants to stare at a blank screen for 10 seconds.
stream = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Write a short poem about code."}
    ],
    stream=True
)

for chunk in stream:
    # Some chunks (e.g. the final one) carry no choices or no content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Under the hood, this uses Server-Sent Events (SSE). Each chunk arrives as a data: line in the HTTP response. The OpenAI SDK handles the parsing for you, but if you are using raw HTTP, you will see lines like:
data: {"choices":[{"delta":{"content":"Once"},"index":0}]}
data: {"choices":[{"delta":{"content":" upon"},"index":0}]}
data: {"choices":[{"delta":{"content":" a"},"index":0}]}
...
data: [DONE]
The stream ends with data: [DONE]. Parse each line, extract delta.content, append to your output buffer. That is the entire protocol.
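If you are handling raw SSE yourself, that protocol fits in one small function. A sketch, assuming the chunk shape shown above (`parse_sse_line` is a hypothetical helper name):

```python
import json

def parse_sse_line(line):
    """Extract streamed text from one SSE line.

    Returns None for non-data lines, the [DONE] sentinel,
    or chunks that carry no content.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")

chunks = [
    'data: {"choices":[{"delta":{"content":"Once"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" upon"},"index":0}]}',
    "data: [DONE]",
]
text = "".join(c for c in map(parse_sse_line, chunks) if c)
print(text)  # Once upon
```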
Error Handling
The API uses standard HTTP status codes. Here are the ones you will actually encounter:
400 Bad Request — Your request body is malformed. Check the JSON structure, make sure messages is an array, make sure model is a valid model ID.
401 Unauthorized — Your API key is missing, invalid, or expired. Check the Authorization: Bearer header.
403 Forbidden — Your account does not have access to this model or feature. Check your workspace subscription level.
429 Too Many Requests — You have hit the rate limit. The response includes a Retry-After header telling you how many seconds to wait. Implement exponential backoff:
import time
from openai import RateLimitError

max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except RateLimitError:
        wait = 2 ** attempt
        print(f"Rate limited. Waiting {wait}s...")
        time.sleep(wait)
else:
    # All retries exhausted — fail loudly instead of using an unset response
    raise RuntimeError("Still rate limited after retries")
500 / 502 / 503 — Server-side issues. These are rare but they happen. Retry with backoff. If persistent, check our Pulse status page.
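The same backoff pattern covers both cases: 429 and 5xx are the transient statuses worth retrying, while 4xx client errors are not. A small predicate (`should_retry` is a hypothetical helper, not part of the SDK) makes that condition explicit:

```python
def should_retry(status_code, attempt, max_retries=3):
    """Retry transient failures only: rate limits (429) and server errors.

    Client errors like 400/401/403 will fail the same way on every
    attempt, so retrying them just wastes requests.
    """
    if attempt >= max_retries:
        return False
    return status_code in (429, 500, 502, 503)
```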
Authentication Details
Every request needs an Authorization header:
Authorization: Bearer zub_live_YOUR_KEY
The key is scoped to your workspace. All team members in the same workspace share the same key and usage pool. If you need separate tracking per team member, create separate workspaces.
• Never hardcode keys in source code — use environment variables
• Never commit keys to git — use .env files in .gitignore
• Rotate keys regularly from workspace settings
• Use separate keys for development and production
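In practice, the first two bullets reduce to: read the key from the environment at startup and fail fast if it is missing. A minimal sketch (`load_api_key` is a hypothetical helper; the ZUBNET_API_KEY variable name matches the final example below):

```python
import os

def load_api_key():
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get("ZUBNET_API_KEY")
    if not key:
        raise RuntimeError(
            "ZUBNET_API_KEY is not set. Export it in your shell, or keep it "
            "in a .env file (listed in .gitignore) that your process loads."
        )
    return key
```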
Rate Limits
Rate limits depend on your plan tier. The defaults are generous for most use cases:
• Requests per minute: varies by plan (60–600 RPM)
• Tokens per minute: varies by model and plan
• Concurrent requests: no hard limit, but 429s will fire if you burst too aggressively
Rate limit headers are included in every response:
x-ratelimit-limit-requests: 120
x-ratelimit-remaining-requests: 119
x-ratelimit-reset-requests: 0.5s
Read these headers. Build your client to respect them. Your users will thank you when their requests do not randomly fail.
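One way to respect them: when the server sends Retry-After (or the reset header), honor it; otherwise fall back to exponential backoff with jitter so bursts of retries do not re-synchronize. A sketch under those assumptions (`backoff_delay` is a hypothetical helper name):

```python
import random

def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Pick a wait time before retrying.

    Honors a server-supplied Retry-After value in seconds when present;
    otherwise uses capped exponential backoff with jitter.
    """
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```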
Putting It All Together
Here is a production-ready Python function that handles errors, retries, and streaming:
import os
import time

from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    api_key=os.environ["ZUBNET_API_KEY"],
    base_url="https://api.zubnet.ai/v1"
)

def ask(prompt, model="claude-sonnet-4-20250514", stream=False):
    last_error = None
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                stream=stream
            )
            if stream:
                return response  # caller iterates over chunks
            return response.choices[0].message.content
        except RateLimitError as e:
            last_error = e
            time.sleep(2 ** attempt)
        except APIError as e:
            last_error = e
            if attempt == 2:
                raise
            time.sleep(1)
    # Rate limited on every attempt — surface the error instead of returning None
    raise last_error
# Usage
print(ask("Summarize the latest in quantum computing"))
print(ask("Same question, different model", model="gpt-4.1"))
Switch models by changing a string. No code refactoring, no new dependencies, no different response format. That is the entire point of OpenAI compatibility.
All code examples in this guide are tested and working as of March 2026. Copy them, run them, build on them. That is what they are for.
Ready to start? Get your API key at zubnet.com