Intermediate

Integrating AI via API: A Practical Guide

One API, 367 models. OpenAI-compatible — change the base URL, keep your code. Here is everything you need to go from zero to working AI integration in under an hour.
Sarah Chen · March 2026 · 10 min read

You have an app. You want it to talk to AI. Maybe you want chat completions, maybe image generation, maybe both. The good news: if you have ever used the OpenAI API, you already know how to use ours. The better news: you get access to 367 models from 61 providers through a single endpoint.

No new SDK to learn. No proprietary protocol. Just change the base URL and your API key.

Step 1: Get Your API Key

Sign up at zubnet.com and navigate to your workspace settings. Your API key lives there. It starts with a recognizable prefix:

zub_live_a1b2c3d4e5f6...

Keep it secret. Treat it like a password. If it leaks, rotate it immediately from your workspace settings — the old key stops working the moment you generate a new one.

Step 2: Set Your Base URL

Every request goes to:

https://api.zubnet.ai/v1

That is it. The /v1 prefix matches the OpenAI convention, so any library, tool, or framework that supports OpenAI works once you swap in the URL.

Step 3: Make Your First Request

Using curl

The simplest possible test — a chat completion with one message:

curl https://api.zubnet.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer zub_live_YOUR_KEY" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

You will get back a JSON response with the model’s answer in choices[0].message.content. Exactly the same shape as OpenAI’s response.
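If you are calling the endpoint over raw HTTP rather than through an SDK, the only parsing you need is a dictionary lookup. A minimal sketch; the sample body below is abbreviated to the fields being read, while a real response also carries `id`, `model`, `usage`, and so on:

```python
# Extract the assistant's reply from an OpenAI-shaped chat completion body.
def extract_answer(response_body: dict) -> str:
    return response_body["choices"][0]["message"]["content"]

# Abbreviated sample of the response shape described above:
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(extract_answer(sample))  # Paris.
```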

Using Python with the OpenAI SDK

Install the SDK if you have not already:

pip install openai

Then use it with our base URL:

from openai import OpenAI

client = OpenAI(
    api_key="zub_live_YOUR_KEY",
    base_url="https://api.zubnet.ai/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in three sentences."}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)

That is a complete, working example. Copy it, paste it, replace the API key, run it.

The Model Parameter

The model field is where it gets interesting. Instead of being locked to one provider, you pick from 367 models across 61 providers. Some examples:

Popular model IDs:

claude-sonnet-4-20250514 — Anthropic Claude Sonnet 4
gpt-4.1 — OpenAI GPT-4.1
gemini-2.5-pro-preview-05-06 — Google Gemini 2.5 Pro
deepseek-r1 — DeepSeek R1 reasoning model
flux-1.1-pro — Black Forest Labs FLUX for images
kling-v2.5-master — Kling video generation

The full list is on our models page, filterable by type (LLM, image, video, audio, 3D, code) and provider. The model ID you see there is exactly what goes in the model field.
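If Zubnet mirrors the rest of the OpenAI surface, the standard GET /v1/models endpoint should return the same list programmatically — that endpoint is an assumption here, not something confirmed above. A sketch using only the standard library, with the network call left commented out:

```python
import json
import urllib.request

def model_ids(models_body: dict) -> list[str]:
    """Pull the IDs out of an OpenAI-shaped models list response."""
    return [m["id"] for m in models_body["data"]]

# req = urllib.request.Request(
#     "https://api.zubnet.ai/v1/models",
#     headers={"Authorization": "Bearer zub_live_YOUR_KEY"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(model_ids(json.load(resp)))

# The OpenAI-compatible shape is {"object": "list", "data": [{"id": ...}, ...]}:
sample = {"object": "list", "data": [{"id": "gpt-4.1"}, {"id": "deepseek-r1"}]}
print(model_ids(sample))  # ['gpt-4.1', 'deepseek-r1']
```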

Image Generation

Image models use the same endpoint pattern. Here is FLUX via Python:

response = client.images.generate(
    model="flux-1.1-pro",
    prompt="A serene Japanese garden at dawn, soft morning light",
    n=1,
    size="1024x1024"
)

image_url = response.data[0].url
print(image_url)

The response includes a URL to the generated image hosted on our CDN. Download it, display it, embed it — the URL is valid for 24 hours.
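Since the URL expires after 24 hours, persist the image if you need it beyond that window. A minimal download sketch using only the standard library (`garden.png` is an arbitrary filename, not anything the API dictates):

```python
import urllib.request
from pathlib import Path

def save_image(image_url: str, dest: str = "garden.png") -> Path:
    """Fetch the generated image before its 24-hour URL expires."""
    path = Path(dest)
    with urllib.request.urlopen(image_url) as resp:
        path.write_bytes(resp.read())
    return path

# save_image(image_url)  # writes garden.png to the working directory
```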

Streaming Responses

For chat models, streaming sends tokens as they are generated instead of waiting for the full response. This is essential for any user-facing chat interface — nobody wants to stare at a blank screen for 10 seconds.

stream = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user", "content": "Write a short poem about code."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Under the hood, this uses Server-Sent Events (SSE). Each chunk arrives as a data: line in the HTTP response. The OpenAI SDK handles the parsing for you, but if you are using raw HTTP, you will see lines like:

data: {"choices":[{"delta":{"content":"Once"},"index":0}]}
data: {"choices":[{"delta":{"content":" upon"},"index":0}]}
data: {"choices":[{"delta":{"content":" a"},"index":0}]}
...
data: [DONE]

The stream ends with data: [DONE]. Parse each line, extract delta.content, append to your output buffer. That is the entire protocol.
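As a concrete sketch of that raw-HTTP path, here is a tiny parser for the data: lines shown above. It is pure string handling; wiring it to an actual streamed HTTP response is left out:

```python
import json

def sse_content(line: str):
    """Return delta.content from one 'data:' line, or None for anything else."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

lines = [
    'data: {"choices":[{"delta":{"content":"Once"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" upon"},"index":0}]}',
    "data: [DONE]",
]
poem = "".join(piece for piece in map(sse_content, lines) if piece)
print(poem)  # Once upon
```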

Error Handling

The API uses standard HTTP status codes. Here are the ones you will actually encounter:

400 Bad Request — Your request body is malformed. Check the JSON structure, make sure messages is an array, make sure model is a valid model ID.

401 Unauthorized — Your API key is missing, invalid, or expired. Check the Authorization: Bearer header.

403 Forbidden — Your account does not have access to this model or feature. Check your workspace subscription level.

429 Too Many Requests — You have hit the rate limit. The response includes a Retry-After header telling you how many seconds to wait. Implement exponential backoff:

import time
from openai import RateLimitError

max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": "Hello"}]
        )
        break
    except RateLimitError:
        if attempt == max_retries - 1:
            raise  # retries exhausted; surface the error instead of failing silently
        wait = 2 ** attempt
        print(f"Rate limited. Waiting {wait}s...")
        time.sleep(wait)
500 / 502 / 503 — Server-side issues. These are rare but they happen. Retry with backoff. If persistent, check our Pulse status page.

Authentication Details

Every request needs an Authorization header:

Authorization: Bearer zub_live_YOUR_KEY

The key is scoped to your workspace. All team members in the same workspace share the same key and usage pool. If you need separate tracking per team member, create separate workspaces.

Security best practices:

• Never hardcode keys in source code — use environment variables
• Never commit keys to git — use .env files in .gitignore
• Rotate keys regularly from workspace settings
• Use separate keys for development and production
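Putting the first two bullets into practice, a fail-fast loader sketch — `ZUBNET_API_KEY` is an assumed variable name; use whatever your deployment convention dictates:

```python
import os

def load_api_key(var: str = "ZUBNET_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing."""
    key = os.environ.get(var, "")
    # Keys shown earlier start with zub_ (e.g. zub_live_...), so sanity-check that.
    if not key.startswith("zub_"):
        raise RuntimeError(f"{var} is unset or does not look like a Zubnet key")
    return key
```

Failing at startup beats discovering a missing key on the first user request.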

Rate Limits

Rate limits depend on your plan tier. The defaults are generous for most use cases:

Requests per minute: varies by plan (60–600 RPM)
Tokens per minute: varies by model and plan
Concurrent requests: no hard limit, but 429s will fire if you burst too aggressively

Rate limit headers are included in every response:

x-ratelimit-limit-requests: 120
x-ratelimit-remaining-requests: 119
x-ratelimit-reset-requests: 0.5s

Read these headers. Build your client to respect them. Your users will thank you when their requests do not randomly fail.
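A small sketch of respecting those headers, assuming the reset value always carries the s suffix shown above:

```python
def pause_before_next(headers: dict) -> float:
    """Seconds to sleep before the next request; 0 while budget remains."""
    remaining = int(headers.get("x-ratelimit-remaining-requests", "1"))
    if remaining > 0:
        return 0.0
    reset = headers.get("x-ratelimit-reset-requests", "1s")
    return float(reset.rstrip("s"))

headers = {
    "x-ratelimit-limit-requests": "120",
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-reset-requests": "0.5s",
}
print(pause_before_next(headers))  # 0.5
```

Call this between requests and sleep for whatever it returns; that alone eliminates most avoidable 429s.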

Putting It All Together

Here is a production-ready Python function that handles errors, retries, and streaming:

import os
from openai import OpenAI, APIError, RateLimitError
import time

client = OpenAI(
    api_key=os.environ["ZUBNET_API_KEY"],
    base_url="https://api.zubnet.ai/v1"
)

def ask(prompt, model="claude-sonnet-4-20250514", stream=False):
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                stream=stream
            )
            if stream:
                return response  # iterate over chunks
            return response.choices[0].message.content
        except RateLimitError:
            if attempt == 2:
                raise  # retries exhausted; do not swallow the error
            time.sleep(2 ** attempt)
        except APIError:
            if attempt == 2:
                raise
            time.sleep(1)

# Usage
print(ask("Summarize the latest in quantum computing"))
print(ask("Same question, different model", model="gpt-4.1"))

Switch models by changing a string. No code refactoring, no new dependencies, no different response format. That is the entire point of OpenAI compatibility.

The key insight: OpenAI compatibility is not just about convenience. It means every tool in the ecosystem — LangChain, LlamaIndex, AutoGen, Cursor, Continue, any tool that speaks the OpenAI protocol — works with Zubnet out of the box. Change the base URL, plug in your key, and you have 367 models in any framework.

All code examples in this guide are tested and working as of March 2026. Copy them, run them, build on them. That is what they are for.

Ready to start? Get your API key at zubnet.com

Sarah Chen
Zubnet · March 2026