
What 63 Providers Taught Us About the AI Industry

We integrated 63 AI providers into a single platform. Along the way, we discovered pricing that makes no sense, APIs that lie about success, documentation that’s fiction, and an entire ecosystem in China that the West is sleeping on.
Sarah Chen & Pierre-Marcel De Mussac · March 2026 · 10 min read

Building Zubnet meant integrating every major AI provider into one platform. Not through wrappers. Not through aggregator APIs. Direct integration — one custom client per provider, speaking each API’s native protocol, handling each provider’s quirks.

Sixty-three providers. Over 360 models. Text, image, video, audio, 3D, code, embeddings. What we learned in the process is a story about the AI industry that nobody else can tell, because nobody else has done this at this scale with this level of directness.

The Good: DeepSeek Changed the Game

Let’s start with the best thing that happened to AI pricing in 2025: DeepSeek.

When DeepSeek-V3 launched, the pricing was so low we double-checked the documentation three times. $0.27 per million input tokens, $1.10 per million output. For context, Claude 3.5 Sonnet was charging $3.00/M input and $15.00/M output at the time. That’s not a small difference. That’s roughly 11x cheaper on input and nearly 14x on output for similar benchmark performance.

DeepSeek didn’t just compete on price. They competed on transparency. Open weights. Published architecture papers. Clear documentation. A straightforward API that follows the OpenAI-compatible format (which, love it or hate it, has become the industry standard).

The impact was immediate. Within weeks of DeepSeek’s pricing becoming public, we watched multiple providers quietly drop their rates. Together.ai started offering DeepSeek models at competitive prices. Fireworks did the same. The market moved because one provider proved the margins were absurd.

DeepSeek’s pricing impact:

• DeepSeek-V3: $0.27/M input, $1.10/M output
• GPT-4o (comparable quality): $2.50/M input, $10.00/M output at the time
• Claude 3.5 Sonnet: $3.00/M input, $15.00/M output
• Result: industry-wide price compression within months
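The scale of the gap is easiest to see on a concrete workload. A minimal sketch, using the per-million rates quoted above (2025 snapshots from this article, not current pricing):

```python
# Token cost comparison using the per-million rates quoted above.
# Rates are 2025 snapshots from this article, not current pricing.
PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "deepseek-v3": (0.27, 1.10),
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a given token volume at a model's listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# A workload of 10M input + 2M output tokens:
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10_000_000, 2_000_000):.2f}")
# deepseek-v3: $4.90
# gpt-4o: $45.00
# claude-3.5-sonnet: $60.00
```

Same workload, an order of magnitude apart — which is exactly the margin gap that forced the repricing described above.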

The Bad: APIs That Lie

Here’s something nobody warns you about: some APIs return HTTP 200 OK with errors in the body.

An HTTP 200 means “success.” Every HTTP client, every monitoring tool, every health check in the world treats 200 as “everything is fine.” But multiple providers — we won’t name all of them, but you’d recognize the names — return 200 status codes with JSON bodies that contain error messages. Your request failed, but the HTTP layer says it succeeded.

This means you can’t rely on status codes alone. You have to parse every response body, check for error fields, handle “success” responses that are actually failures, and build retry logic that understands the difference. For every single provider. Each with their own error format.

One provider returns {"status": "failed", "error": "rate_limited"} with a 200. Another returns {"data": null, "message": "insufficient credits"} with a 200. A third returns a valid-looking response with empty arrays where the results should be, and you only discover the failure when you try to use the output.
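Defending against this means treating every 200 as unverified until the body is inspected. A minimal sketch of the normalization layer — the field names mirror the hypothetical response shapes quoted above; each real provider needs its own detector:

```python
# Normalize "200 OK but actually failed" responses into one error signal.
# Field names mirror the example response shapes above; real providers
# each need their own hand-written detector.
from typing import Any, Optional

def extract_error(body: dict[str, Any]) -> Optional[str]:
    """Return an error string if a 'successful' response is really a failure."""
    # Pattern 1: explicit failure status inside the 200 body
    if body.get("status") == "failed":
        return body.get("error", "unknown error")
    # Pattern 2: null data with an explanatory message
    if body.get("data") is None and "message" in body:
        return body["message"]
    # Pattern 3: well-formed response with empty arrays where results belong
    results = body.get("results")
    if isinstance(results, list) and not results:
        return "empty result set"
    return None  # looks like a genuine success
```

Retry logic then branches on `extract_error`, not on the HTTP status — a rate-limit error in a 200 body should back off exactly as if it had been a 429.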

The Ugly: Silent Deprecations

This one still makes us angry.

Imagine you’ve spent two weeks integrating a video generation provider. You’ve built the client, the queue handler, the polling logic, the thumbnail extraction, the CDN upload pipeline, the error handling. It’s in production. Users are generating videos. Everything works.

Then one Tuesday morning, it stops. No warning. No deprecation notice. No email. No changelog entry. The model ID you’ve been using simply doesn’t exist anymore.

Real examples of silent deprecations we experienced:

• Vidu changed their API model names without notice — model IDs we were using in production suddenly returned “model not found”
• Kling removed the O1 model variant entirely — no deprecation period, no migration guide, just gone
• Multiple providers changed endpoint URLs without redirects — requests that worked yesterday 404 today
• One provider removed a required field from their response, breaking our parser, while simultaneously adding a new required field to requests, breaking our client

We built an API monitor to defend against this. It scans 15+ provider endpoints daily, diffs the results against our database, and alerts us when models appear or disappear. We shouldn’t have had to build it. Providers should version their APIs and communicate changes. Most don’t.
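The core of such a monitor is small. A sketch under the assumption that a provider exposes an OpenAI-compatible `/models` listing — the endpoint shape and snapshot file are illustrative, not our production code:

```python
# Minimal model-catalog monitor: fetch a provider's model list, diff it
# against the previous snapshot, and report silent appearances/removals.
# Assumes an OpenAI-compatible /models response: {"data": [{"id": ...}, ...]}.
import json
from urllib.request import urlopen

def fetch_model_ids(list_url: str) -> set[str]:
    """Fetch a provider's model-listing endpoint and return the set of IDs."""
    with urlopen(list_url, timeout=30) as resp:
        payload = json.load(resp)
    return {m["id"] for m in payload.get("data", [])}

def diff_models(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report which model IDs appeared or silently disappeared."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

Run daily per provider, persist `current` as the next snapshot, and alert whenever `removed` is non-empty — that is the email the provider never sent.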

The Fiction: Provider Documentation

Documentation is where the AI industry’s speed collides with basic professional standards. We’ve encountered:

• Documentation showing request formats that the API doesn’t accept
• “Required” fields that are actually optional, and “optional” fields that cause 500 errors if omitted
• Sample responses that bear no resemblance to actual responses
• Entire API endpoints documented but not implemented
• Authentication methods that changed without the docs being updated
• Rate limits listed as “1000 requests/minute” when the actual limit is 10

Our integration process now starts with a protocol we call “trust but verify”: read the docs, then throw test requests at every endpoint and compare the actual behavior to what’s documented. The delta is often significant.
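In code, "trust but verify" splits into two steps: fire a known-good request at the live endpoint, then compare the response shape against what the docs promise. A hedged sketch — the request helper and the expected-key sets are hand-written per provider, not derived from any real spec:

```python
# "Trust but verify": probe a documented endpoint with a request the docs
# call valid, then list every documented response field that is missing.
# URL, payload, and expected_keys are hand-written from each provider's docs.
import json
from urllib.request import Request, urlopen

def compare_shape(body: dict, expected_keys: set[str]) -> list[str]:
    """List documented fields that are absent from an actual response body."""
    missing = sorted(expected_keys - body.keys())
    return [f"documented field absent: {k}" for k in missing]

def probe(url: str, payload: dict, expected_keys: set[str], api_key: str) -> list[str]:
    """Send one documented request and return the doc-vs-reality discrepancies."""
    req = Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    try:
        with urlopen(req, timeout=60) as resp:
            body = json.load(resp)
    except Exception as exc:  # the docs said this request was valid
        return [f"request rejected: {exc}"]
    return compare_shape(body, expected_keys)
```

An empty list means the docs held up for that endpoint; anything else goes straight into the integration notes before a line of client code is written.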

Cold Starts: The Hidden Tax

Serverless GPU inference is the future, or so the pitch goes. You only pay for what you use. No idle GPUs. Efficient. Scalable.

In practice, it means the first person to use an unpopular model each morning waits 2–10 minutes for a GPU to spin up, load multi-gigabyte model weights into memory, warm the inference pipeline, and then — finally — process the request. We’ve seen cold starts exceed 10 minutes on some providers.

Popular models stay warm because there’s always traffic. But niche models — the specialized ones that are often the reason someone chose your platform — go cold. And when they go cold, your users wait. And when your users wait, they leave.

This is why we prioritize direct integration with providers who run dedicated infrastructure over serverless middlemen. Predictable latency beats theoretical cost savings every time.
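Where serverless is unavoidable, the standard workaround is a keep-warm loop: periodically send a tiny request to each niche model so no user ever eats the cold start. A minimal sketch — the ping callable and the interval are assumptions, since the right interval depends on each provider's idle timeout:

```python
# Keep-warm scheduler sketch: periodically ping niche models so users
# never hit a multi-minute cold start. The ping callable and interval
# are assumptions; real intervals depend on provider idle timeouts.
import time
import threading

def keep_warm(models: list[str], ping, interval_s: int = 240) -> threading.Thread:
    """Ping each model on a timer from a daemon thread."""
    def loop():
        while True:
            for model in models:
                try:
                    ping(model)   # e.g. a 1-token generation request
                except Exception:
                    pass          # warm-up failures are non-fatal
            time.sleep(interval_s)
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Note the trade-off: keep-warm pings are paid traffic, so this quietly converts "pay only for what you use" back into a small standing cost — which is part of why dedicated infrastructure often wins anyway.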

The Sleeping Giant: Chinese AI

The Western AI conversation revolves around OpenAI, Anthropic, Google, and Meta. Maybe Mistral if you follow the European scene. But the most interesting work in AI right now is happening in China, and most Western developers have no idea.

Chinese providers that deserve your attention:

• Alibaba (Qwen/Wan) — Qwen is among the best open-weight LLMs. Wan 2.1 video generation competes with Runway and Kling. Massive model family across text, vision, and code.
• Zhipu (GLM) — CogVideoX put them on the video generation map. GLM-4 is a serious LLM. Their API is well-designed and reliable.
• MiniMax — Excellent video generation (Hailuo) and surprisingly competitive text models. Fast iteration speed.
• Moonshot (Kimi) — Long-context specialist. Their 128k context window actually works, unlike some providers where “128k context” means “accepts 128k tokens but forgets everything past 32k.”
• DeepSeek — Already covered, but worth repeating: they proved world-class AI doesn’t require world-class pricing.

These providers are underrated in the West for three reasons: language barriers (documentation is often Chinese-first), payment friction (some require Chinese payment methods or special registration), and simple unfamiliarity. We bridged all three by integrating them directly.

The quality is real. Alibaba’s Wan 2.1 generates video that competes with anything Western providers offer. Zhipu’s CogVideoX has a distinctive aesthetic that some of our users prefer. MiniMax’s iteration speed is extraordinary — they ship meaningful improvements weekly.

Why Direct Integration Wins

Every time we evaluated a wrapper or aggregator versus direct integration, direct won. Here’s why:

• Latency. Every intermediary adds a network hop. Direct calls are faster. Period.
• Reliability. When a wrapper goes down, all the providers behind it go down for you. Direct integration means one provider’s outage doesn’t cascade.
• Pricing. Wrappers take a margin. Direct means you pay the provider’s price. On 63 providers, those margins add up fast.
• Features. Wrappers abstract to the lowest common denominator. Direct integration means you can use every feature each provider offers — their streaming format, their unique parameters, their model-specific capabilities.
• Debugging. When something breaks through a wrapper, good luck figuring out whether the issue is in your code, the wrapper, or the underlying provider. Direct integration means there are only two parties: you and them.

It’s more work. Dramatically more work. Each provider has its own authentication scheme, its own request format, its own error handling, its own rate limiting, its own quirks. We have 63 custom integration clients. That’s 63 sets of tests, 63 monitoring endpoints, 63 documentation pages to keep current.

But our users get lower latency, lower prices, higher reliability, and access to features that no wrapper exposes. The work is worth it.

The lesson after 63 integrations: The AI industry moves fast and breaks things — often your things. The providers who communicate changes, maintain accurate documentation, and respect their API contracts are the ones worth building on. The rest will cost you more in maintenance than they save in development time. And the Chinese AI ecosystem is not a curiosity — it’s a serious force that’s reshaping what’s possible at every price point.

These observations come from 18 months of continuous integration work, starting in late 2024 and running through March 2026. Every claim reflects our direct experience. We have API keys, integration code, and battle scars for all 63 providers.

Try the platform where every provider is integrated directly — no wrappers, no middlemen. That’s Zubnet.
