At the mechanical level, an AI API call is just an HTTP request — almost always a POST to an HTTPS endpoint with a JSON body. You send your prompt, system instructions, and model parameters like temperature and max tokens; the provider sends back a JSON response containing the model's output. Most providers today follow the pattern OpenAI established: a /v1/chat/completions-style endpoint that accepts a messages array with role/content pairs. Anthropic's Messages API is slightly different in structure but follows the same philosophy. The key thing to understand is that these are stateless calls — the server doesn't remember your previous request unless you explicitly resend the conversation history each time.
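A minimal sketch of what such a request looks like — the endpoint URL, model name, and API key below are placeholders, not real values, and the exact schema varies slightly by provider:

```python
import json

# Illustrative values only — substitute your provider's real endpoint and key.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "sk-placeholder"

def build_request(user_prompt, history=None):
    """Assemble headers and a JSON body for a chat-completions-style call.
    Because the API is stateless, every prior turn must be resent in the
    messages array on each request."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += history or []  # replay the whole conversation so far
    messages.append({"role": "user", "content": user_prompt})
    return {
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "example-model",  # placeholder model id
            "messages": messages,
            "temperature": 0.7,        # sampling randomness
            "max_tokens": 512,         # cap on generated output length
        }),
    }

req = build_request("Why are API calls stateless?")
```

Note that `history` grows with every turn — a long conversation means a large request body each time, which is exactly the cost of statelessness.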
Streaming is where things get more interesting. Instead of waiting for the model to finish generating its entire response (which can take 10-30 seconds for a long answer), most AI APIs support Server-Sent Events (SSE). The server sends the response token by token as it's generated, so your user starts seeing text almost immediately. This is why ChatGPT and Claude feel responsive even though the full response takes a while — you're watching the model "think" in real time. Implementing streaming correctly means handling partial JSON chunks, managing connection timeouts, and gracefully recovering when the stream drops mid-response.
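The core of an SSE client is reassembling the `data:` lines into full text. A sketch of that loop, assuming the OpenAI-style delta format (the chunk shape and `[DONE]` sentinel match that convention; a real client would read the lines from the HTTP response body rather than a list):

```python
import json

def accumulate_sse(lines):
    """Parse SSE 'data:' lines carrying OpenAI-style delta chunks and
    stitch the partial tokens back into the full response text."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)  # in a real UI, render this immediately
    return "".join(parts)

# Simulated stream standing in for a live HTTP response.
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",  # SSE events are separated by blank lines
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "",
    "data: [DONE]",
]
full_text = accumulate_sse(stream)
```

The hard parts this sketch omits are the ones the paragraph above names: a `data:` payload can arrive split across TCP reads, so production code buffers bytes until it has a complete line before calling `json.loads`.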
Authentication varies across providers but usually falls into one of two patterns: a simple API key passed as a Bearer token in the Authorization header, or a more complex OAuth flow for enterprise setups. Anthropic uses an x-api-key header, OpenAI uses Authorization: Bearer sk-..., and Google Cloud requires service account credentials. If you're working with multiple providers — which most production systems do — you quickly discover that "OpenAI-compatible" is a spectrum. Providers like Together AI, Groq, and Mistral mostly follow OpenAI's schema, but the edge cases in error handling, parameter support, and response formatting are where integration work actually lives.
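The per-provider header differences can be isolated behind one small function — a sketch assuming the simple API-key pattern only (no OAuth), with placeholder keys; the Anthropic version header value is the commonly documented one:

```python
def auth_headers(provider, key):
    """Return the auth headers each provider convention expects.
    Keys are caller-supplied; never hard-code them."""
    if provider == "openai":
        # Also the shape most "OpenAI-compatible" providers accept
        return {"Authorization": f"Bearer {key}"}
    if provider == "anthropic":
        # The Messages API also requires a version header
        return {"x-api-key": key, "anthropic-version": "2023-06-01"}
    raise ValueError(f"unknown provider: {provider}")

headers = auth_headers("anthropic", "key-placeholder")
```

Centralizing this is worth doing early: when you add a Together AI or Groq backend later, the auth quirk lands in one function instead of leaking through every call site.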
One misconception worth clearing up: REST APIs are not the only game in town, even if they dominate. Some providers offer gRPC endpoints for lower-overhead communication, and WebSocket-based APIs are becoming more common for real-time voice and streaming use cases. ElevenLabs' voice API, for example, uses WebSockets for bidirectional audio streaming. But for text-in-text-out LLM inference, REST with SSE streaming remains the standard, and that's unlikely to change soon — the overhead of HTTP is negligible compared to the time the model spends generating tokens.