An endpoint is a URL path on a server that accepts a specific type of request and returns a specific type of response. In AI APIs, the most common endpoint is the chat completions endpoint — POST /v1/chat/completions in OpenAI's schema, POST /v1/messages at Anthropic. But modern AI providers expose a constellation of endpoints beyond chat: /v1/embeddings for turning text into vectors, /v1/images/generations for image creation, /v1/audio/transcriptions for speech-to-text, and /v1/models for listing available models. Each endpoint expects different request parameters and returns different response shapes.
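To make that concrete, here is a minimal sketch of how request bodies differ across a few of the OpenAI-style endpoints named above. The paths and parameter names follow OpenAI's published schema; the `build_request` helper and the model names are illustrative, not any provider's SDK.

```python
def build_request(endpoint: str, model: str, data: str) -> dict:
    """Return the JSON body shape each endpoint type expects (sketch)."""
    if endpoint == "/v1/chat/completions":
        # Chat expects a list of role-tagged messages
        return {"model": model, "messages": [{"role": "user", "content": data}]}
    if endpoint == "/v1/embeddings":
        # Embeddings take raw input text (a string or list of strings)
        return {"model": model, "input": data}
    if endpoint == "/v1/audio/transcriptions":
        # Transcription is really a multipart file upload; shown as a
        # plain dict here purely for illustration
        return {"model": model, "file": data}
    raise ValueError(f"unsupported endpoint: {endpoint}")
```

Same server, same authentication, but three incompatible body shapes; a client that treats "the API" as one thing breaks as soon as it touches a second endpoint.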
The practical challenge is that "OpenAI-compatible" endpoints are only approximately compatible. Groq, Together AI, and Fireworks all advertise OpenAI compatibility, and they'll work fine for basic chat completion requests. But dig into the details and you'll find differences: some don't support the response_format parameter for structured output, others handle tool/function calling differently, and error response formats vary widely. Anthropic doesn't even try to be OpenAI-compatible — their Messages API uses a different structure entirely, with content as an array of blocks rather than a plain string. When you're building a system that routes between multiple providers, these differences are where most of the engineering time goes.
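The content-shape difference is the one a router hits first. A sketch of a normalizer, assuming the documented response shapes (OpenAI's `choices[0].message.content` string vs. Anthropic's top-level `content` array of typed blocks); error handling and streaming are omitted:

```python
def extract_text(response: dict) -> str:
    """Pull assistant text out of either an OpenAI-style or an
    Anthropic-style response body (simplified sketch)."""
    if "choices" in response:
        # OpenAI shape: content is a plain string on the message
        return response["choices"][0]["message"]["content"]
    if "content" in response:
        # Anthropic shape: content is a list of blocks; keep text blocks
        return "".join(
            block["text"]
            for block in response["content"]
            if block.get("type") == "text"
        )
    raise ValueError("unrecognized response shape")
```

A real router needs the same treatment for tool calls, stop reasons, and usage accounting, which is where the "approximately compatible" engineering time actually goes.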
Versioning is another important dimension. Providers evolve their endpoints over time, and breaking changes happen. OpenAI uses date-based model versioning (like gpt-4-0125-preview), while the endpoint paths themselves stay stable. Anthropic includes a version header (anthropic-version: 2023-06-01) that determines the request/response schema. Google's Vertex AI uses version prefixes in the URL path. When a provider deprecates an endpoint version, you typically get a few months' warning, but if you're not watching their changelogs, you might wake up one morning to a broken integration.
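In practice this means version pinning lives in different places per provider. A sketch of header construction, assuming Anthropic's documented `x-api-key` / `anthropic-version` headers and bearer auth for OpenAI-compatible providers; the helper name is illustrative:

```python
def build_headers(provider: str, api_key: str) -> dict:
    """Auth and version headers per provider (sketch)."""
    if provider == "anthropic":
        return {
            "x-api-key": api_key,
            # This date pins the request/response schema; omitting or
            # changing it can change what the API returns
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        }
    # OpenAI-compatible providers: versioning rides on the model name,
    # so only a bearer token is needed
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Pinning the version header explicitly, rather than relying on a default, is what buys you time when the deprecation notice lands.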
Base URLs deserve mention too, because they're not as straightforward as you'd expect. Anthropic's API lives at api.anthropic.com, but OpenAI offers api.openai.com for direct access and separate base URLs for Azure OpenAI Service deployments. Some providers have regional endpoints for data residency compliance — your requests to europe-west1-aiplatform.googleapis.com stay in the EU. For providers that route through inference platforms like HuggingFace's Inference API, the base URL is the platform (router.huggingface.co) and the model identifier goes in the path or headers. Understanding this topology matters because latency, data sovereignty, and billing can all depend on which endpoint you're actually hitting.
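The topology above can be captured in a small resolver. The URLs are the ones mentioned in this section; the routing function itself is a simplified sketch, and real deployments would add Azure resource URLs and credentials per region:

```python
def resolve_base_url(provider: str, region: str = "us-central1") -> str:
    """Map a provider name to an API base URL (illustrative sketch)."""
    if provider == "openai":
        return "https://api.openai.com"
    if provider == "anthropic":
        return "https://api.anthropic.com"
    if provider == "vertex":
        # Regional endpoints keep traffic in-region for data residency,
        # e.g. europe-west1 for EU-resident requests
        return f"https://{region}-aiplatform.googleapis.com"
    if provider == "huggingface":
        # Inference platform: the model is named in the path or headers,
        # not in the base URL
        return "https://router.huggingface.co"
    raise ValueError(f"unknown provider: {provider}")
```

Making the base URL an explicit, logged decision, instead of a constant buried in an SDK default, is what lets you answer "where did this request actually go?" when latency or residency questions come up.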