The term "open weights" exists because the AI industry's use of "open source" is genuinely misleading. Traditional open source (as defined by the OSI) means you get the source code, can modify it, and can redistribute it. When Meta releases Llama, you get the trained model weights — the billions of numerical parameters that define the model's behavior — but not the training data, not the full training code, and often not the data preprocessing pipeline. You can run inference and fine-tune, but you can't reproduce the model from scratch. The Open Source Initiative released a formal definition of "Open Source AI" in late 2024 attempting to clarify this, but the industry still uses the terms loosely. Knowing the distinction matters when you're evaluating what you can actually do with a model.
The spectrum of openness varies widely between releases. At one end, Meta's Llama models come with a custom license that prohibits use by companies with over 700 million monthly active users (clearly aimed at competitors) and requires attribution. Mistral's models have generally used Apache 2.0, one of the most permissive licenses available. Alibaba's Qwen family uses Apache 2.0 as well. DeepSeek has released weights under MIT license. Meanwhile, projects like BLOOM (BigScience) and OLMo (AI2) went further by also releasing training data and full training code — these are closer to truly open source. For developers, the license determines whether you can use the model commercially, whether you need to share modifications, and whether you can build proprietary products on top of it.
Running open-weights models yourself has gotten dramatically more accessible thanks to quantization and optimized inference engines. A 70-billion-parameter model that would need over 140 GB of VRAM in 16-bit precision shrinks to roughly 35–40 GB at 4-bit quantization with acceptable quality loss — small enough for a pair of 24 GB consumer GPUs, or a single one with partial CPU offload, while a quantized model in the 7–13B range fits comfortably on one consumer card. Tools like llama.cpp, vLLM, and Ollama have made local inference almost trivially easy — you can have a capable model running on a gaming laptop in minutes. The practical bottleneck has shifted from "can I run it?" to "is the quality sufficient for my use case?" Quantized smaller models are remarkably good for many tasks, but they do lose performance on complex reasoning and long-context work compared to full-precision frontier models served via API.
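The memory arithmetic above is easy to sketch. This back-of-the-envelope estimator counts weight bytes at a given bit width and applies a flat overhead multiplier (the 1.2 factor standing in for KV cache and activations is an illustrative assumption, not a precise model):

```python
# Rough VRAM estimate for serving a model at a given quantization level.
# Counts weight memory only, plus a flat multiplier approximating the
# KV cache and activation overhead (an assumed value for illustration).

def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Approximate GB of memory needed to hold the weights, plus overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 70B-parameter model at different precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{estimate_vram_gb(70, bits):.0f} GB")
```

Real quantization formats mix bit widths across layers, so published file sizes differ somewhat from this uniform-bits estimate, but the scaling — halving memory each time you halve the bit width — holds.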
The safety implications of open weights are one of the most actively debated topics in AI policy. The concern is straightforward: once weights are released, anyone can fine-tune away the safety training. Researchers have demonstrated that RLHF-based safety guardrails can be removed from open-weights models with just a few hundred examples and minimal compute. This means open-weights models can be turned into uncensored versions that will comply with any request. The counterargument — and it's a strong one — is that the knowledge these models contain is already available on the internet, that the benefits of open research and distributed innovation outweigh the risks, and that trying to restrict model distribution just concentrates power in a few large companies without meaningfully improving safety. Both sides have valid points, and the debate is far from settled.
For practitioners choosing between open-weights and API-based models, the decision comes down to four factors: privacy (open weights keep your data local), cost (self-hosting is cheaper at high volume but more expensive at low volume), control (you can fine-tune and customize freely), and capability (API-only frontier models like GPT-4o and Claude still outperform the best open-weights models on many benchmarks, though the gap narrows with each major release). Many production systems use both — routing simple queries to a local open-weights model for speed and cost, while sending complex tasks to a frontier API. This hybrid approach gives you the best of both worlds, and it's increasingly the pragmatic choice for teams that need both performance and privacy.
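The hybrid routing pattern can be sketched in a few lines. Here the complexity heuristic and the backend names are placeholders for illustration — a production router would call an actual local inference server and a frontier API client, and might use a classifier rather than keyword matching:

```python
# Minimal sketch of a hybrid router: cheap local model for short, simple
# prompts; frontier API for everything else. The heuristic and backend
# names are illustrative assumptions, not a real API.

def is_complex(prompt: str) -> bool:
    """Crude heuristic: long prompts or reasoning keywords go to the API."""
    keywords = ("prove", "analyze", "step by step", "debug")
    return len(prompt) > 500 or any(k in prompt.lower() for k in keywords)

def route(prompt: str) -> str:
    """Return which backend should handle this prompt."""
    return "frontier-api" if is_complex(prompt) else "local-model"

print(route("Translate 'hello' to French"))         # local-model
print(route("Analyze this codebase step by step"))  # frontier-api
```

Even a heuristic this crude captures the core trade: the local model handles the bulk of cheap, latency-sensitive traffic, while only the queries that genuinely need frontier capability incur API cost and send data off-device.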