
Weights & Biases

W&B, WandB
The dominant MLOps platform for tracking machine learning experiments. W&B lets you log metrics, hyperparameters, model outputs, and system performance during training, then compare runs visually. It's become the standard tool for ML researchers and engineers to track what they tried, what worked, and why — essentially version control for experiments.

Why it matters

Without experiment tracking, ML development is chaos: which hyperparameters produced that good result? Which dataset version was used? Why did training diverge? W&B solved this problem so well that it's now used across the field, from solo researchers to major labs like OpenAI. If you're training models, you're almost certainly using W&B or something inspired by it.

Deep Dive

W&B's core product is experiment tracking: a few lines of code in your training script log loss curves, learning rates, GPU utilization, sample outputs, and any custom metrics to a dashboard. You can compare hundreds of training runs side-by-side, filter by hyperparameters, and identify which configurations worked best. The key insight was making this frictionless — wandb.init() and wandb.log() are all most users need.
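To make the pattern concrete, here is a minimal stdlib-only sketch of what an experiment tracker does conceptually; this is an illustrative stand-in, not W&B's actual implementation, though the comments note the real wandb calls each step corresponds to.

```python
class Run:
    """Toy stand-in for the tracking pattern: open a run with a config,
    then append metrics step by step (cf. wandb.init() / wandb.log())."""

    def __init__(self, project, config):
        self.project = project
        self.config = dict(config)  # hyperparameters, captured at init
        self.history = []           # one dict of metrics per logged step

    def log(self, metrics):
        # In W&B this would stream to a dashboard; here we just record it.
        self.history.append(dict(metrics))


# Usage mirroring a typical training script:
run = Run(project="demo", config={"lr": 0.01, "epochs": 3})
for epoch in range(run.config["epochs"]):
    loss = 1.0 / (epoch + 1)  # fake training loss that decreases
    run.log({"epoch": epoch, "loss": loss})

print(len(run.history))  # one entry per epoch
```

Because every run carries its config alongside its metric history, comparing runs later is just filtering and plotting over these records, which is essentially what the W&B dashboard does at scale.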

Beyond Tracking

W&B expanded into adjacent tools: Sweeps (automated hyperparameter search), Artifacts (dataset and model versioning), Tables (interactive data exploration), and Reports (shareable experiment analyses). Their Weave product targets LLM application development specifically, with tools for prompt evaluation, LLM pipeline tracing, and output quality monitoring. The platform covers the full ML lifecycle from experiment to production monitoring.
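The idea behind Sweeps can be sketched in a few lines: define a search space, run training once per configuration, and rank the runs by a target metric. The grid search and the toy objective below are illustrative assumptions, not W&B's API, which instead takes a YAML/dict sweep config and an agent that pulls configurations.

```python
import itertools

def train(config):
    # Hypothetical objective standing in for a real training run:
    # loss is smallest when lr is near 0.1 and batch size is small.
    return abs(config["lr"] - 0.1) + 0.01 * config["batch_size"] / 64

# Search space, analogous in spirit to a Sweeps configuration
space = {"lr": [0.01, 0.1, 1.0], "batch_size": [32, 64]}

# Exhaustive grid sweep: one "run" per configuration
runs = []
for lr, bs in itertools.product(space["lr"], space["batch_size"]):
    config = {"lr": lr, "batch_size": bs}
    runs.append((train(config), config))

best_loss, best_config = min(runs, key=lambda r: r[0])
print(best_config)  # the configuration with the lowest loss
```

Real Sweeps also support random and Bayesian search strategies, which matter once the grid becomes too large to enumerate.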
