Unsupervised Learning: Definition & Meaning — AI Wiki

A training approach where the model finds patterns in data without being told what to look for. No labels, no correct answers — just raw data and a model that discovers structure on its own. Clustering, dimensionality reduction, and anomaly detection are classic unsupervised tasks. The model groups similar data points, finds compressed representations, or identifies outliers.

Why it matters

Most real-world data is unlabeled — you have millions of transactions but no one has tagged each one as "fraud" or "not fraud." Unsupervised learning finds patterns in this raw data that would be impossible to discover manually. It's also the basis for embeddings, which power semantic search, recommendation systems, and RAG.

Deep Dive

Unsupervised learning encompasses a family of techniques. Clustering algorithms like K-means group similar data points together. Autoencoders learn compressed representations by encoding data to a small bottleneck and then reconstructing it. Dimensionality reduction (PCA, t-SNE, UMAP) projects high-dimensional data into 2D or 3D for visualization. What unites them is the absence of labels — the model defines its own notion of "similar" or "important" based on the data's statistical structure.

Where LLMs Fit

LLM pre-training is often called "self-supervised" rather than truly unsupervised, because the training signal comes from the data itself (predict the next token). But the spirit is unsupervised — no human annotator labels each token. The model discovers language structure, factual knowledge, reasoning patterns, and even some world knowledge purely from the statistical patterns in text. This is why pre-training requires such massive datasets: without labels to guide it, the model needs enormous amounts of data to discover meaningful patterns on its own.

Unsupervised Learning

Why it matters

Deep Dive

Where LLMs Fit

Related Concepts