Unsupervised Learning: Definition & Meaning | AI Wiki

A training approach in which the model finds patterns in data without being told what to look for. There are no labels and no correct answers, only raw data and a model that discovers structure on its own. Clustering, dimensionality reduction, and anomaly detection are classic unsupervised tasks. The model groups similar points, finds compressed representations, or flags outliers.

Why It Matters

Most real-world data is unlabeled: you may have millions of transactions, but nobody has marked each one as "fraud" or "not fraud". Unsupervised learning finds patterns in this raw data that would be impossible to discover by hand. It is also the foundation of embeddings, which power semantic search, recommendation systems, and RAG.

Deep Dive

Unsupervised learning encompasses a family of techniques. Clustering algorithms such as K-means group similar data points together. Autoencoders learn compressed representations by encoding data through a narrow bottleneck and then reconstructing it. Dimensionality reduction methods (PCA, t-SNE, UMAP) project high-dimensional data into fewer dimensions, often 2D or 3D for visualization. What unites them is the absence of labels: the model defines its own notion of "similar" or "important" based on the data's statistical structure.
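To make the clustering idea concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy `points` list, and the choice to seed centroids with the first `k` points are illustrative assumptions, not part of any particular library; production code would more likely use a library implementation with a smarter initialisation.

```python
def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters using Lloyd's algorithm."""
    # Simplification (assumption): seed centroids with the first k points so the
    # run is deterministic. Real implementations usually pick random or
    # k-means++ starting centroids instead.
    centroids = list(points[:k])
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Two visibly separate blobs; no labels are supplied anywhere.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The algorithm never sees a label: the only notion of "similar" is the distance between points, which is the sense in which the model defines similarity from the data's own structure.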

Where LLMs Fit

LLM pre-training is usually called "self-supervised" rather than strictly unsupervised, because the training signal comes from the data itself: the model is trained to predict the next token. The spirit is still unsupervised, since no human annotator labels each token. The model picks up language structure, factual knowledge about the world, and reasoning patterns purely from the statistical patterns in text. This is why pre-training requires such massive datasets: without labels to guide it, the model needs enormous amounts of data to discover meaningful patterns on its own.
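The following sketch illustrates how next-token prediction derives its training signal from raw text alone. The helper `next_token_pairs`, the `context_size` parameter, and the whitespace split are hypothetical simplifications for illustration; real LLM pipelines use subword tokenizers and much longer contexts.

```python
def next_token_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs with no human labels."""
    # Assumption: a whitespace split stands in for a real tokenizer.
    tokens = text.split()
    pairs = []
    for i in range(1, len(tokens)):
        # The preceding tokens are the input; the token at position i is the
        # "label", taken directly from the text itself.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every target here is simply the next word of the input, which is why no annotator is needed and why the approach scales to very large text corpora.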
