Training

Unsupervised Learning

A training approach in which the model finds patterns in data without being told what to look for. No labels, no correct answers: just raw data and a model that discovers structure on its own. Clustering, dimensionality reduction, and anomaly detection are the classic unsupervised tasks. The model groups similar data points together, finds compressed representations, or identifies outliers.
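As a concrete illustration, anomaly detection can be as simple as flagging points that sit far from the rest of the data. A minimal sketch in plain Python (the data and the 2-standard-deviation threshold are invented for this example):

```python
import math

def find_outliers(values, z_threshold=2.0):
    """Flag values more than z_threshold standard deviations from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    # With no labels, "anomalous" is defined purely by the data's own statistics.
    return [v for v in values if abs(v - mean) / std > z_threshold]

transactions = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]  # hypothetical amounts
print(find_outliers(transactions))  # the 50.0 stands out
```

Nobody told the model which transaction is suspicious; the threshold is defined relative to the data itself.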

Why It Matters

Most real-world data is unlabeled: you might have millions of transactions, but no one has tagged each one as "fraud" or "not fraud". Unsupervised learning finds patterns in that raw data that would be impossible to discover manually. It is also the foundation of embeddings, which power semantic search, recommendation systems, and RAG.

Deep Dive

Unsupervised learning encompasses a family of techniques. Clustering algorithms like K-means group similar data points together. Autoencoders learn compressed representations by encoding data to a small bottleneck and then reconstructing it. Dimensionality reduction (PCA, t-SNE, UMAP) projects high-dimensional data into 2D or 3D for visualization. What unites them is the absence of labels — the model defines its own notion of "similar" or "important" based on the data's statistical structure.
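The K-means loop mentioned above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal plain-Python sketch (initial centroids are hand-picked here for clarity; real implementations use smarter initialization such as k-means++):

```python
import math

def kmeans(points, centroids, iters=10):
    """Cluster 2-D points by alternating assignment and centroid update."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
print(centroids)  # converges near (1, 1) and (5, 5)
```

Note that the algorithm never sees a label: "similar" means nothing more than "close in Euclidean distance".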

Where LLMs Fit

LLM pre-training is often called "self-supervised" rather than truly unsupervised, because the training signal comes from the data itself (predict the next token). But the spirit is unsupervised — no human annotator labels each token. The model discovers language structure, factual knowledge, reasoning patterns, and even some world knowledge purely from the statistical patterns in text. This is why pre-training requires such massive datasets: without labels to guide it, the model needs enormous amounts of data to discover meaningful patterns on its own.
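The self-supervised signal is easy to see in miniature. A toy bigram model gets its "labels" for free from raw text: every token is the training target for the token before it. This plain-Python sketch uses counting where real LLMs use neural networks over vastly larger contexts; the corpus is invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each token, which tokens follow it in raw text."""
    tokens = text.split()
    counts = defaultdict(Counter)
    # The "label" at each position is simply the next token in the data itself.
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Predict the continuation seen most often during training."""
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran"  # toy corpus, no human annotation
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

No annotator labeled anything; the statistical structure of the text supplies the supervision, which is exactly the trick next-token prediction scales up.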
