
Unsupervised Learning

A training approach in which the model finds patterns in data without being told what to look for. There are no labels and no correct answers, only raw data and a model left to discover structure on its own. Clustering, dimensionality reduction, and anomaly detection are the classic unsupervised tasks: the model groups similar data points, finds compressed representations, or identifies outliers.

Why It Matters

Most real-world data is unlabeled: you might have millions of transactions, but nobody has marked each one as "fraud" or "not fraud." Unsupervised learning can surface patterns in that raw data that manual inspection would never find. It is also the foundation of embeddings, which power semantic search, recommendation systems, and RAG.

Deep Dive

Unsupervised learning encompasses a family of techniques. Clustering algorithms like K-means group similar data points together. Autoencoders learn compressed representations by encoding data to a small bottleneck and then reconstructing it. Dimensionality reduction (PCA, t-SNE, UMAP) projects high-dimensional data into 2D or 3D for visualization. What unites them is the absence of labels — the model defines its own notion of "similar" or "important" based on the data's statistical structure.
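The clustering idea is easy to see in code. Below is a minimal sketch of K-means using only NumPy (not a production implementation; real ones, like scikit-learn's `KMeans`, add random restarts and k-means++ seeding):

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """Minimal K-means: alternate assigning points to the nearest
    centroid and moving each centroid to its cluster's mean."""
    # Deterministic init from the first k points; real implementations
    # use random restarts or k-means++ seeding instead.
    centroids = X[:k].copy()
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Two obvious groups; note the algorithm never sees any labels
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)  # -> [0 0 0 1 1 1]
```

Even though no label was ever provided, the algorithm recovers the two groups purely from the data's geometry, which is exactly the "model defines its own notion of similar" point above.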

Where LLMs Fit

LLM pre-training is often called "self-supervised" rather than truly unsupervised, because the training signal comes from the data itself (predict the next token). But the spirit is unsupervised — no human annotator labels each token. The model discovers language structure, factual knowledge, reasoning patterns, and even some world knowledge purely from the statistical patterns in text. This is why pre-training requires such massive datasets: without labels to guide it, the model needs enormous amounts of data to discover meaningful patterns on its own.
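The self-supervision trick can be sketched in a few lines: the training "labels" are manufactured from the raw text itself by shifting it one token, so no annotator is needed. Word-level splitting here is a toy stand-in for a real tokenizer:

```python
# In next-token prediction, every training target comes from the data
# itself: the model sees a prefix and must predict the token that
# follows it in the raw text. No human labeling step exists.
text = "the cat sat on the mat"
tokens = text.split()  # toy word-level tokenizer stand-in

# Build (context, target) pairs by sliding over the sequence
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(" ".join(context), "->", target)
# -> the -> cat
#    the cat -> sat
#    ...
```

A single sentence already yields several training examples, which hints at why web-scale corpora translate into such enormous numbers of self-supervised training signals.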
