Training

Self-Supervised Learning

SSL
A training method in which a model generates its own supervision signal from unlabeled data. The key trick: hide part of the input and train the model to predict the hidden part. For LLMs, this means hiding the next token and predicting it. For vision models (such as DINO), it means masking image patches. You get the benefits of supervised learning without paying people to label data.

Why It Matters

Self-supervised learning is the breakthrough that made modern AI possible. It is how LLMs learn language from raw text, how BERT learns to understand sentences, and how vision models learn to see without labeled images. It unlocked training on the entire web, instead of being limited to expensive hand-labeled datasets.

Deep Dive

The two dominant self-supervised approaches in NLP are causal language modeling (predict the next token, used by GPT/Claude/Llama) and masked language modeling (mask random tokens and predict them, used by BERT). Causal modeling produces generative models — they can write text. Masked modeling produces understanding models — they excel at classification, search, and analysis but can't generate fluently.
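To make the two objectives concrete, here is a minimal sketch in PyTorch of how training labels are typically constructed for each; the token IDs, mask probability, and mask-token ID are illustrative placeholders, not values from any particular model.

```python
import torch

# A toy batch of token IDs (illustrative values, not from a real tokenizer).
tokens = torch.tensor([[5, 17, 42, 8, 23, 99]])

# --- Causal language modeling (GPT-style) ---
# Position t is trained to predict token t+1, so inputs and labels are
# the same sequence offset by one.
clm_inputs = tokens[:, :-1]
clm_labels = tokens[:, 1:]

# --- Masked language modeling (BERT-style) ---
# Randomly hide ~15% of positions; the loss is computed only on hidden slots.
MASK_ID = 103                      # illustrative mask-token ID
mask = torch.rand(tokens.shape) < 0.15
mlm_inputs = tokens.clone()
mlm_inputs[mask] = MASK_ID
mlm_labels = tokens.clone()
mlm_labels[~mask] = -100           # -100 is ignored by PyTorch cross-entropy

print("CLM:", clm_inputs.tolist(), "->", clm_labels.tolist())
print("MLM:", mlm_inputs.tolist(), "->", mlm_labels.tolist())
```

In both cases the labels come from the text itself, which is exactly what makes the objective self-supervised.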

Why It Works So Well

Predicting the next token sounds trivial, but to do it well, a model must learn grammar, facts, reasoning, style, and even some common sense. If the text says "The capital of France is," the model needs world knowledge to predict "Paris." If it says "She picked up the ball and threw it to," the model needs to understand pronouns, physics, and social context. The simple objective of next-token prediction creates pressure to learn deeply about language and the world it describes.
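For a sense of how simple the objective really is, here is a toy sketch of the loss: a single cross-entropy term between the model's predicted next-token distribution and the token that actually came next. The vocabulary size, logits, and target index below are made up for illustration.

```python
import torch
import torch.nn.functional as F

vocab_size = 50_000                    # illustrative vocabulary size
logits = torch.randn(1, vocab_size)    # the model's scores for the next token
target = torch.tensor([7])             # index of the true next token (e.g. "Paris")

# The entire training signal: cross-entropy between the predicted distribution
# and the actual next token. Minimizing this over billions of tokens is what
# pressures the model to absorb grammar, facts, and context.
loss = F.cross_entropy(logits, target)
print(loss.item())
```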

Contrastive Learning

In vision and embeddings, self-supervised learning often uses contrastive objectives: learn representations where similar items are close together and dissimilar items are far apart. CLIP (matching images to text descriptions), SimCLR (matching augmented views of the same image), and embedding models all use this approach. The supervision signal comes from the data structure itself — two crops of the same image should have similar representations, while crops of different images should not.
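Below is a minimal sketch of an InfoNCE-style contrastive loss of the kind SimCLR and CLIP build on, assuming two batches of pre-computed embeddings whose matching rows are positive pairs; the temperature and embedding sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, temperature=0.07):
    """InfoNCE-style loss: row i of emb_a should match row i of emb_b
    (the positive pair); every other row in the batch acts as a negative."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.T / temperature        # pairwise cosine similarities
    targets = torch.arange(a.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: random "embeddings" standing in for two views of 4 items
# (e.g. two crops of the same images, or matched image/text pairs).
view_1 = torch.randn(4, 128)
view_2 = torch.randn(4, 128)
print(contrastive_loss(view_1, view_2).item())
```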

Related Concepts

Self-Attention · Semantic Search