
Self-Supervised Learning

SSL
A training method in which the model generates its own supervision signal from unlabeled data. The key trick: hide part of the input and train the model to predict the hidden part. For LLMs, this means predicting the next token from the text that precedes it. For vision models (such as DINO), it means masking image patches. You get the benefits of supervised learning without paying anyone to label data.
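To make the trick concrete, here is a minimal sketch (not from the wiki) of how unlabeled text becomes (input, target) training pairs with no human labels; the token IDs are invented for illustration.

```python
# A minimal sketch of the core SSL trick: derive (input, target) pairs
# from unlabeled data by hiding part of it. Token IDs are made up.

tokens = [464, 3139, 286, 4881, 318, 6342]  # e.g. "The capital of France is Paris"

# Next-token prediction: the target at each position is simply the token
# that follows it, so the "labels" come from the data itself.
inputs  = tokens[:-1]   # [464, 3139, 286, 4881, 318]
targets = tokens[1:]    # [3139, 286, 4881, 318, 6342]
```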

Why It Matters

Self-supervised learning is the breakthrough that made modern AI possible. It is how LLMs learn language from raw text, how BERT learns to understand sentences, and how vision models learn to see without labeled images. It unlocked training on the entire internet, rather than being limited to expensive hand-labeled datasets.

Deep Dive

The two dominant self-supervised approaches in NLP are causal language modeling (predict the next token, used by GPT/Claude/Llama) and masked language modeling (mask random tokens and predict them, used by BERT). Causal modeling produces generative models — they can write text. Masked modeling produces understanding models — they excel at classification, search, and analysis but can't generate fluently.
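To make the distinction concrete, here is a minimal Python sketch of how each objective turns raw tokens into training pairs. It is an illustration under assumptions, not any library's actual code: MASK_ID = 103 follows BERT's vocabulary convention, -100 is PyTorch's default ignore label, and real masked-LM training adds details (such as BERT's 80/10/10 mask/random/keep rule) omitted here.

```python
import random

def causal_lm_pairs(tokens):
    """Causal LM (GPT-style): predict each token from the ones before it."""
    return tokens[:-1], tokens[1:]

MASK_ID = 103     # BERT's [MASK] token id, used here as an illustrative constant
IGNORE  = -100    # label value PyTorch's cross_entropy skips by default

def masked_lm_pairs(tokens, mask_prob=0.15):
    """Masked LM (BERT-style): hide random tokens, predict only the hidden ones."""
    inputs, targets = [], []
    for t in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)   # the model sees [MASK] ...
            targets.append(t)        # ... and must recover the original token
        else:
            inputs.append(t)
            targets.append(IGNORE)   # unmasked positions contribute no loss
    return inputs, targets
```

Note the structural difference: the causal variant produces a target at every position (every prefix is a training example), while the masked variant computes loss only at the hidden positions but lets the model attend to context on both sides.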

Why It Works So Well

Predicting the next token sounds trivial, but to do it well, a model must learn grammar, facts, reasoning, style, and even some common sense. If the text says "The capital of France is," the model needs world knowledge to predict "Paris." If it says "She picked up the ball and threw it to," the model needs to understand pronouns, physics, and social context. The simple objective of next-token prediction creates pressure to learn deeply about language and the world it describes.

Contrastive Learning

In vision and embeddings, self-supervised learning often uses contrastive objectives: learn representations where similar items are close together and dissimilar items are far apart. CLIP (matching images to text descriptions), SimCLR (matching augmented views of the same image), and embedding models all use this approach. The supervision signal comes from the data structure itself — two crops of the same image should have similar representations, while crops of different images should not.
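The sketch below shows a simplified InfoNCE-style contrastive loss of the kind these methods build on. It assumes PyTorch and omits details that differ between methods: SimCLR's NT-Xent also counts the other in-batch view's rows as negatives, and CLIP applies the loss symmetrically over image and text embeddings.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(view_a, view_b, temperature=0.1):
    """Simplified InfoNCE: row i of view_a should be close to row i of
    view_b (two views of the same item) and far from every other row."""
    a = F.normalize(view_a, dim=1)        # unit vectors, so dot product = cosine similarity
    b = F.normalize(view_b, dim=1)
    logits = a @ b.t() / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))      # the positive pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

# Embeddings of two augmented views of the same batch, after the encoder.
z_a = torch.randn(8, 128)
z_b = torch.randn(8, 128)
loss = info_nce_loss(z_a, z_b)
```

The temperature scales the logits so the softmax sharply favors the diagonal positives; every other item in the batch serves as a free negative, which is why these methods benefit from large batch sizes.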

Related Concepts

Self-Attention · Semantic Search