
Self-Supervised Learning

SSL
A training approach in which the model generates its own supervision signal from unlabeled data. The key trick: hide part of the input and train the model to predict the hidden part. For LLMs, this means predicting the next token from the preceding context. For vision models (like DINO), it means masking image patches and reconstructing them. You get the benefits of supervised learning without the cost of human labels.
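The core idea can be sketched in a few lines: split an unlabeled sequence into a visible part and a hidden part, and the hidden part becomes the prediction target for free. This is an illustrative helper (`make_ssl_example` is a name invented here, not from any library):

```python
def make_ssl_example(items, hide_from):
    """Split an unlabeled sequence into (visible, hidden).
    The hidden part becomes the prediction target --
    supervision derived from the data itself, no human labels."""
    return items[:hide_from], items[hide_from:]

# One raw sentence yields a training example with no annotation step.
visible, hidden = make_ssl_example(
    ["The", "capital", "of", "France", "is", "Paris"], hide_from=5
)
# visible = ["The", "capital", "of", "France", "is"], hidden = ["Paris"]
```

Sliding `hide_from` across the sequence turns a single unlabeled sentence into many training examples.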

Why It Matters

Self-supervised learning is the breakthrough that made modern AI possible. It is how LLMs learn language from raw text, how BERT learns to understand sentences, and how vision models learn to see without labeled images. It unlocked the ability to train on the entire internet, instead of being limited to expensive hand-labeled datasets.

Deep Dive

The two dominant self-supervised approaches in NLP are causal language modeling (predict the next token, used by GPT/Claude/Llama) and masked language modeling (mask random tokens and predict them, used by BERT). Causal modeling produces generative models — they can write text. Masked modeling produces understanding models — they excel at classification, search, and analysis but can't generate fluently.
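The two objectives differ only in how training examples are carved out of raw text. A minimal sketch of both data preparations (toy whitespace tokens, helper names invented here; real pipelines operate on subword IDs):

```python
import random

def causal_lm_pairs(tokens):
    """Causal LM (GPT-style): every position predicts the token after it."""
    return list(zip(tokens[:-1], tokens[1:]))  # (input, target) per position

def masked_lm_example(tokens, mask_rate=0.15, seed=0):
    """Masked LM (BERT-style): hide random tokens; the model must recover them."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # position -> original token to predict
        else:
            masked.append(tok)
    return masked, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]
pairs = causal_lm_pairs(tokens)            # [("the", "cat"), ("cat", "sat"), ...]
masked, targets = masked_lm_example(tokens, mask_rate=0.5)
```

Causal pairs only ever condition on the left context, which is what makes the resulting model generative; masked positions can attend to both sides, which is why BERT-style models understand but don't generate fluently.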

Why It Works So Well

Predicting the next token sounds trivial, but to do it well, a model must learn grammar, facts, reasoning, style, and even some common sense. If the text says "The capital of France is," the model needs world knowledge to predict "Paris." If it says "She picked up the ball and threw it to," the model needs to understand pronouns, physics, and social context. The simple objective of next-token prediction creates pressure to learn deeply about language and the world it describes.
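The "pressure" here is concrete: training minimizes the cross-entropy between the model's predicted distribution and the true next token, so the loss is low only when the model concentrates probability on the right continuation. A small sketch over a toy three-word vocabulary (the logit values are made up for illustration):

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-probability the model assigns to the true next token."""
    m = max(logits)  # subtract max for a numerically stable softmax
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[target_index] - log_denom)

# Vocabulary: ["Paris", "London", "cheese"]; the true continuation is "Paris".
confident = cross_entropy([4.0, 1.0, 0.5], target_index=0)  # favors "Paris"
uncertain = cross_entropy([1.0, 1.0, 1.0], target_index=0)  # uniform: loss = ln(3)
```

A model that has learned the fact "the capital of France is Paris" pays a much lower loss than one that guesses uniformly, and that gap, summed over trillions of tokens, is what forces world knowledge into the weights.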

Contrastive Learning

In vision and embeddings, self-supervised learning often uses contrastive objectives: learn representations where similar items are close together and dissimilar items are far apart. CLIP (matching images to text descriptions), SimCLR (matching augmented views of the same image), and embedding models all use this approach. The supervision signal comes from the data structure itself — two crops of the same image should have similar representations, while crops of different images should not.
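A common way to implement "similar close, dissimilar far" is the InfoNCE loss used by CLIP and SimCLR: a softmax over similarities that rewards the anchor for being closer to its positive than to the negatives in the batch. A minimal pure-Python sketch (toy 2-D vectors; real systems use learned high-dimensional embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log softmax of the positive pair's similarity.
    Low when the anchor is closer to its positive than to every negative."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    scaled = [s / temperature for s in sims]
    m = max(scaled)  # stabilize the log-sum-exp
    log_denom = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -(scaled[0] - log_denom)

# Two crops of the same image should embed near each other (low loss);
# a crop of a different image should not (high loss).
good = info_nce_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0], [-1.0, 0.0]])
```

Gradient descent on this loss pulls positive pairs together and pushes negatives apart, which is exactly the geometry CLIP and SimCLR learn at scale.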

Related Concepts
