Training

Contrastive Learning

SimCLR, InfoNCE
A self-supervised learning approach that trains models by contrasting positive pairs (similar items that should be close in embedding space) against negative pairs (dissimilar items that should be far apart). CLIP contrasts matching image-text pairs against non-matching ones. SimCLR contrasts augmented views of the same image against views of different images. The model learns representations in which similarity in embedding space reflects real-world similarity.

Why It Matters

Contrastive learning is how most embedding models are trained, including the models that power semantic search, RAG, and recommendations. It is also the training approach behind CLIP, which connects language and vision. Whenever you use embeddings to measure similarity, contrastive learning is likely how those embeddings were created.

Deep Dive

The InfoNCE loss (used by CLIP and many embedding models): given a batch of N positive pairs, treat, for each item, the other N−1 non-matching items in the batch as negative examples. The loss pushes positive pair embeddings closer together and negative pair embeddings apart. The key insight: you don't need explicitly labeled negative examples, because other items in the batch serve as negatives for free, making the approach highly scalable.
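The in-batch InfoNCE loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not CLIP's actual implementation; the function name and the default temperature of 0.07 are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(a, b, temperature=0.07):
    """In-batch InfoNCE: rows i of `a` and `b` are a positive pair;
    every other row in the batch serves as a negative for row i."""
    # L2-normalize so dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature              # (N, N) similarity matrix
    # Cross-entropy where the correct "class" for row i is column i
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the positives sit on the diagonal of the similarity matrix, the loss is just a softmax cross-entropy per row: the batch supplies the negatives for free, exactly as described above.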

Data Augmentation as Supervision

In vision, contrastive learning creates positive pairs through data augmentation: two random crops of the same image are a positive pair (they show the same content from different views). Different images form negative pairs. The model learns that the augmented views should have similar embeddings while different images should have different embeddings. This learns useful visual representations without any labels — pure self-supervision.
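A minimal sketch of the positive-pair construction above, using random crops on plain NumPy arrays as the augmentation. Real pipelines such as SimCLR's also use color jitter, flips, and blur; the function names here are illustrative:

```python
import numpy as np

def random_crop(image, size, rng):
    """Take a random `size` x `size` crop from a 2-D image array."""
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

def make_pairs(images, crop_size, rng):
    """Two independent crops of the same image form one positive pair;
    crops taken from different images act as negatives in the batch."""
    view_a = np.stack([random_crop(img, crop_size, rng) for img in images])
    view_b = np.stack([random_crop(img, crop_size, rng) for img in images])
    return view_a, view_b
```

Feeding `view_a` and `view_b` through an encoder and into an in-batch contrastive loss yields self-supervised training: the labels are implicit in which crops came from the same image.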

Hard Negatives

Not all negatives are equally useful for learning. "Hard negatives" — items that are similar but not matching — provide the most learning signal. For a query about "Python web frameworks," a hard negative might be a document about "Python data science" (similar topic, wrong answer) rather than one about "cooking recipes" (obviously irrelevant). Mining hard negatives is a key technique for training high-quality embedding models.
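One simple mining strategy implied above is to rank corpus items by similarity to the query and take the nearest non-matching ones. A hedged NumPy sketch, with illustrative names; production systems typically mine from a retriever over a large corpus:

```python
import numpy as np

def mine_hard_negatives(query_emb, corpus_embs, positive_idx, k=2):
    """Return indices of the k corpus items most similar to the query,
    excluding the known positive. Similar-but-wrong items like these
    carry the most learning signal."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity of each item to the query
    sims[positive_idx] = -np.inf      # never select the positive itself
    return np.argsort(sims)[::-1][:k]
```

The mined indices are then paired with the query as negatives during training, replacing or supplementing the random in-batch negatives.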

Related Concepts
