Training

Contrastive Learning

SimCLR, InfoNCE
A self-supervised learning method that trains a model by contrasting positive pairs (similar items that should be close in embedding space) against negative pairs (dissimilar items that should be far apart). CLIP contrasts matching image-text pairs against non-matching ones; SimCLR contrasts augmented views of the same image against views of different images. The model learns representations in which similarity in embedding space reflects real-world similarity.

Why It Matters

Contrastive learning is the training method behind most embedding models: the models that power semantic search, RAG, and recommendation. It is also the training method behind CLIP, which connects language and vision. Whenever you use embeddings to measure similarity, contrastive learning is most likely how those embeddings were created.

Deep Dive

The InfoNCE loss (used by CLIP and many embedding models): given a batch of N positive pairs, each item treats its matching partner as the positive and the N−1 non-matching items in the batch as negatives. The loss pulls positive pair embeddings together and pushes negative pair embeddings apart. The key insight is that you don't need explicitly labeled negative examples: the other items in the batch serve as negatives for free, which makes the approach highly scalable.
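A minimal sketch of in-batch InfoNCE in PyTorch. The function name, embedding sizes, and the temperature value of 0.07 are illustrative assumptions, not taken from any particular model:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: emb_a[i] and emb_b[i] form a positive pair;
    every emb_b[j] with j != i acts as a negative for emb_a[i]."""
    # Cosine similarity via dot products of L2-normalized embeddings.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.T / temperature  # (N, N) similarity matrix
    # Each row's matching item sits on the diagonal, so the target for row i is i.
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    # Cross-entropy pulls the diagonal (positives) up and the rest (negatives) down.
    return F.cross_entropy(logits, targets)

# Usage: two encoders (e.g. image and text towers) each embed N matching items.
img = torch.randn(8, 128)
txt = torch.randn(8, 128)
loss = info_nce_loss(img, txt)
```

Note that CLIP computes this loss in both directions (image-to-text and text-to-image) and averages the two; the sketch above shows one direction.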

Data Augmentation as Supervision

In vision, contrastive learning creates positive pairs through data augmentation: two random crops of the same image are a positive pair (they show the same content from different views). Different images form negative pairs. The model learns that the augmented views should have similar embeddings while different images should have different embeddings. This learns useful visual representations without any labels — pure self-supervision.
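A sketch of a SimCLR-style augmentation pipeline using torchvision. The crop size, jitter strengths, and blur kernel here are illustrative assumptions; SimCLR's published recipe differs in detail:

```python
import torchvision.transforms as T

# Each call produces one random "view"; calling it twice on the same
# image yields a positive pair, while views of other images serve as negatives.
augment = T.Compose([
    T.RandomResizedCrop(224),  # random crop, rescaled to a fixed size
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

# Positive pair: two independent augmentations of the same PIL image.
# view1, view2 = augment(image), augment(image)
```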

Hard Negatives

Not all negatives are equally useful for learning. "Hard negatives" — items that are similar but not matching — provide the most learning signal. For a query about "Python web frameworks," a hard negative might be a document about "Python data science" (similar topic, wrong answer) rather than one about "cooking recipes" (obviously irrelevant). Mining hard negatives is a key technique for training high-quality embedding models.
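A sketch of similarity-based hard-negative mining under the same embedding-space view. The function name, the corpus layout, and the top-k cutoff are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def mine_hard_negatives(query_emb: torch.Tensor, corpus_embs: torch.Tensor,
                        positive_idx: int, k: int = 5) -> list[int]:
    """Return indices of the k corpus items most similar to the query
    that are NOT the labeled positive: "close but wrong" examples."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(corpus_embs, dim=-1).T
    ranked = sims.argsort(descending=True)
    # Skip the true positive; the top remaining items are hard negatives.
    return [i.item() for i in ranked if i.item() != positive_idx][:k]

# Usage: embed a query and a candidate corpus, then mine negatives for training.
query = torch.randn(128)
corpus = torch.randn(1000, 128)
hard_negs = mine_hard_negatives(query, corpus, positive_idx=42)
```

Training batches built from these mined negatives give a stronger gradient signal than random negatives, since the model can no longer separate the pairs by topic alone.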
