Training

Contrastive Learning

SimCLR, InfoNCE
A self-supervised learning method that trains a model by contrasting positive pairs (similar items that should be close in embedding space) against negative pairs (dissimilar items that should be far apart). CLIP contrasts matching image-text pairs with mismatched ones. SimCLR contrasts augmented views of the same image with views of different images. The model learns representations in which similarity in embedding space reflects real-world similarity.

Why It Matters

Contrastive learning is the training method behind most embedding models: the models that power semantic search, RAG, and recommendation. It is also the training method behind CLIP, which connects language and vision. Every time you use embeddings to measure similarity, contrastive learning is most likely how those embeddings were created.

Deep Dive

The InfoNCE loss (used by CLIP and many embedding models): given a batch of N positive pairs, each item treats the other N−1 items in the batch as negative examples. The loss pushes positive pair embeddings closer together and negative pair embeddings apart. The key insight: you don't need explicitly labeled negative examples — other items in the batch serve as negatives for free, making the approach highly scalable.
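
A minimal sketch of this loss in PyTorch, written as the symmetric variant used in CLIP-style setups (the function name, the fixed temperature of 0.07, and the symmetric averaging are assumptions for illustration; CLIP actually learns its temperature):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE over a batch of N positive pairs: row i of emb_a matches row i of emb_b."""
    emb_a = F.normalize(emb_a, dim=1)  # unit-normalize so dot products are cosine similarities
    emb_b = F.normalize(emb_b, dim=1)
    # (N, N) similarity matrix: entry (i, j) compares item i on side A with item j on side B.
    logits = emb_a @ emb_b.T / temperature
    # Diagonal entries are the positive pairs; every off-diagonal entry is an
    # in-batch negative, so no explicitly labeled negatives are needed.
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    # Cross-entropy in both directions (A to B, and B to A), averaged, as in CLIP-style training.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```

Each row's cross-entropy pushes the matching pair's similarity up relative to all N−1 negatives in that row, which is exactly the pull-together/push-apart behavior described above.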

Data Augmentation as Supervision

In vision, contrastive learning creates positive pairs through data augmentation: two random crops of the same image are a positive pair (they show the same content from different views). Different images form negative pairs. The model learns that the augmented views should have similar embeddings while different images should have different embeddings. This learns useful visual representations without any labels — pure self-supervision.
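
A sketch of how such positive pairs can be generated in a SimCLR-style pipeline, using torchvision (the crop size and jitter strengths below are illustrative, not the paper's exact settings, which also include Gaussian blur):

```python
from PIL import Image
import torchvision.transforms as T

# Each call applies a fresh random crop, flip, color jitter, and possible grayscale conversion.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def make_positive_pair(image: Image.Image):
    # Two independent augmentations of the same image form one positive pair;
    # augmented views of any other image in the batch serve as negatives.
    return augment(image), augment(image)
```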

Hard Negatives

Not all negatives are equally useful for learning. "Hard negatives" — items that are similar but not matching — provide the most learning signal. For a query about "Python web frameworks," a hard negative might be a document about "Python data science" (similar topic, wrong answer) rather than one about "cooking recipes" (obviously irrelevant). Mining hard negatives is a key technique for training high-quality embedding models.
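
One common mining recipe, sketched below: embed the corpus with the current model, then take the highest-scoring items that are not the known answer as hard negatives for the next training round (mine_hard_negatives is a hypothetical helper, and a single known positive per query is assumed):

```python
import torch
import torch.nn.functional as F

def mine_hard_negatives(query_emb: torch.Tensor, corpus_emb: torch.Tensor,
                        positive_idx: int, k: int = 5) -> torch.Tensor:
    """Return indices of the k corpus items most similar to the query, excluding its true match."""
    # Cosine similarity between the query (d,) and every corpus row (N, d).
    sims = F.cosine_similarity(query_emb.unsqueeze(0), corpus_emb)
    sims[positive_idx] = float("-inf")  # never pick the correct answer as a negative
    # The top-scoring non-matches are "similar but wrong": the hardest negatives.
    return sims.topk(k).indices
```

The mined indices would then feed the contrastive loss in place of, or alongside, random in-batch negatives.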
