The InfoNCE loss (used by CLIP and many embedding models): given a batch of N positive pairs, treat the N−1 non-matching items in the batch as negative examples. The loss pushes positive pair embeddings closer together and negative pair embeddings apart. The key insight: you don't need explicitly labeled negative examples — other items in the batch serve as negatives for free, making the approach highly scalable.
In vision, contrastive learning creates positive pairs through data augmentation: two random crops of the same image are a positive pair (they show the same content from different views). Different images form negative pairs. The model learns that the augmented views should have similar embeddings while different images should have different embeddings. This learns useful visual representations without any labels — pure self-supervision.
Not all negatives are equally useful for learning. "Hard negatives" — items that are similar but not matching — provide the most learning signal. For a query about "Python web frameworks," a hard negative might be a document about "Python data science" (similar topic, wrong answer) rather than one about "cooking recipes" (obviously irrelevant). Mining hard negatives is a key technique for training high-quality embedding models.