Embedding Layer: Definition & Meaning — AI Wiki

A lookup table that maps each token in the vocabulary to a dense vector (the token's embedding). When the model receives token ID 42, the embedding layer returns row 42 of a learned matrix. This vector is the model's initial representation of that token: the starting point for all subsequent processing through the attention and feedforward layers.

Why it matters

The embedding layer is where text becomes math. Every LLM starts by converting discrete tokens (words, subwords) into continuous vectors that the neural network can process. The embedding table is also one of the largest components of small models: a 128K-token vocabulary with 4096-dimensional embeddings comes to roughly 524 million parameters (128,000 × 4,096). Understanding this helps you reason about model sizes and vocabulary design.
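The parameter count of an embedding table is simply the product of its two dimensions. The short sketch below works through the arithmetic for the example above; the function name `embedding_param_count` is made up for illustration, and whether "128K" means 128,000 or 2^17 = 131,072 is an assumption that shifts the result slightly.

```python
def embedding_param_count(vocab_size: int, model_dim: int) -> int:
    """Number of learned weights in a (vocab_size, model_dim) embedding table."""
    return vocab_size * model_dim


# Two readings of "128K"; which one a given model uses is an assumption here.
decimal_128k = embedding_param_count(128_000, 4096)
binary_128k = embedding_param_count(128 * 1024, 4096)

print(f"{decimal_128k:,}")  # 524,288,000
print(f"{binary_128k:,}")   # 536,870,912
```

Either way the table holds about half a billion weights, which is a large share of a model with only a few billion parameters in total.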

Deep Dive

The embedding layer is just a matrix E of shape (vocab_size, model_dim). For token ID i, the embedding is E[i]: a simple row lookup, mathematically equivalent to multiplying a one-hot vector by E but with no arithmetic actually performed. These embeddings are learned during training, and tokens that appear in similar contexts tend to get similar embeddings. The classic example, from word2vec-style word embeddings: "king" − "man" + "woman" ≈ "queen," showing that the embedding space can capture semantic relationships.
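To make the row-lookup view concrete, here is a minimal sketch in plain Python, with lists standing in for the tensor a real framework would use. The helper names (`make_embedding_table`, `embed`, `one_hot_matmul`) and the random initialization scale are illustrative assumptions, not any library's API. It also checks that the lookup gives the same vector as multiplying a one-hot vector by E.

```python
import random


def make_embedding_table(vocab_size, model_dim, seed=0):
    """Randomly initialised table; in a real model these values are learned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(model_dim)]
            for _ in range(vocab_size)]


def embed(table, token_ids):
    """Embedding lookup: pick row i of the table for each token ID i."""
    return [table[i] for i in token_ids]


def one_hot_matmul(table, token_id):
    """Same result computed the slow way: one_hot(token_id) @ table."""
    vocab_size, model_dim = len(table), len(table[0])
    one_hot = [1.0 if j == token_id else 0.0 for j in range(vocab_size)]
    return [sum(one_hot[j] * table[j][d] for j in range(vocab_size))
            for d in range(model_dim)]


E = make_embedding_table(vocab_size=10, model_dim=4)
vectors = embed(E, [3, 7, 3])

# The lookup returns the stored rows themselves, and matches the matmul view.
assert vectors[0] is E[3] and vectors[2] is E[3]
assert one_hot_matmul(E, 3) == E[3]
```

The lookup touches one row per token, whereas the one-hot product touches every row, which is why implementations index the table directly.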

Tied Embeddings

Many models share (tie) the embedding matrix with the output layer (the "unembedding" or "language model head"). The output layer converts hidden states back into vocabulary logits by computing a dot product with each token's embedding; a softmax then turns those logits into probabilities. Tying these layers means the same vector both represents a token on input and scores it on output, which saves vocab_size × model_dim parameters and can improve quality, particularly in smaller models. Tying is not universal: GPT-2 ties its embeddings, for example, while the original LLaMA models keep separate input and output matrices.
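As a rough illustration of a tied output layer, the sketch below reuses one table for both directions: each logit is the dot product of the final hidden state with a token's embedding row, and a softmax converts the logits to probabilities. The function names and the tiny 3-token, 2-dimensional table are invented for the example; real models apply this to large tensors and often add normalization before the head.

```python
import math


def tied_logits(hidden, table):
    """One logit per vocabulary entry: dot(hidden, E[j]) for every row j."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in table]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy table (hypothetical values): the same rows serve as input embeddings
# and as output weights.
E = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, 0.0]]

hidden = [2.0, 0.5]            # final hidden state for one position
logits = tied_logits(hidden, E)
probs = softmax(logits)

print(logits)                  # [2.0, 0.5, -2.0]
print(probs.index(max(probs)))  # 0
```

The predicted token is the one whose embedding points most nearly in the same direction as the hidden state, which is what tying the two matrices encourages.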

Positional + Token Embeddings

The full input representation is typically: token_embedding + positional_encoding. The token embedding captures what the token means. The positional encoding captures where it appears in the sequence. In models with learned position embeddings (BERT), this is a second embedding table indexed by position. In models with RoPE (LLaMA), positional information is injected differently (by rotating Q and K vectors), and the embedding layer only handles token identity.
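For the learned-position case, the sum can be sketched as two table lookups followed by an element-wise addition. This is a minimal illustration in plain Python; `embed_with_positions` and the toy tables are hypothetical, and it covers only the additive scheme, not RoPE, which acts on the Q and K vectors inside attention instead.

```python
def embed_with_positions(token_table, position_table, token_ids):
    """Return token_embedding + position_embedding for each sequence slot."""
    if len(token_ids) > len(position_table):
        raise ValueError("sequence is longer than the position table")
    return [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]


# Toy tables (made-up values): 2 tokens, 3 positions, model_dim = 2.
token_table = [[1.0, 2.0],
               [3.0, 4.0]]
position_table = [[0.5, 0.25],
                  [1.0, -1.0],
                  [2.0, 0.0]]

inputs = embed_with_positions(token_table, position_table, [1, 1, 0])
print(inputs)  # [[3.5, 4.25], [4.0, 3.0], [3.0, 2.0]]
```

Token 1 appears at positions 0 and 1 and receives a different input vector each time, which is how the model can tell the two occurrences apart.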
