
LoRA

Also known as: Low-Rank Adaptation
A technique that makes fine-tuning dramatically cheaper by only training a small number of additional parameters instead of modifying the entire model. LoRA "adapters" are lightweight add-ons (often just megabytes) that modify a model's behavior without retraining its billions of parameters.

Why it matters

LoRA democratized fine-tuning. Before it, customizing a 7B model required serious GPU resources. Now you can fine-tune on a single consumer GPU in hours and share the tiny adapter file. It's why there are thousands of specialized models on HuggingFace.

Deep Dive

The core insight behind LoRA, published by Hu et al. in 2021, is that the weight updates during fine-tuning tend to be low-rank, meaning the changes can be well approximated by the product of two much smaller matrices. Instead of updating a weight matrix W (say, 4096 x 4096, roughly 16.8 million parameters), LoRA freezes W entirely and adds two small matrices B (4096 x r) and A (r x 4096), where r (the rank) is typically 8, 16, or 64. The effective update is the product BA, which has the same shape as W but is parameterized by only 2 x 4096 x r values. At rank 16, that is 131,072 trainable parameters instead of 16.8 million, a 128x reduction for that single layer. Apply this across all attention layers in the model and the total trainable parameter count drops from billions to a few million, which is why a LoRA adapter file is often just 10-50 MB compared to the multi-gigabyte base model.
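The arithmetic above can be sketched directly. This is a toy NumPy illustration of the parameter counting, not the optimized implementation real training code uses:

```python
import numpy as np

d, r = 4096, 16

# Frozen pretrained weight W (d x d): never updated during LoRA training.
W = np.random.randn(d, d).astype(np.float32) * 0.02

# Trainable low-rank factors, following the paper's convention:
# B (d x r) starts at zero and A (r x d) starts random, so BA = 0 at
# initialization and training begins from the base model's behavior.
B = np.zeros((d, r), dtype=np.float32)
A = np.random.randn(r, d).astype(np.float32) * 0.02

delta_W = B @ A                  # same shape as W: (d, d)
assert delta_W.shape == W.shape

trainable = B.size + A.size      # 2 * d * r
frozen = W.size                  # d * d
print(trainable)                 # 131072
print(frozen // trainable)       # 128
```

At inference time the update can either be kept as a separate additive path or folded into W once, which is why the adapter alone is enough to share a fine-tune.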

Tuning the Knobs

In practice, you choose which layers to apply LoRA to (typically the attention projection matrices: Q, K, V, and the output projection) and set the rank r and a scaling factor called alpha. The alpha/r ratio controls how much influence the adapter has relative to the frozen base weights. Higher rank means more expressiveness but also more parameters and memory; in practice, rank 16 or 32 covers most use cases. QLoRA, introduced by Dettmers et al. in 2023, pushed the efficiency further by combining LoRA with 4-bit quantization of the base model: the frozen weights are stored in NF4 (a 4-bit format optimized for normally distributed weights) while the LoRA adapters train in bf16. This lets you fine-tune a 65B-parameter model on a single 48GB GPU, something that would otherwise require a multi-GPU setup with hundreds of gigabytes of VRAM.
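With HuggingFace PEFT, these knobs map onto a `LoraConfig`. A configuration sketch, assuming a LLaMA-style architecture (other model families name their projection modules differently):

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,             # rank of the update matrices
    lora_alpha=32,    # adapter output is scaled by alpha / r (here 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# get_peft_model(base_model, config) would then wrap a loaded base model,
# freezing its weights and inserting trainable B/A pairs in each target module.
```

Raising `lora_alpha` while holding `r` fixed strengthens the adapter's contribution without adding parameters, which is why the two are usually tuned as a ratio.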

The Tool Landscape

The LoRA ecosystem has matured rapidly. HuggingFace's PEFT library is the standard implementation, and tools like Axolotl, LLaMA-Factory, and Unsloth wrap it in higher-level interfaces that handle data formatting, hyperparameter defaults, and training loops. One of the most powerful practical features is adapter composability: because LoRA adapters are additive, you can train separate adapters for different tasks and merge or swap them at inference time without reloading the base model. Some serving frameworks like LoRAX and vLLM exploit this to serve hundreds of different LoRA adapters from a single base model in memory, routing each request to the appropriate adapter. This makes it feasible to offer per-customer fine-tuned models without the cost of deploying separate model instances.
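The composability described above follows directly from the additive structure. A minimal NumPy sketch with invented per-task adapters (the adapter values are random placeholders, purely to illustrate swapping and merging):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8
W = rng.standard_normal((d, d))  # frozen base weight (illustrative size)

def make_adapter(seed):
    # One trained LoRA adapter = a (B, A) pair; filled with random
    # values here purely to illustrate the mechanics.
    g = np.random.default_rng(seed)
    return g.standard_normal((d, r)) * 0.01, g.standard_normal((r, d)) * 0.01

support_adapter = make_adapter(1)  # hypothetical per-task adapters
legal_adapter = make_adapter(2)

alpha = 16
scaling = alpha / r

def forward(x, adapter=None):
    # Base path plus optional adapter path. W is never modified, so routing
    # a request to a different adapter is just picking a different (B, A).
    h = x @ W.T
    if adapter is not None:
        B, A = adapter
        h = h + scaling * (x @ A.T) @ B.T
    return h

x = rng.standard_normal((1, d))
y_support = forward(x, support_adapter)
y_legal = forward(x, legal_adapter)

# Merging bakes an adapter into the weights for zero-overhead inference.
B, A = support_adapter
W_merged = W + scaling * (B @ A)
assert np.allclose(x @ W_merged.T, y_support)
```

Multi-adapter servers keep the base path shared across requests and batch only the cheap low-rank paths per adapter, which is what makes hundreds of adapters per GPU practical.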

The Trade-offs

LoRA is not a free lunch, though. The low-rank constraint means it cannot learn arbitrary weight changes: if a task requires significant restructuring of the model's internal representations, full fine-tuning will outperform LoRA. In practice, this matters most for tasks far outside the base model's pre-training distribution, or when you are trying to teach the model substantial new factual knowledge rather than adjusting its style or format. A common mistake is setting the rank too low and wondering why the adapter does not seem to learn, or setting it too high and ending up with an adapter that overfits a small dataset. The rank is a regularization knob as much as a capacity knob, and tuning it alongside the learning rate and number of training steps is essential for good results.
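The low-rank constraint can be made concrete with a small experiment: take a synthetic "full fine-tuning" update matrix and measure how well the best rank-r approximation (the truncated SVD, by the Eckart-Young theorem) captures it. The matrix here is invented for illustration, with a strong rank-4 component plus broad noise:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256

# Hypothetical full fine-tuning update: a few strong directions (rank 4)
# plus low-magnitude noise, loosely mimicking the structure LoRA relies on.
delta_W = rng.standard_normal((d, 4)) @ rng.standard_normal((4, d)) \
          + 0.05 * rng.standard_normal((d, d))

def best_rank_r_error(M, r):
    # Relative error of the best rank-r approximation of M.
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    approx = (u[:, :r] * s[:r]) @ vt[:r]
    return np.linalg.norm(M - approx) / np.linalg.norm(M)

for r in (1, 4, 16, 64):
    print(r, round(best_rank_r_error(delta_W, r), 3))
```

When the true update has this kind of structure, the error collapses once r reaches the underlying rank and increasing it further buys little; when the true update is genuinely high-rank, no small r suffices, which is exactly the regime where full fine-tuning pulls ahead.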
