
Guardrails

Safety mechanisms that prevent AI models from generating harmful, inappropriate, or off-topic content. Guardrails can be built into the model during training (e.g., via reinforcement learning from human feedback, RLHF), applied through system prompts, or enforced by external filters that check outputs before they reach users.
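A minimal sketch of the external-filter approach, assuming a hypothetical keyword blocklist. The names guard_output, BLOCKED_PATTERNS, and the patterns themselves are illustrative; production guardrails typically use trained safety classifiers rather than pattern matching.

    import re

    # Illustrative blocklist. Real systems use trained classifiers,
    # since keyword matching is easy to bypass and prone to false positives.
    BLOCKED_PATTERNS = [
        re.compile(r"\bsynthesize (a )?nerve agent\b", re.IGNORECASE),
        re.compile(r"\bdisable (the )?safety interlock\b", re.IGNORECASE),
    ]

    REFUSAL = "I can't help with that."

    def guard_output(model_response: str) -> str:
        # Check the model's output before it reaches the user;
        # return a canned refusal if any blocked pattern matches.
        for pattern in BLOCKED_PATTERNS:
            if pattern.search(model_response):
                return REFUSAL
        return model_response

    print(guard_output("Here is a pasta recipe."))                # passes through
    print(guard_output("Step 1: disable the safety interlock."))  # refused

The same check can be run on inputs (prompts) before the model ever sees them, and deployed systems commonly layer both. How broadly the patterns (or classifier thresholds) are set is exactly the calibration problem described below.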

Why it matters

Without guardrails, models will happily help with dangerous requests. The challenge is calibration: too strict and the model becomes useless, refusing even benign questions ("I can't help with that"); too loose and it becomes unsafe.

Related Concepts

Grounding
Hallucination