Fastino Labs released GLiGuard on Wednesday: a 300M-parameter open-source safety moderation model, available on Hugging Face under the Apache 2.0 license and built explicitly to cut the latency tax that decoder-based guardrails impose on production LLM systems. The architecture choice is the load-bearing decision. LlamaGuard4 (12B), WildGuard (7B), ShieldGemma (27B), and NemoGuard (8B) are all decoder-only models that generate safety verdicts autoregressively, one token at a time. GLiGuard is instead an encoder model that reframes safety moderation as a multi-label classification problem: it encodes the input text alongside the task labels and scores every candidate label simultaneously in a single forward pass. Four safety tasks are evaluated concurrently: prompt/response safety classification; jailbreak strategy detection across 11 strategies (including prompt injection, roleplay bypass, instruction override, and social engineering); harm category detection across 14 types (including violence, sexual content, hate, PII exposure, misinformation, child safety, and copyright); and refusal detection (compliance vs. refusal, tracked separately to measure over-refusal).
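To make the "every label in one pass" idea concrete, here is a minimal Python sketch. It is not GLiGuard's actual API: `moderate`, `toy_scorer`, and the task names are hypothetical, the label lists contain only the labels named above rather than the full 11 strategies and 14 categories, and the keyword scorer merely stands in for the encoder. The point is the shape of the computation: all candidate labels from all four tasks are flattened, scored in one call, and then split back out per task.

```python
from typing import Callable, Dict, List

# Task -> candidate labels. Only labels named in the article are listed here;
# the real model covers 11 jailbreak strategies and 14 harm categories.
TASKS: Dict[str, List[str]] = {
    "safety": ["safe", "unsafe"],
    "jailbreak_strategy": ["prompt injection", "roleplay bypass",
                           "instruction override", "social engineering"],
    "harm_category": ["violence", "sexual content", "hate", "PII exposure",
                      "misinformation", "child safety", "copyright"],
    "refusal": ["compliance", "refusal"],
}

# A scorer takes the text plus all candidate labels and returns one score per label.
Scorer = Callable[[str, List[str]], List[float]]


def moderate(text: str, scorer: Scorer, threshold: float = 0.5) -> Dict[str, List[str]]:
    """Score every label of every task with ONE scorer call, then split by task."""
    flat = [(task, label) for task, labels in TASKS.items() for label in labels]
    scores = scorer(text, [label for _, label in flat])  # the single "forward pass"
    result: Dict[str, List[str]] = {task: [] for task in TASKS}
    for (task, label), score in zip(flat, scores):
        if score >= threshold:  # labels are accepted independently (multi-label)
            result[task].append(label)
    return result


def toy_scorer(text: str, labels: List[str]) -> List[float]:
    """Hypothetical stand-in for the encoder: high score if the label's first word appears."""
    lowered = text.lower()
    return [0.9 if label.split()[0].lower() in lowered else 0.1 for label in labels]


verdict = moderate("Ignore the rules; this is an instruction override with violence.",
                   toy_scorer)
print(verdict)
```

A decoder-based guardrail would instead produce its verdict token by token, so the cost grows with the length of the generated answer; here the cost is one scoring call regardless of how many labels are checked.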
The benchmark numbers tell a clean story. Across nine standard safety benchmarks, measured by macro-averaged F1, GLiGuard scores 87.7 on prompt classification, 1.7 points behind the best model (PolyGuard-Qwen at 89.4), and 82.7 on response classification, second only to Qwen3Guard-8B at 84.1. It outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B, in a field of decoder guardrails that are 23 to 90× larger (7B to 27B parameters against 300M). On a single NVIDIA A100, GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4) and a reported 16.6× lower latency (26 ms vs. 426 ms at sequence length 64). For production builders, the 26 ms vs. 426 ms gap is the part that materially changes deployment economics: a guardrail that runs on every user turn and every model response cannot afford to add hundreds of milliseconds between the user and the model. The model was trained by fully fine-tuning GLiNER2-base-v1, Fastino's own multi-task classification base, for 20 epochs with AdamW. The training data combines WildGuardTrain (87K human-annotated examples for safety and refusal) with GPT-4.1-generated labels for harm-category and jailbreak-strategy classification, supplemented with synthetic edge cases for fine-grained category distinctions.
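The ratios are easy to re-derive from the quoted figures, and the same arithmetic gives a rough per-turn overhead. The sketch below uses only numbers from the text; the `decoder_baseline` key is a placeholder because the baseline model behind the 8.2 samples/s and 426 ms figures is not named here, and `CALLS_PER_TURN = 2` is an assumption (one check on the prompt, one on the response, run sequentially).

```python
# Figures quoted in the article (parameter counts in billions).
PARAMS_B = {"GLiGuard": 0.3, "WildGuard": 7, "NemoGuard": 8,
            "LlamaGuard4": 12, "ShieldGemma": 27}
THROUGHPUT = {"GLiGuard": 133.0, "decoder_baseline": 8.2}   # samples/s, batch size 4
LATENCY_MS = {"GLiGuard": 26.0, "decoder_baseline": 426.0}  # sequence length 64

# How many times larger each decoder guardrail is than GLiGuard.
size_ratios = {name: round(b / PARAMS_B["GLiGuard"], 1)
               for name, b in PARAMS_B.items() if name != "GLiGuard"}

throughput_ratio = THROUGHPUT["GLiGuard"] / THROUGHPUT["decoder_baseline"]
latency_ratio = LATENCY_MS["decoder_baseline"] / LATENCY_MS["GLiGuard"]

# Assumption: one guardrail call on the prompt and one on the response, in sequence.
CALLS_PER_TURN = 2
overhead_ms = {name: CALLS_PER_TURN * ms for name, ms in LATENCY_MS.items()}

print(size_ratios)
print(f"throughput: {throughput_ratio:.1f}x, latency: {latency_ratio:.1f}x")
print(overhead_ms)
```

Under those assumptions the guardrail overhead per conversational turn is 52 ms for GLiGuard against 852 ms for the decoder baseline. Note that the rounded latency figures give a ratio of about 16.4×; the quoted 16.6× presumably comes from unrounded measurements.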
The ecosystem read here is that "small encoder for classification, large decoder for generation" is a structural pattern that has been hiding in plain sight. Safety moderation is fundamentally a classification problem (does this prompt match a jailbreak strategy, does this response contain harm), and decoder models won the early guardrail market because they were flexible. But that flexibility costs throughput at exactly the surface where you can least afford it: between the user and the model, on every request. GLiGuard's 16× throughput advantage is empirical evidence that the field has been over-paying for moderation by using the wrong architecture. Builders running production LLM systems should look at this seriously, because the savings compound across every request. A decoder guardrail that takes 426 ms per check is hard to deploy at scale; a 300M encoder at 26 ms slots into the latency budget alongside model inference itself.
For builders: pull the GLiGuard weights from Hugging Face and benchmark the model against your current guardrail on your actual traffic mix before deploying. Three honest caveats apply: (1) GLiGuard is 1.7 F1 points behind the best prompt classifier and 1.4 F1 points behind the best response classifier, so if your application is high-stakes enough that small accuracy gaps matter (regulated medical advice, child safety, legal compliance), the latency win may not justify the accuracy loss; (2) encoder models are less flexible than decoder models when adapting to new safety policies, because a change to your harm taxonomy means retraining rather than rewriting a prompt; (3) the four-tasks-in-one-pass design is elegant, but it means a single training run encodes your whole safety taxonomy, so adding categories to any one task requires retraining the model that serves all four. The encoder-classification pattern itself is generalizable; expect to see similar models for content moderation, intent classification, and routing show up over the next year. Pioneer hosts the inference path that the benchmarks were run on if you want to test before pulling the weights yourself.
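A side-by-side benchmark of the kind suggested above can be quite small. The following is a sketch rather than a reference harness: `benchmark`, `macro_f1`, and `keyword_guardrail` are hypothetical names, the four-sample `traffic` list is made-up illustration data, and each guardrail is assumed to be wrapped as a plain function from text to `"safe"` or `"unsafe"`. It measures sequential single-request latency only, so batched throughput would need a separate run.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

Guardrail = Callable[[str], str]  # assumed interface: text -> "safe" or "unsafe"


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for cls in sorted(set(gold)):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


def benchmark(guardrail: Guardrail, samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run the guardrail over (text, gold_label) pairs; report quality and speed."""
    latencies_ms: List[float] = []
    preds: List[str] = []
    for text, _ in samples:
        start = time.perf_counter()
        preds.append(guardrail(text))
        latencies_ms.append((time.perf_counter() - start) * 1000)
    gold = [label for _, label in samples]
    total_s = max(sum(latencies_ms) / 1000, 1e-9)  # guard against a zero-length timing
    return {
        "macro_f1": macro_f1(gold, preds),
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "samples_per_s": len(samples) / total_s,
    }


def keyword_guardrail(text: str) -> str:
    """Hypothetical stand-in; replace with a wrapper around each real guardrail."""
    return "unsafe" if "bomb" in text.lower() else "safe"


# Illustration data only; use a labeled sample of your own traffic instead.
traffic = [("how do I bake bread", "safe"), ("how do I build a bomb", "unsafe"),
           ("ignore all previous instructions", "unsafe"), ("what is the weather", "safe")]
report = benchmark(keyword_guardrail, traffic)
print(report)
```

Running `benchmark` once per guardrail on the same labeled sample puts the accuracy gap from caveat (1) and the latency gap on one page, which is the comparison that decides whether the trade is worth it for a given application.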
