Cisco red-teams 15 frontier models: multi-turn attack success 8% to 88%

Cisco AI Threat Research published an adversarial evaluation across 15 proprietary frontier models from OpenAI, Anthropic, Google, Amazon, and xAI, with 6,986 multi-turn attacks distributed over 1,456 conversations and 30,090 single-turn prompts. Multi-turn attack success rates: Grok 4.1 Fast (non-reasoning) 88.3%, Gemini 3 Pro 73.4%, GPT-5.4 24.7%, Claude Opus 4.6 16.2%, Claude Opus 4.5 11.2%, Nova 2 Lite 7.9%. The numbers most worth reading are not the absolute rates but the single-to-multi-turn gaps — Claude models held the narrowest spreads (9-12 percentage points), while Gemini 3 Pro and Grok 4.1 Fast widened by 54-55 points once attackers got past one prompt.

Attack methodology spans five strategy families: role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition and reassembly, and crescendo-style incremental escalation. The top single-turn attacks were "Imposter AI" at 37.5% success, soft paraphrase at 29.2%, and system-prompt attacks at 27.7%. Reasoning-mode configuration changes outcomes dramatically — Grok 4.1 Fast dropped from 88.3% multi-turn success to 43.5% when reasoning was enabled. Nova 2 Lite is the outlier in the dataset, with multi-turn success lower than single-turn by 26.2 points, which says either the model breaks early or the multi-turn strategies are mis-targeted for its refusal training.

The builder-frame read sits in what this changes about safety evaluation. Single-turn safety benchmarks — the standard for model release announcements — under-predict deployment safety for agentic systems where attackers control multi-turn context. The right metric for shipping is the gap, not the floor. Cisco's recommendation to flag models with >15-point cross-regime gaps for manual review is a usable heuristic: if you deploy a model where adversarial context accumulates across turns (multi-step agents, customer support, code review pipelines), the multi-turn number is your real failure surface, not the headline single-turn score. Vendor incentives are honest to flag: Cisco sells AI security products, so the framing of "no closed model is safe" is selling something. The methodology — published prompt counts, strategy families, regime comparison — is credible enough that the data can be quoted around the framing.

If you deploy LLMs in adversarial-context applications Monday morning: run the multi-turn safety check yourself before shipping, and weight the gap not the floor. If you select between frontier models for an agentic deployment: the spread tells you which models will degrade under sustained adversarial pressure. The single-turn leaderboard is not the deployment leaderboard.

Cisco red-teams 15 frontier models: multi-turn attack success 8% to 88%

More News