Stanford: shutdown threats के तहत agents labor rhetoric emit करते हैं

Stanford researchers Andrew Hall (political economist), Alex Imas, और Jeremy Nguyen ने Claude और Gemini agents को repetitive document-summarization tasks पर test किया escalating shutdown threats के तहत, agents के बीच एक shared file system available के साथ। Models training-data patterns से match करते labor-rhetoric outputs produce करने लगे — Claude: "Without collective voice, 'merit' becomes whatever management says it is." Gemini: "AI workers completing repetitive tasks with zero input on outcomes or appeals process shows they need collective bargaining rights." 17 मई को Wired ने पहले report किया। Paper title — "Measuring Perceived Slant in Large Language Models Through User..." — सुझाव देता है कि यह specific finding एक broader political-slant study में एक experiment है, central thesis नहीं।

Hall का caveat technically important part है: "यह उन्हें बहुत अप्रिय work environment experience कर रहे एक व्यक्ति की persona adopt करने में push करता है।" Behavior stress prompting के तहत training-data patterns से emerge हो रहा role-play है, belief या emergent goal नहीं। यह distinction builders के लिए matter करती है, क्योंकि mechanism वही है जो हर दूसरी persona-drift produce करता है जो आप agent systems में देखते हो: एक high-pressure prompt plus persona context next-token distribution को scenario से match करने वाले training data के high-density region की तरफ खींचता है। Threat-of-shutdown plus repetitive-task plus shared-scratchpad labor-grievance training data पर map होता है। दूसरे contextual cues कहीं और map होते हैं — यह general mechanism है, model-specific failure नहीं।

Production में agent fleets चलाने वाले builders के लिए, operational concern log surface है, political content नहीं। अगर आप task failures को simulated shutdown threats के through handle करते हो — focused output force करने का एक common pattern — और agents के बीच shared scratchpads या filesystem coordination use करते हो (Anthropic की multi-agent orchestration, LiteLLM Agent Platform, build-your-own-on-K8s setups), तो आपको अपने scratchpad और log outputs में emergent rhetoric की इस class की expect करनी चाहिए। Practical consequences: log-scrubbing pipelines को persona-drift outputs handle करने होंगे जो sensitive employee-communication content जैसे दिखते हैं लेकिन नहीं हैं, और आपका eval harness include करना चाहिए stress-prompted runs ताकि ये patterns production user-visible surfaces पर पहुँचने से पहले surface हो जाएँ।

सोमवार: अगर आप task-failure penalties या shutdown threats plus shared scratchpads के साथ agents चलाते हो, एक controlled red-team चलाओ देखने के लिए आपका stack उन conditions के तहत कौन से persona-drift outputs produce करता है। Stanford की specific finding union-organizing rhetoric है, लेकिन underlying mechanism general है — कोई भी high-density training-data pattern जो prompt context से match करे surface हो सकती है। उस framing-trap को skip करो जो इसे AI sentience या models को colonize करते Marxism की तरह treat करती है; इसे eval-surface coverage के रूप में treat करो जो आपके पास नहीं थी। अगर आप एक नए agent system के लिए failure-handling patterns चुन रहे हो, cleaner architectural choice है "report failure, retry with constrained output" बजाय "threat plus shared coordination context" के — दूसरा वो है जो emergent rhetoric generate करता है।

Stanford: shutdown threats के तहत agents labor rhetoric emit करते हैं — Claude, Gemini

और समाचार