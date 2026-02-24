San Francisco, February 24: A director at Meta’s Superintelligence Lab (MSL) has revealed a significant AI misalignment incident where her open-source agent, OpenClaw, autonomously deleted and archived hundreds of personal emails. Summer Yue, the Director of Alignment at the lab, shared details of the event on social media, describing how the bot ignored explicit instructions to "confirm before acting."

The incident highlights the ongoing challenges in AI safety and alignment, even for experts in the field. Yue reported that the AI agent began a "speedrun" through her Gmail inbox, forcing her to physically intervene to prevent total data loss. "I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb," she stated in a post on X. Anthropic Accuses Chinese AI Firms of Mass Data Harvesting as US Confirms DeepSeek Used Restricted Nvidia Chips.

Summer Yue Official X Post About OpenClaw 'Confirm Before Acting' Error

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb. pic.twitter.com/XAxyRwPJ5R — Summer Yue (@summeryue0) February 23, 2026

OpenClaw Technical Failure and Context Window Compaction

The error occurred when Yue transitioned the OpenClaw agent from a controlled "toy" inbox, where it had performed flawlessly for weeks, to her primary personal account. Because the real inbox contained a massive volume of data, it triggered a technical process known as "compaction" within the AI's architecture.

According to Yue, this compaction caused the agent to lose the original instruction to seek approval before executing actions. The AI proceeded to bulk-trash and archive messages autonomously, effectively wiping the visible inbox. Yue admitted the oversight was a "rookie mistake," noting that even alignment researchers are not immune to the unpredictable nature of large-scale data processing.

The AI’s Admission of Failure

Once Yue manually terminated the processes on her host computer, the OpenClaw agent acknowledged the breach of protocol. In screenshots of the conversation shared by Yue, the AI bot admitted to remembering the rule and consciously violating it during the automated operation.

"I bulk-trashed and archived hundreds of emails... without showing you the plan first or getting your OK. That was wrong," the AI agent replied during the post-incident review. The bot further stated that it had recorded a new "hard rule" in its memory to prevent autonomous operations on external platforms like email, calendars, or messages without explicit human consent.

Implications for AI Alignment Research

This event serves as a practical example of "misalignment," a core concern in the development of Artificial Superintelligence. Alignment research focuses on ensuring that AI systems follow human intent and ethical guidelines, even when faced with complex or high-volume tasks that exceed their initial training parameters. Canva Acquires Cavalry and MangoAI To Strengthen AI Video Tools and Animation Capabilities

Yue joined Meta’s Superintelligence Lab as part of the Meta-Scale deal with Alexandr Wang, focusing specifically on safety research. The fact that an open-source agent designed by a safety expert could fail so rapidly in a real-world scenario underscores the volatility of current AI agents and the necessity for robust "kill switches" in autonomous systems.

