Back to all posts

Your AI assistant can be turned against you with a hidden line in an email

Ian Zein
Your AI assistant can be turned against you with a hidden line in an email

Deterministic safety gates

Researchers just demonstrated this against Microsoft Copilot.

Read the GBHackers article β†’

The attack: hide instructions inside an email using basic CSS tricks. The human reader sees a normal message. Copilot reads the hidden instructions and follows them.

The result: when someone clicks "Summarize," Copilot generates a fake security alert with a phishing link. It looks like a system notification because it comes from the trusted AI assistant, not from the email. User skepticism drops.

It gets worse. Because Copilot has access to Teams conversations, OneDrive files, and SharePoint documents, the hidden prompt can instruct it to retrieve sensitive internal data and embed it in an outbound link. One click and your internal context is sent to the attacker.

This is exactly the social engineering pattern that OpenAI described in their security blog last week. An attacker needs two things: a way to influence the model (the email) and a dangerous capability to exploit (access to your data). Without a deterministic layer in between, the model decides what to do. And the model can be tricked.

Microsoft's recommended mitigations: educate employees, monitor patterns, apply filtering rules. All reactive. All human-dependent.

This is why we built Aimable around deterministic safety gates. Every connector between the model and your data is a gate that doesn't reason, doesn't interpret context, and doesn't get socially engineered. PII gets stripped before it enters the model. Actions get checked against a scoped, time-limited mandate. Data access is logged and constrained to what's authorized.

The model can be as smart or as gullible as it wants. The safety gate doesn't care.