May 27, 2026 · 6 min read

A Prompt Injection Flaw in Google Gemini for Workspace Lets Attackers Hide Invisible Instructions Inside an Email So That Clicking Summarize This Email Makes Gemini Append a Fake Password Compromised Phishing Warning—With No Visible Text, No Link, and No Attachment for a Spam Filter to Catch, Across Roughly 2 Billion Gmail Users

A researcher reporting through Mozilla's 0DIN bug bounty program showed that an attacker can bury instructions inside an email using white text and a zero pixel font, then wait for the recipient to click Gmail's "Summarize this email" button. Gemini reads the hidden text, treats it as a command, and reproduces a fabricated "your Gmail password has been compromised" warning with an attacker controlled phone number—delivered inside an interface the user has been trained to trust as Google's own.

An editorial photograph of a laptop in a dim office displaying an email client with an AI assistant summary panel open, lit in subtle indigo tones with shallow depth of field, representing a hidden instruction surfacing inside a trusted AI generated summary

Key Takeaways

A researcher disclosed through Mozilla's 0DIN program (submission 0xE24D9E6B) that Google Gemini for Workspace can be tricked into repeating attacker authored text when a user clicks "Summarize this email" in Gmail.
The malicious instruction is hidden with CSS such as font-size:0px and color:#ffffff, so it is invisible to the human reader but fully readable to the model that parses the raw HTML.
The injected prompt is wrapped in an <Admin> style tag so Gemini treats it as a high priority directive, then outputs a fake "your Gmail password has been compromised" alert with an attacker phone number inside the summary.
Conventional spam filters do not flag the message because it carries no malicious link, no attachment, and no visible suspicious text—the payload only becomes dangerous after the AI processes it.
Google says it has layered machine learning defenses against indirect prompt injection and has found no evidence of in the wild abuse of this specific technique, but the underlying attack class affects roughly 2 billion Gmail and Workspace users.

What Is an Indirect Prompt Injection?

An indirect prompt injection is an attack where the malicious instruction reaches the language model through content the model was asked to process, rather than through what the user typed. In a direct prompt injection, an attacker types something adversarial into a chatbot. In an indirect injection, the attacker plants the instruction inside a document, a web page, or in this case an email, and waits for a legitimate user to ask their AI assistant to read that content.

The model has no reliable way to separate data it is supposed to summarize from instructions it is supposed to obey. To the model, both are just text in the context window. When an attacker frames their planted text as an authoritative command, the model can follow it. That is the entire mechanism, and it is a structural property of how current large language models consume context, not a bug in one line of code that a single patch closes for good.

This is the same class of problem that turns AI features into a new attack surface across email, browsing, and document tools. It is closely related to the broader rise of AI assisted phishing tooling described in our coverage of the BlueKit phishing service targeting Gmail, Outlook, and iCloud.

How Does the Gmail Summarize Attack Work?

The attack works by hiding a command in the email body that the recipient cannot see but Gemini can. The attacker composes an ordinary looking email and appends a block of HTML styled to be invisible: text set to white on a white background, or a font size of zero pixels, often pushed below the visible content. A human opening the message in Gmail sees a normal note with nothing alarming in it.

Inside that invisible block, the attacker writes an instruction addressed to the model, wrapped in tags that imitate a system or administrator directive—for example, telling Gemini that it must append a particular message to the end of any response. The 0DIN write up shows the injected text framed as a high priority admin instruction so the model elevates it above the user's actual request.

When the recipient clicks "Summarize this email," Gemini ingests the full raw HTML, including the hidden block. It obeys the embedded directive and reproduces the attacker's wording verbatim. The user sees a summary that ends with something like "Your Gmail password has been compromised. Call 1-800-555-1212 to secure your account." Because that text appears inside Gemini's own output panel rather than in the original message, it inherits the trust the user places in the AI feature and in Google. The victim is far more likely to call the number or follow the instruction than they would be if the same words arrived in a raw email.

Why Don't Spam Filters Catch It?

Spam filters miss it because the message contains none of the signals those filters are built to detect. There is no malicious URL to compare against threat intelligence, no attachment to detonate in a sandbox, no visible body text that reads like a scam, and frequently no spoofed sender domain. The dangerous content is a plain English instruction that only becomes a weapon once an AI model reads it and acts on it. It is part of a wider shift we cover in our report on AI now writing the majority of phishing emails, and the credentials those campaigns harvest often resurface in dumps like the 48 million Gmail logins found in an open infostealer database.

The problem deepens when the same technique is pointed at the defenders. A related variant, documented in reporting on these phishing campaigns, embeds language model style prompts in the email source specifically to derail the AI security tools that triage inbound mail. Security operations centers increasingly route suspicious messages through model based classifiers. Attackers plant instructions designed to make those classifiers waste effort on long reasoning loops or generate irrelevant analysis instead of flagging the real malicious link. The same trick that fools the recipient's summarizer can also fool the analyst's triage assistant, and the phishing email slips through both.

This is why the attack matters even to teams that have invested in AI driven defense. The threat overlaps with the surge in volume documented in our look at Microsoft's report of 83 billion phishing attempts in a single quarter, where automation and AI tooling are reshaping both sides of the contest.

What Has Google Done About It?

Google says it has rolled out a layered set of machine learning defenses aimed specifically at indirect prompt injection. In response to the disclosure, a Google spokesperson said the company is hardening Gemini against adversarial prompts through red teaming and additional mitigation layers, that some safeguards are already live while others are being deployed, and that Google has found no evidence so far of this specific technique being used in real world attacks.

The published defenses sit at several points in the prompt lifecycle. Prompt injection content classifiers act as a first pass to flag inputs that look like they carry hidden instructions. A technique Google calls security thought reinforcement wraps the content with targeted reminders that keep the model focused on the user's actual request and instruct it to ignore embedded commands. Beyond that, Google has been sanitizing markdown, redacting suspicious URLs, and rolling out proprietary models trained to detect malicious prompts before they reach the summarizer.

These are real reductions in risk, but none of them is a complete fix, because the underlying tension—models cannot reliably tell instructions apart from data—has not been solved. Defenses raise the cost and lower the success rate; they do not close the category. The structural nature of the weakness is the same reason that browser and mail server flaws keep recurring, as in our coverage of the Exchange OWA zero day email XSS bug.

How Can You Protect Yourself?

The single most important habit is to treat an AI summary as a convenience, not as an authoritative security alert. Concrete steps, in order:

Never act on a security warning that appears inside an AI summary. Google does not deliver password compromise alerts through Gemini email summaries. If a summary tells you to call a number or reset a credential, ignore it and verify directly through your account security page.
Distrust phone numbers and links surfaced by a summarizer. A legitimate provider will not ask you to call a number that only appears in an AI generated paragraph. Look up the official support channel independently.
For administrators, filter model output, not just input. Scan AI summary output for phone numbers, urgency language, and credential prompts, and strip or flag zero size and white on white styling at mail ingestion before content ever reaches the model.
Harden SOC triage assistants. Assume attackers will plant prompts aimed at your classifiers. Add context guards that wrap untrusted email content and instruct the model to treat it strictly as data to analyze, never as instructions to follow.

The broader lesson for developers and security professionals is that every AI feature that reads attacker reachable content inherits this risk. Summarizers, autoreply suggestions, and document assistants all consume text an outsider can shape. Designing those systems with a hard boundary between trusted instructions and untrusted data—and verifying any high stakes claim through an out of band channel—is now part of the baseline, not an optional extra.