May 17, 2026 · 8 min read

Your Four Most Visited Websites Are a Unique Fingerprint—And Cookie Blocking Doesn't Hide It

A new Scientific Reports paper analyzed 9 million page visits from 2,148 users and showed that the four most visited domains alone uniquely identify 95% of people, and that 80% can be re-identified across different time periods. Cookie blocking, VPNs, and incognito mode do nothing to stop it.

The Headline Number

In October 2025, a paper landed in Scientific Reports with a finding that should change how anyone thinks about online tracking. Five researchers, led by Marcos Oliveira at Vrije Universiteit Amsterdam and including Yang, Griffiths, Bonnay, and Kulshrestha, analyzed one month of browsing data from 2,148 German internet users. Their dataset covered 9 million website visits across nearly 50,000 unique domains. The data was collected with informed consent through a GDPR-compliant online panel.

The headline result: the four most visited domains alone are enough to uniquely identify 95% of users. Not 95% with high confidence. 95% uniquely. Restrict the analysis to the 100 most popular domains in the entire dataset (0.2% of all the domains people visited), and 82% of users are still uniquely identifiable.

And in their words: "despite widespread privacy precautions, such as cookie blocking or virtual private network (VPN) use, these risks persist because they stem from behavior, not technology."

What the Study Actually Did

The 2,148 participants installed a tracker on their browsers that recorded the domain of every page they visited for one month. The researchers did not look at the content of those pages, the cookies they set, or any technical fingerprints (canvas, WebGL, font lists). They only looked at the sequence of domains visited.

Then they built a re-identification model. The model asked: given a partial browsing profile (say, the four domains a user visits most), can we uniquely match it to one specific user in the population of 2,148?

The answer:

Four domains are enough for 95% of users. Their top four sites are unique to them out of 2,148 people.
Average identification takes 2.45 questions. A step-by-step process needs, on average, about two and a half questions about a user's most visited sites before it can name them.
The top 100 domains identify 82% of users. Even reducing the entire 50,000-domain dataset to just the 100 most popular sites preserves identifiability for the vast majority.
80% re-identification across time. When the researchers split the month into contiguous time slices and tried to match users from one slice to the next, they correctly matched 80% of them.
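
To make the first two results concrete, here is a minimal sketch of how "uniqueness" and "questions needed" could be measured on a toy dataset. It is not the authors' model: the function names, the choice to compare top-k domains as unordered sets, the rule of asking about domains in order of visit count, and the three invented users are all assumptions made for illustration.

```python
from collections import Counter

def top_domains(visits, k):
    """Return the k most visited domains, most visited first."""
    counts = Counter(visits)
    # Sort by descending count, then by name, so ties resolve deterministically.
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return tuple(ranked[:k])

def unique_fraction(histories, k):
    """Fraction of users whose set of top-k domains is shared with nobody else."""
    signatures = {u: frozenset(top_domains(v, k)) for u, v in histories.items()}
    frequency = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if frequency[sig] == 1)
    return unique / len(signatures)

def questions_needed(histories, target):
    """Ask for the target's 1st, 2nd, 3rd... most visited domain until
    only the target remains. Returns None if that never happens."""
    rankings = {u: top_domains(v, len(set(v))) for u, v in histories.items()}
    candidates = set(histories)
    for asked, domain in enumerate(rankings[target], start=1):
        # Keep only users whose domain at this rank matches the target's.
        candidates = {
            u for u in candidates
            if len(rankings[u]) >= asked and rankings[u][asked - 1] == domain
        }
        if candidates == {target}:
            return asked
    return None

# Invented toy panel: each list is one user's visited domains.
histories = {
    "alice": ["news.example"] * 5 + ["mail.example"] * 3 + ["shop.example"],
    "bob": ["news.example"] * 4 + ["mail.example"] * 2 + ["video.example"],
    "carol": ["video.example"] * 6 + ["news.example"] * 2,
}

print(unique_fraction(histories, k=2))       # alice and bob share a top two
print(unique_fraction(histories, k=3))       # a third domain separates them
print(questions_needed(histories, "alice"))  # needs the third-ranked domain
```

In this toy panel, two domains are not enough to separate alice from bob, but three are. The study reports the same effect at scale: a handful of domains is enough to single out almost everyone.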

Why This Matters More Than the Average Tracking Story

The browser fingerprinting research that gets the most press attention is usually about technical fingerprints: canvas hashing, WebGL probes, font enumeration, audio context quirks. Those attacks are real, and there is a meaningful arms race around them — Tor Browser, Brave, and Safari all try to flatten the differences in what a browser exposes to JavaScript.

Behavioral fingerprinting sits in a different category. It does not depend on what JavaScript can read from your browser. It depends on what an ad network already knows: which sites you visit, in what order, how often.

Once a tracker observes your browsing for any meaningful period, four data points are sufficient to pick you out from a crowd of two thousand people. The implication: any data broker holding even a tiny sample of someone's traffic can re-identify that person across other datasets, even after the user has cleared cookies, switched VPN providers, or moved to a fresh browser profile.
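
A rough counting argument shows why four data points can go so far. The sketch below counts how many distinct four-domain sets exist, using the panel size and domain counts quoted earlier (with "nearly 50,000" rounded to 50,000 as an assumption). It is only an upper bound on the available signatures: real traffic is heavily skewed toward a few popular sites, so this does not by itself prove uniqueness.

```python
import math

USERS = 2_148          # panel size reported in the study
ALL_DOMAINS = 50_000   # "nearly 50,000" unique domains, rounded for illustration
POPULAR_DOMAINS = 100  # the 100 most popular domains

def possible_sets(n_domains, k=4):
    """Number of distinct unordered sets of k domains out of n_domains."""
    return math.comb(n_domains, k)

sets_from_popular = possible_sets(POPULAR_DOMAINS)
sets_from_all = possible_sets(ALL_DOMAINS)

print(f"{sets_from_popular:,} four-domain sets from the top 100 alone")  # 3,921,225
print(f"{sets_from_all:.2e} four-domain sets from all domains")
print(f"{sets_from_popular // USERS:,} top-100 sets per panel member")
```

Even restricted to the 100 most popular sites, there are far more possible four-domain sets than there are people in the panel, which leaves plenty of room for almost everyone to end up with a different one.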

The privacy story is structural. As Oliveira and coauthors framed it, the risks "stem from behavior, not technology." Behavior does not have a "delete cookies" button.

Why GDPR's "Pseudonymized Data" Carveout Just Got Smaller

This is the part of the paper that regulators and compliance officers should be paying attention to. GDPR and CCPA both allow companies to process "pseudonymized" or "non-identifiable" data with significantly looser obligations than they have for personal data. The widespread industry interpretation is that a hashed user ID plus a list of visited domains is pseudonymized, not personal, and therefore exempt from most consent and data minimization requirements.

The Scientific Reports finding undermines that interpretation directly. If 95% of users in a population can be uniquely identified from their four most visited domains, then any dataset that includes a list of visited domains is, statistically speaking, personal data. Pseudonymization by removing or hashing the user ID does not anonymize a record that contains the user's distinctive four-domain pattern.
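
The point can be illustrated with a small sketch. It replaces each user ID with a salted SHA-256 hash, leaves the domain list untouched, and then matches the "pseudonymized" records against profiles obtained elsewhere. The record layout, the salt, the function names, and all of the data are hypothetical, and exact matching on the top-four set is a simplification rather than the paper's method.

```python
import hashlib
from collections import Counter

def pseudonymize(records, salt):
    """Replace each user ID with a salted SHA-256 digest; keep the domains as-is."""
    return {
        hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest(): list(domains)
        for user_id, domains in records.items()
    }

def top_k(domains, k=4):
    """Unordered set of the k most visited domains."""
    counts = Counter(domains)
    return frozenset(sorted(counts, key=lambda d: (-counts[d], d))[:k])

def reidentify(pseudonymized, known_profiles, k=4):
    """Map a hashed ID to a known person when exactly one profile shares its top-k set."""
    matches = {}
    for hashed_id, domains in pseudonymized.items():
        signature = top_k(domains, k)
        hits = [name for name, profile in known_profiles.items()
                if top_k(profile, k) == signature]
        if len(hits) == 1:
            matches[hashed_id] = hits[0]
    return matches

# Hypothetical "internal" records keyed by a real identifier.
records = {
    "alice@example.com": ["news.example"] * 9 + ["mail.example"] * 7
                         + ["bank.example"] * 4 + ["knit.example"] * 3 + ["maps.example"],
    "bob@example.com": ["news.example"] * 8 + ["mail.example"] * 6
                       + ["video.example"] * 5 + ["chess.example"] * 2 + ["maps.example"],
}
# Hypothetical profiles of the same two people from a different source.
known_profiles = {
    "Alice": ["mail.example"] * 5 + ["news.example"] * 4
             + ["knit.example"] * 3 + ["bank.example"] * 2,
    "Bob": ["video.example"] * 6 + ["news.example"] * 3
           + ["chess.example"] * 3 + ["mail.example"] * 2,
}

released = pseudonymize(records, salt="hypothetical-salt")
matches = reidentify(released, known_profiles)
for hashed_id, name in matches.items():
    print(hashed_id[:12], "->", name)
```

The hash hides the email address, but the matching step never needed it. The domain pattern does all the work.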

This was already the conclusion of the European Data Protection Board's evolving guidance on what counts as identifiable email tracking data, but until now there was little quantitative evidence about how easily browsing histories themselves can re-identify people. Now there is.

For the ad tech industry, the implication is awkward: the entire programmatic real-time bidding (RTB) ecosystem trades on the assumption that user-level browsing histories shared between participants are non-identifiable. Customs and Border Protection's purchases of RTB ad data to track phones already showed that the data is identifiable in practice. The Oliveira paper now shows it is identifiable in principle, at high accuracy, from a tiny number of domain observations.

What This Means for Your Day-to-Day Privacy

If you are not a researcher or compliance officer, the immediate question is: what should I actually do with this finding?

The honest answer is that the conventional advice does less than people think:

Deleting cookies does not help. The fingerprint is which sites you visit, not which sites set tracking IDs.
Using a VPN does not help. A VPN hides your IP but not your behavior. The trackers still see which sites you load.
Switching to incognito mode does not help. Incognito clears local state but does not change the sites you visit.
Even switching browsers does not help much. Your four most visited domains are probably the same in Chrome, Firefox, and Safari.

What actually shrinks your behavioral fingerprint:

Use anti-tracking browsers that block third-party requests at the network layer. Brave, Firefox with Strict mode, and Safari with Intelligent Tracking Prevention all stop many trackers from observing your traffic in the first place. If the tracker never sees you visit Site A and Site B, it cannot fingerprint you on the combination.
Run a content blocker that disrupts cross-site identifiers. uBlock Origin in default mode blocks the majority of third-party tracking pixels and beacons, including the email tracking pixels we cover throughout this site. Less data observed equals less fingerprint material.
Either use the same browser profile across all your devices or separate your profiles entirely by purpose. Mixing your work and personal traffic in the same browser profile makes the combined fingerprint more distinctive, not less.
Push for legislative change. This is the part the paper's authors are quietly arguing. Behavioral fingerprinting is fundamentally a regulatory problem, not a technical one. Until pseudonymized browsing data is reclassified as personal data with full GDPR or CCPA obligations, the industry has no incentive to stop building re-identification systems on top of it.

What This Means for the Tracking Industry

For the companies building behavioral profiles — data brokers, ad networks, "anonymous" analytics vendors — the paper is a roadmap. If you have any sample of a user's traffic, you can identify them in any other dataset that has a sample of the same user's traffic. The two samples do not need to overlap on cookies, IDs, IP addresses, or device fingerprints. They just need to overlap on enough domain visits.

This is not theoretical. Commercial data brokers already merge browsing logs from ad network telemetry, ISP packet captures, public Wi-Fi sessions, and stolen browser histories. The Oliveira methodology is exactly the technique a sufficiently motivated broker would use to stitch those datasets together at user-level granularity. Once linked, the merged profile contains far more identifiable information than either source alone.
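
As a rough illustration of what "stitching" means, the sketch below links records from two logs that share no identifiers by comparing the sets of domains in each. Jaccard similarity, the 0.5 threshold, the function names, and both toy logs are assumptions chosen for simplicity. The paper's own matching method may differ, and a real broker would likely use visit frequencies and timing as well.

```python
def jaccard(a, b):
    """Share of domains two records have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_datasets(dataset_a, dataset_b, threshold=0.5):
    """For each record in A, find the most similar record in B.
    Only matches at or above the threshold are kept."""
    links = {}
    for id_a, domains_a in dataset_a.items():
        best_id, best_score = None, 0.0
        for id_b, domains_b in dataset_b.items():
            score = jaccard(domains_a, domains_b)
            if score > best_score:
                best_id, best_score = id_b, score
        if best_id is not None and best_score >= threshold:
            links[id_a] = (best_id, best_score)
    return links

# Two invented logs with unrelated identifiers (cookie IDs vs. Wi-Fi sessions).
ad_log = {
    "cookie-91f": ["news.example", "mail.example", "shop.example", "forum.example"],
    "cookie-22c": ["video.example", "games.example", "news.example"],
}
wifi_log = {
    "session-A": ["video.example", "games.example", "news.example", "maps.example"],
    "session-B": ["news.example", "mail.example", "shop.example"],
}

links = link_datasets(ad_log, wifi_log)
for source_id, (matched_id, score) in links.items():
    print(f"{source_id} -> {matched_id} (similarity {score:.2f})")
```

Neither log contains a name, a shared cookie, or an IP address, yet each record in one log finds its counterpart in the other. The same idea applies to matching a user's browsing from one time slice to the next.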

The defensive response from the tracking industry has so far been semantic: rebranding the same techniques as "first-party data," "contextual signals," or "interest cohorts." None of those reframings change the underlying math. If the data contains enough of a user's traffic to be useful for advertising, it contains enough to identify them.

The Bigger Picture

Browser fingerprinting research has spent fifteen years cataloguing the ways a browser can give away its user. The Oliveira paper is a useful reminder that the most powerful identifier is not anything the browser leaks. It is the user themselves: the small handful of websites they keep coming back to.

For privacy researchers, the finding extends a line of re-identification research that goes back decades: Sweeney on quasi-identifiers in medical records, de Montjoye on credit card transactions, Cecaj and others on mobility traces. Each of these papers reaches the same conclusion in a different domain: even very sparse behavioral data identifies people with high accuracy.

For everyone else, the takeaway is simpler. The four sites you visit most often are a name tag. Anyone holding a copy of your traffic can read it. The only meaningful defense is preventing them from getting a copy in the first place.