Study: Your anonymous web browsing isn’t as anonymous as you think
A new paper from Princeton and Stanford researchers says their model works up to 70 percent of the time, using only anonymous browsing histories.
Data privacy advocates — and marketers concerned about ensuring user privacy — may have a new worry.
It’s possible to determine a user’s real identity — up to 70 percent of the time — simply from an anonymous browsing history.
That’s the key finding in a recently released paper from researchers at Princeton and Stanford universities. The paper, “De-anonymizing Web Browsing Data with Social Networks,” is scheduled for presentation in April at the World Wide Web Conference in Perth, Australia.
One of the paper’s authors, Arvind Narayanan, an assistant professor of computer science at Princeton, said in a statement that the new research “shows that anyone with access to browsing histories — a great number of companies and organizations — can identify many users by analyzing public information from social media accounts.”
In the paper, the researchers describe a model of web browsing behavior that could deduce a user’s Twitter profile using only 30 links from that user’s anonymous browsing history. The processing takes less than a minute.
In a test of the model, almost 400 people donated their web browsing histories, and the researchers were able to identify over 70 percent of them. The research shows that the technique also works for other social media accounts, including Facebook and Reddit.
Any social media service can be used, as long as its content is public, a substantial number of visits go to links posted in the user’s feed, and the accounts the user follows are known or can be inferred.
The model relies on the fact that people often click on links posted by the users they follow, so a person can be identified by finding the social media feed whose links best match their browsing history. “A link appearing in a user’s feed increases its probability of appearing in their browsing history,” the paper notes. Twitter was chosen in part because most of its activity is public.
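To make that idea concrete, here is a simplified Python sketch of feed matching. It assumes each visit either comes from a link in a candidate’s feed (with probability FEED_SHARE, spread evenly over that feed) or from the wider web, and it picks the candidate whose feed makes the observed history most likely. The mixture model, its parameter values, and every name below are hypothetical illustrations, not the model described in the paper.

```python
import math
from typing import Dict, List, Set

# Hypothetical model parameters, chosen for illustration only.
FEED_SHARE = 0.5      # assumed share of visits that come from links in the user's feed
UNIVERSE_SIZE = 1000  # assumed number of distinct links a visit could land on otherwise

def link_probability(url: str, feed: Set[str]) -> float:
    """Probability of one visit under an assumed two-part mixture model.

    With probability FEED_SHARE the visit goes to a link drawn uniformly from
    the candidate's feed; otherwise it goes to a link drawn uniformly from the
    wider web. Large feeds spread their probability thinly, so following
    everyone does not make a candidate match every history.
    """
    from_feed = FEED_SHARE / len(feed) if url in feed else 0.0
    from_background = (1 - FEED_SHARE) / UNIVERSE_SIZE
    return from_feed + from_background

def log_likelihood(history: List[str], feed: Set[str]) -> float:
    """Sum of log-probabilities of every visit in the anonymous history."""
    return sum(math.log(link_probability(url, feed)) for url in history)

def best_match(history: List[str], feeds: Dict[str, Set[str]]) -> str:
    """Return the candidate whose feed makes the history most likely."""
    return max(feeds, key=lambda user: log_likelihood(history, feeds[user]))

# Hypothetical candidates and history.
feeds = {
    "alice": {"https://a.example/1", "https://b.example/2", "https://c.example/3"},
    "bob": {"https://d.example/4", "https://e.example/5"},
}
history = ["https://a.example/1", "https://c.example/3", "https://x.example/9"]
print(best_match(history, feeds))  # alice
```

Dividing by the feed’s size is the design choice that matters here: without it, a candidate who follows thousands of accounts would look like a match for almost any history.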
The researchers also said that online trackers of users’ browsing behavior commonly capture enough anonymous data to achieve similar results. Even if the full URL is hidden, they suggest, a user can still be identified as long as the domain is visible, although that requires a larger browsing history with more visits.
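To illustrate why hiding the full URL weakens the match without stopping it, here is a hypothetical Python sketch that compares two candidates by a simple overlap count, once with full URLs and once with only the domain kept (using the standard library’s urllib.parse.urlparse). The candidates, URLs, and the overlap count itself are illustrative assumptions, not the researchers’ method.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse

def domain(url: str) -> str:
    """Reduce a full URL to its host, as a tracker that sees only domains would."""
    return urlparse(url).netloc

def overlap_scores(history: List[str], feeds: Dict[str, Set[str]],
                   domains_only: bool) -> Dict[str, int]:
    """Count how many distinct visited items each candidate's feed also contains."""
    view = domain if domains_only else (lambda u: u)
    visited = {view(u) for u in history}
    return {user: len(visited & {view(u) for u in feed}) for user, feed in feeds.items()}

# Hypothetical feeds that share domains but link to different pages.
feeds = {
    "alice": {"https://news.example/story-1", "https://blog.example/post-7"},
    "bob": {"https://news.example/story-2", "https://blog.example/post-8"},
}
history = ["https://news.example/story-1", "https://blog.example/post-7"]

print(overlap_scores(history, feeds, domains_only=False))  # {'alice': 2, 'bob': 0}
print(overlap_scores(history, feeds, domains_only=True))   # {'alice': 2, 'bob': 2}
```

With domains alone, the two candidates tie on this short history; separating them would take more visits to domains that appear in one feed but not the other, which fits the researchers’ point that a larger browsing history is needed.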
In the paper, the researchers note:
“Privacy advocates have argued that such [browsing] data can be de-anonymized, but we lack conclusive evidence [until now]. It has remained unclear what type of identified auxiliary information could be used in a deanonymization attack, whether an attack could work at the scale of millions of users, and what the success rate of such an attack would be.”
One immediate consequence of the study could be that the Federal Communications Commission’s recent privacy rules already need modification. Those rules, adopted in October, say that internet service providers can only use or store customer information when it is not linkable to personal identities. Now, apparently, anonymous browsing history is linkable.