Exposing the Invisible Web: An Analysis of Corporate-Sponsored Data Collection on One Million Websites

Tim Libert

Previous scholarship on behavioral tracking on the web has largely followed two axes that reflect disciplinary boundaries. In the social sciences, attention has focused on user attitudes and behaviors, which are often ascertained through survey research and content analysis. In computer science, research has primarily focused on the specific technical mechanisms employed to track users’ visits to a given web page. In each area some attention has been given to political-economic analyses, but as yet no approach has fully illuminated the totality of the relationships established between users and corporations.

To reveal these relationships, I have developed a new open-source software platform named webXray, which is designed to detect and analyze web tracking at large scale. webXray has been used to catalog tens of millions of connections made between websites and so-called “tracking elements”. This raw data has been enhanced by an analysis which links opaquely-named web addresses (such as fbcdn.com and 2mdn.net) to their corporate owners (Facebook and Google respectively). This research views web tracking not as a personal preference or technical novelty, but as the hidden conduit through which individual users are linked to specific corporations. The larger question posed by this research is: once these relationships have been revealed, how do we effect change?
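
To make the domain-to-owner linkage concrete, the following is a minimal Python sketch of the general idea, not webXray's actual implementation. The table `DOMAIN_OWNERS` and the functions `owner_of` and `owners_contacted` are hypothetical names introduced here for illustration, and the table contains only the two domain and owner pairs named above. The sketch assumes that a request's hostname can be matched against known domains by comparing progressively shorter suffixes.

```python
from urllib.parse import urlparse

# Hypothetical ownership table: only the two pairs named in the text.
# A real analysis would need a much larger, manually verified mapping.
DOMAIN_OWNERS = {
    "fbcdn.com": "Facebook",
    "2mdn.net": "Google",
}


def owner_of(url):
    """Return the corporate owner for a request URL, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full hostname first, then drop leading labels one at a time,
    # so that "static.fbcdn.com" still matches the entry for "fbcdn.com".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_OWNERS:
            return DOMAIN_OWNERS[candidate]
    return None


def owners_contacted(request_urls):
    """Return the set of known owners contacted by one page load."""
    owners = set()
    for url in request_urls:
        owner = owner_of(url)
        if owner is not None:
            owners.add(owner)
    return owners
```

Given the list of third-party request URLs observed while loading a single page, `owners_contacted` would return the set of corporations that the page links its visitors to, which is the relationship this research seeks to expose.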