There seems to be some misunderstanding about the Google Search/Web History disable switch that google provides to its more privacy-conscious users. It’s not exactly the most advertised feature to begin with (you won’t find it on your profile page). Once you do find it (it’s on the history page: click the little ‘gear’ at the top right, then click the button to switch it off), there is no guarantee whatsoever that google does anything except change what it displays to you. So if you are under the impression that this changes what data google collects on you, or how it will use that data, you are likely wrong.
Here is what google itself says about its server logs:
Like most websites, our servers automatically record the page requests made when you visit our sites. These “server logs” typically include your web request, Internet Protocol address, browser type, browser language, the date and time of your request and one or more cookies that may uniquely identify your browser.
Here is an example of a typical log entry where the search is for “cars”, followed by a breakdown of its parts:
123.45.67.89 - 25/Mar/2003 10:15:32 - http://www.google.com/search?q=cars - Firefox 1.0.7; Windows NT 5.1 - 740674ce2123e969
123.45.67.89 is the Internet Protocol address assigned to the user by the user’s ISP; depending on the user’s service, a different address may be assigned to the user by their service provider each time they connect to the Internet;
25/Mar/2003 10:15:32 is the date and time of the query;
http://www.google.com/search?q=cars is the requested URL, including the search query;
Firefox 1.0.7; Windows NT 5.1 is the browser and operating system being used; and
740674ce2123e969 is the unique cookie ID assigned to this particular computer the first time it visited Google. (Cookies can be deleted by users. If the user has deleted the cookie from the computer since the last time s/he visited Google, then it will be the unique cookie ID assigned to the user the next time s/he visits Google from that particular computer).
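To see how little that entry hides, here is a minimal Python sketch that splits it back into its five fields. It assumes the fields are always separated by " - " exactly as in the example; the `LogEntry` and `parse_entry` names are hypothetical, and this is not how google actually stores or processes its logs.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import parse_qs, urlparse


@dataclass
class LogEntry:
    ip: str
    timestamp: datetime
    url: str
    user_agent: str
    cookie_id: str

    @property
    def query(self) -> str:
        # The search terms live in the "q" parameter of the requested URL.
        return parse_qs(urlparse(self.url).query).get("q", [""])[0]


def parse_entry(line: str) -> LogEntry:
    # Assumes the five fields are separated by " - ", as in the example entry.
    ip, ts, url, user_agent, cookie_id = (part.strip() for part in line.split(" - "))
    return LogEntry(
        ip=ip,
        timestamp=datetime.strptime(ts, "%d/%b/%Y %H:%M:%S"),
        url=url,
        user_agent=user_agent,
        cookie_id=cookie_id,
    )


entry = parse_entry(
    "123.45.67.89 - 25/Mar/2003 10:15:32 - http://www.google.com/search?q=cars"
    " - Firefox 1.0.7; Windows NT 5.1 - 740674ce2123e969"
)
print(entry.query, entry.cookie_id)  # cars 740674ce2123e969
```

The point is the last fields: the search terms and the cookie ID sit side by side in the same line, so tying a query to a browser takes nothing more than a string split.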
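So the only thing that fancy switch does is limit what you see. It definitely does not limit google’s ability (or desire) to collect data about you: nothing in the description above depends on the history setting, so every search still produces a log entry with your IP address and a cookie ID, and you can rest assured that that is exactly what they’ll keep collecting.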
So all you’re being given here is a false sense of privacy.
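A toy model of the argument, in Python: assume (this is my reading, not anything google documents) that the history switch only filters what gets shown back to you, while the logged queries stay grouped under your cookie ID. The log records and function names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log records as (cookie_id, query) pairs; not real data.
LOG = [
    ("740674ce2123e969", "cars"),
    ("740674ce2123e969", "used cars near me"),
    ("5f3a9c0e11d2b7a4", "weather"),
]


def build_profiles(log):
    """Group every logged query under the cookie ID that sent it."""
    profiles = defaultdict(list)
    for cookie_id, query in log:
        profiles[cookie_id].append(query)
    return dict(profiles)


def visible_history(profiles, cookie_id, history_enabled):
    """Toy model of the switch: it only decides what is shown back to the user."""
    if not history_enabled:
        return []
    return list(profiles.get(cookie_id, []))


profiles = build_profiles(LOG)
print(visible_history(profiles, "740674ce2123e969", history_enabled=False))  # []
print(profiles["740674ce2123e969"])  # ['cars', 'used cars near me']
```

Turning the switch off empties the view, but the profile built from the logs is untouched.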
And of course, this is just one example, taken from google’s ‘search’ facility. You should be aware that google has so many points of contact that keeping your browsing habits hidden from it has become all but impossible. Here are a few examples of how your web activity ends up associated with your profile on google’s servers:
analytics: This page caused you to be served a google analytics tag, so google knows you visited here. (Come to think of it, google analytics is not worth more to me than your privacy; I should find an alternative and disable GA on my web properties.)
fonts: That fancy font that is being used on that pretty page? That’s quite possibly served from a google server.
google+: On many web pages (again, including this one) you’ll find an embedded google+ button loaded from a google server.
youtube embeds: whenever a page contains an embedded youtube video, you’ve just told google that you visited that page.
adsense ads: every page that serves up an adsense ad tells google about your visit to that page. And if you click on an ad, it tells google about your personal preferences.
google app engine: a service where google hosts third-party web applications. In other words, sites that look like they have nothing to do with google at all run entirely on google infrastructure, which means every request to and every response from those services passes through google.
google dns: google’s public DNS servers answer a large number of DNS requests, and each one tells google that a certain IP address wants to know how to reach some service elsewhere on the internet.
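Most of these contact points can be spotted just by looking at which hosts a page loads resources from. Here is a rough Python sketch that maps a resource URL to one of the services above. The host list is my assumption of the usual hostnames and is certainly not exhaustive; `classify` is a hypothetical helper, not part of any library.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed, non-exhaustive mapping from Google-operated host suffixes to the
# services listed above. Real pages may use other hosts as well.
GOOGLE_HOSTS = {
    "google-analytics.com": "analytics",
    "fonts.googleapis.com": "fonts",
    "fonts.gstatic.com": "fonts",
    "apis.google.com": "google+ / platform.js",
    "youtube.com": "youtube embeds",
    "youtube-nocookie.com": "youtube embeds",
    "googlesyndication.com": "adsense",
    "doubleclick.net": "doubleclick",
    "googletagmanager.com": "tag manager",
    "googletagservices.com": "tag services",
    "appspot.com": "app engine",
}


def classify(url: str) -> Optional[str]:
    """Return the Google service a resource URL belongs to, or None."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    for suffix, service in GOOGLE_HOSTS.items():
        # Match the host itself or any subdomain of it, but not look-alikes
        # such as "notyoutube.com".
        if host == suffix or host.endswith("." + suffix):
            return service
    return None
```

For example, `classify("https://fonts.googleapis.com/css?family=Open+Sans")` returns `"fonts"`, while a look-alike such as `https://notyoutube.com/` returns `None`. Google DNS won’t show up this way at all, since the lookups happen before any page content is loaded.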
Over the last two months I visited 3490 web pages across 615 domains, according to my browser’s history. Of those domains, 43 were google properties. Of the 572 remaining, 56 contained google+ buttons, 6 contained google’s platform.js, 94 contained a font served by google, 324 served up google analytics tags, 109 embedded youtube video, 13 contained adsense tags, 75 contained doubleclick tags, 22 used googletagmanager.com, 43 used googletagservices.com, 47 contained content from googlesyndication.com and 42 contained content hosted on googlecode.com. I’ve probably missed a couple (such as google translate, which of course knows what you translate). All in all, 425 of those 572 domains served up some google content, about 74%, not counting the 43 google properties that I excluded; if we put those back in, it is 76% (468 of 615). Come to think of it, the URLs that google ‘samples’ (you know, when a search result links to the page not directly but through a google redirector) are of course also known to google, but I have no idea how many of those 3490 pages I reached through google search.
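The counting above boils down to a per-domain yes/no check plus two divisions. Here is a minimal Python sketch, assuming you already have, for each visited domain, the list of resource URLs its pages loaded; the `GOOGLE_SUFFIXES` list and the `survey` helper are hypothetical stand-ins, not the exact method behind the numbers above.

```python
from urllib.parse import urlparse

# Assumed host suffixes treated as "google content"; a rough stand-in for
# whatever list was actually used for the numbers above.
GOOGLE_SUFFIXES = (
    "google.com", "googleapis.com", "gstatic.com", "google-analytics.com",
    "youtube.com", "googlesyndication.com", "doubleclick.net",
    "googletagmanager.com", "googletagservices.com", "googlecode.com",
)


def is_google(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return any(host == s or host.endswith("." + s) for s in GOOGLE_SUFFIXES)


def survey(resources_by_domain: dict, google_properties: set) -> tuple:
    """Count non-Google domains that loaded at least one Google-hosted resource."""
    considered = {
        domain: urls
        for domain, urls in resources_by_domain.items()
        if domain not in google_properties
    }
    hits = sum(1 for urls in considered.values() if any(map(is_google, urls)))
    return hits, len(considered)


# Plugging in the totals reported above:
print(f"{425 / 572:.1%}")         # 74.3%
print(f"{(425 + 43) / 615:.1%}")  # 76.1%
```

Excluding google’s own properties first keeps the 74% figure honest; adding them back simply puts 43 guaranteed hits into both the numerator and the denominator.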
And that’s just the web; we’re not even looking at mobile, where Android phones are pretty much an extension of google’s infrastructure right in your pocket, with hardware capable of telling google about every spot you visit in real life. So consider google to be riding right beside you and reading over your shoulder all the time, no matter what you’ve done with that history disable button. You’d almost think that all those free and invisible services google provides have one goal: to get you to load something from google on every page that you visit.
As an aside, if I were in law enforcement I would be paying special attention to those searches done by people who have the ‘history’ feature disabled.
edit: thanks to Thomas Bachem for reminding me of DoubleClick and the google DNS servers