Google Takeout: viewing what Google knows about you
Off the Beat: Bruce Byfield's Blog
Ever wonder what information Google has collected about you? Now, you can find out, thanks to Google Takeout, which allows you to download most of the information that Google has collected about you.
The question should be of more than passing interest to just about everyone. Few people may have bought Google's Chromebook with its web-based applications, but Google still dominates our computer lives. We use it to receive emails. We store pictures and documents on it. We socialize on it -- and, all the time, Google is collecting information about us.
Google Takeout is a creation of the Data Liberation Front, which describes itself as
"an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to 'liberate' their products."
You can run Google Takeout to see what information Google has stored about you, then go to the Data Liberation Front site for instructions about how to remove data from specific Google products. Not all Google products have been "liberated," although most of the major ones have.
What is unclear, however, is how official Google Takeout or the Data Liberation Front is. Google Takeout appears to be hosted on Google servers, but the Data Liberation Front site gives no information about where it is hosted, doesn't identify anyone associated with the project, and gives no contact information other than a Twitter account.
Consequently, whether either is officially supported by Google, semi-official, or clandestine is uncertain. I'm guessing they may be projects Google employees have undertaken in their twenty percent time -- the time Google gives employees to work on private projects -- but I don't know.
What I can say is that running Google Takeout is an educational and somewhat alarming experience.
Revelations in the archive
Running Google Takeout is as easy as logging into your Google account, and selecting which services to include in the archive of your data. If you want, you can first review a summary of the information collected by each service by clicking the download button. Probably, the largest omissions are GMail, which I suspect is the most heavily used Google service, and search engine records.
Obviously, the time needed to create the archive depends on how heavily you have used Google, but the result is a zip archive named for your account saved to your hard drive, neatly divided with a separate folder for each service.
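If you would rather get an overview of the archive before unpacking it, a few lines of Python can tally what each folder holds. This is only a minimal sketch: the function name summarize_archive and its depth parameter are my own inventions for illustration, and it assumes the layout described above, with one folder per service at the top of the zip file. If your archive wraps everything in an extra folder named for your account, raising depth to 2 should group by service instead.

```python
import sys
import zipfile
from collections import defaultdict


def summarize_archive(zip_path, depth=1):
    """Return {folder: (file_count, total_bytes)} for a zip archive.

    Files are grouped by the first `depth` components of their path.
    Assumption: each service has its own folder at that depth.
    """
    summary = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count only files
            parts = info.filename.split("/")
            # Files that sit above the requested depth get their own bucket
            if len(parts) > depth:
                key = "/".join(parts[:depth])
            else:
                key = "(top level)"
            summary[key][0] += 1
            summary[key][1] += info.file_size  # uncompressed size in bytes
    return {key: tuple(value) for key, value in summary.items()}


if __name__ == "__main__":
    # Usage: python summarize.py path/to/archive.zip
    for folder, (count, size) in sorted(summarize_archive(sys.argv[1]).items()):
        print(f"{folder}: {count} files, {size} bytes")
```

The script only reads the zip file's index, so nothing is extracted to disk and the archive itself is left untouched.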
In my case, the archive was just over 4.4 megabytes -- but, then, compared to other people, I am undoubtedly a light user of Google services, especially when GMail and searches are omitted. The service I most heavily use is Google+, and even that I've lost interest in because of its refusal to accept pseudonyms -- and even, in some cases, non-European names.
Still, even as a light user, I was taken aback by how much information Google was storing about me. I shouldn't have been, I know -- I willingly provided all that information, and I could see no sign that Google was storing anything I hadn't authorized.
All the same, seeing all the accumulated information was a shock. It is one thing to know that Google never throws out old information, and another one to realize that documents abandoned six years ago in Google Docs are still around. In my case, they are mostly test documents and likely to be of minimal interest to anyone, but if I had other work habits, their continued existence would raise concerns about privacy and security.
Similarly, while I was obviously aware that pictures posted on Google+ had to be stored somewhere, I was puzzled to see I had graphics stored in Picasa. Since I have never used Picasa separately, I took a few seconds to realize how they had got there. This experience convinced me that, in recently announcing the centralization of its services, Google was only making official what was already happening, but it also raises security questions. After all, making sure that your data is safe becomes harder when you are unaware of exactly which service it is stored in.
Then there's the information I was aware of. In the Streams folder, the archive included every posting I had ever made on Google+, as well as all my contacts and circles (groupings of people I follow, if you don't happen to use Google+). All those individual decisions to post, I quickly discovered as I read them together, add up to a thorough picture of my online persona, especially since many of the posts are links to articles I've published.
Even more seriously, the people I choose to follow and the circles in which I've arranged them easily reveal information that goes far beyond what I ever intended to give. From my circles, for example, anyone reading the information could tell something about my family and professional associates, and therefore about interests and connections I might prefer to keep private.
These discrete pieces of information could easily be combined with other archived information such as my pictures to tell more about me than I ever intended. For instance, from my pictures, one might deduce what I enjoy buying, and, from the circles, from whom I buy.
True, Google shows no signs of selling such information to advertisers or retailers. But what if Google's security is breached? Like most people, I have no informed opinion about the quality of Google's security. Yet, one piece at a time, I have entrusted more and more deeply personal information to Google. The fact that Google probably has less information about me than about most people is no comfort, because I find that, just by using Google's services, I have let my information be made available in ways to which I never consented.
I don't mean to be paranoid. Nor am I suggesting that Google is untrustworthy or deceptive. Its services are convenient, and perhaps some small losses of privacy are a reasonable exchange for that convenience.
Yet I am disturbed by how little Google emphasizes this potential lack of privacy, and how willingly I went along with it, less than half aware of what I was doing. If someone like me, with a reasonable lay knowledge of security and privacy issues, can fall into such complacent behavior, there must be millions of users who are even more naive than I was, and who are entrusting far more potentially damaging information to Google.
Even more importantly, what about the mail and search services not included in Google Takeout? If our uses of other services contain so much information, how much more do these popular services contain?
I can't answer that question. But I do know that over the next few days I will be using the Data Liberation Front's tools to remove unnecessary information from as many Google services as I can. At future intervals, I will repeat the process. In addition, I'll consider what Google services I might do without.
I have also decided that, instead of turning to Google twenty or thirty times a day for search results, I will transition completely to DuckDuckGo, a small search engine that claims not to store records of your search. I consider these decisions not paranoid, but simply small steps towards being more responsible about my online habits.