The Quest for Data

I'm a data guy. I spend many, many hours at my day job staring at tables, crunching numbers, and turning piles of data into something comprehensible. Yet a day doesn't go by when I don't at some point mutter, "Man, I sure wish we had better data on this question."

(from ckhamken on Flickr)

We (society) are currently at a crossroads when it comes to how we feel about data. On the one hand, there's a genuine fear that corporations and governments are harvesting every single data point about us that they possibly can and that they're doing something malicious with it. Facebook is selling our data to advertisers! Target is using our data to send diaper coupons to pregnant ladies! The government knows every step we take every day of the week!

If data this comprehensive exists, I have yet to see it. Instead, I'm often faced with the opposite dilemma: a situation that's not especially well studied and that data could help us understand better. It stays unstudied for one of three reasons: no such data exists, the data is being held by someone powerful behind bulletproof glass, or the data is simply in a format that can't be analyzed (try sorting thousands of physical paper records and you'll know what I mean).

Last week I downloaded Untappd on my iPhone. Its premise is simple: keep track of the beers you drink, when you drank them, where, how you liked them, etc. On the surface, the utility of this app seems limited, and it appears to appeal mainly to people in the "share happy" world we now live in. Unless, of course, you're using it to harvest data about yourself that could later yield some very interesting results.

When I was first trying to figure out the excitement about the Foursquare app, someone left a comment on this blog explaining that he uses the app essentially as a ledger to document his life. It's a running tab of every place he goes during every day. I hadn't initially thought about it in that way. To me, it was some kind of goofy game where you score points and compete against other people, not a tool for collecting data on your movement from place to place.

To me, both Untappd and Foursquare show that people actually want to harvest data on themselves. Why? Because it can lead to interesting insights. Which beer did I drink most often last year? Which places did I visit most often? Did my enjoyment of those beers differ by day of the week? Did I see any evidence of my tastes changing with time? If I had to recall from memory, I might have some good guesses, but I might significantly over- or undercount certain things due to cognitive bias. Keeping records is the only way to know for sure.

I've come across data warehouses online that make you check a box when you sign up for an account. The box says, "I agree to use this website always for GOOD, never for EVIL." I think it perfectly sums up the struggle over data. Data can be a powerful tool in the quest for knowledge and understanding. But if it falls into the wrong hands, it can be used for malicious purposes. It's the fear of evil that often keeps valuable data from being used by the forces of good.