Your data knows you better than you know yourself

The stories that data tells us are mundane. Big data does not show us the lone genius who built a company to revolutionize communication, nor the crazy inventor who made a breakthrough in renewable energy, nor the dedicated violinist whose musical gift is a delight to millions. In fact, data is almost entirely antithetic to Malcolm Gladwell’s outliers. Data shows us medians, trends and averages: a graph of the everyman.

OkCupid cofounder Christian Rudder’s Dataclysm is a cool behind-the-scenes look at what he learned from the massive amounts of user-generated data on his relationship-as-a-service platform. I was familiar with most of the notions presented in the book (the Internet’s explosive reaction to Obama’s election; that, by men’s ratings on the site, a woman’s appeal peaks at around 22, etc.), save for some of the studies on language used by men as compared to women, but the intrinsically interesting subject matter helped the foundational data science concepts to sink in. Rudder also adds to the conversation by translating the data into beautiful visualizations reminiscent of, and credited as inspired by, Edward Tufte.

The question I’m left with after flying through the book (Rudder is an excellent writer) is the one I’m always left asking: can data analysis ever be fair and inclusive? I appreciated Rudder’s sensitivity to the inherent bias that comes with choosing which datasets to study, and with the interpretation that any analysis involves. Yet he doesn’t offer a solution, and his methods don’t hint at one. Since analysis is just starting on the quintillions of bytes of data that have been generated to date, maybe this is something we’ll see when the practice is more mature. Already I’m more hopeful with examples like the Google-backed Constitute project, which compiles the world’s constitutions to allow newly sovereign states to learn from history what worked and what didn’t. The mission of such a service is inclusion-based, even if the practice itself may not be.

I’m also left wondering about the future of privacy. Although I’ve grown up mostly accustomed to sharing information about myself online, and haven’t felt much need to protect my digital privacy, the more I learn about the effectiveness with which corporations are able to use my personal data to their advantage, the more I wonder if I should care more about the choice to disclose:

“The fundamental question in any discussion of privacy is the trade-off—what you get for losing it. We make calculated trades all the time. Public figures sell their personal lives to advance their careers. Anyone who’s booked a hostel in Europe or bought a train ticket in India has had to decide if the private room is worth the extra money. And not to confuse the issue here, but many people, men and women, trade on privacy when they walk out the door in the evening, giving it away, via a hemline or a snug fit, for attention.”

Rudder’s most striking point is that because data collection is so pervasive and so accurate in plotting individuals onto a graph, whether or not you choose to disclose may not matter going forward: the companies and governments collecting it will already know. Even now, neither the terms for the exchange of personal data nor the inferences drawn from that data are transparent. The data you’re generating could be telling companies and governments things about you that you don’t even know about yourself. After all, as people, we don’t exactly excel at self-awareness. This makes me think of the Precrime unit in Minority Report. How honest are we ready to be with ourselves?

“If employers begin to use algorithms to infer how intelligent you are or whether you use drugs, then your only choice will be to game the system—or… to “manage your brand”. To beat the machine, you must act like a machine. And that’s all assuming you can guess at what you’re supposed to do in the first place. Apparently, one of the strongest correlates to intelligence in the research was liking “curly fries.” Who could reverse-engineer that?”

On the lighter side of things, a data-driven pointer for those of us quickly losing broad appeal (for women, that means anyone past 22): emphasize your distinctiveness to continue enjoying targeted appeal. Unexpectedly, data confirms the old adage to just be yourself. You’re welcome.