"Privacy issues are an ongoing roar and generally do not receive the nuanced coverage that might inform consumers and the general public to make smarter decisions about which services they use and how."
—
In pursuit of the latest intel on the privacy issues surrounding the Big Data revolution, I contacted Alexander Howard, the Government 2.0 Correspondent for publisher O’Reilly Media. Many thanks to Kaitlin Thaney of Digital Science for making the introduction.
O’Reilly recently published its guide to Privacy and Big Data (September 2011), and Alexander moderated the panel “If Data Wants to be Free, Is Big Privacy A Prison?” at Strata 2012, held in California in March 2012.
To what extent has the Google privacy policy change increased the general public’s interest in web privacy issues? How far does that awareness/interest extend (i.e., to other networks/services)?
The media coverage of the change, along with Google’s own notifications, has likely led to a marginal increase in awareness. I’d need to see research or polls to know whether that’s true with any certainty. Last year’s media firestorm over a tracking file in the iPhone and a progression of Facebook stories add to that. Honestly, however, there have been major data breaches from corporations and government agencies for years. Privacy issues are an ongoing roar and, in the context of so much other hype and media attention, generally do not receive the nuanced coverage that might inform consumers and the general public to make smarter decisions about which services they use and how.
What is Big Data and where does it become problematic with regards to privacy?
You can’t do better than [Strata Conference programme chair] Edd Dumbill’s primer.
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
How much does data really say about individuals? To what extent are corporations’ expectations of Big Data being met, and where do they fall short?
It depends upon the data. Data that a doctor collects with biomarkers says quite a bit about your health. Data collected by geneticists may tell us even more about what to expect, and certainly predict what we’re at higher risk for later in life. Data collected from mobile devices over time can tell us where we’ve been, who we’ve communicated with and how often. Data from Web browsers, naturally, can tell us everything we’ve done in them, unless we make special efforts to remove it. Data from retail sites can tell us what we buy, and what we might like to purchase. Data from financial companies, credit agencies and information brokers can reveal someone’s entire fiscal history. Academic data can show how much a student is reading, how much a graduate student is producing, or what influence a professor’s papers have in a given research community. The examples are nearly endless. Data can say quite a lot, though one has to be very careful to verify quality and balance it with human expertise and intuition.
Most corporations are still figuring out how to make data work for them; that need is in part why our Strata Conference has proven to be so popular.
How different are the privacy issues raised by Big Data from what’s come before in terms of government and corporate surveillance? In terms of interpersonal surveillance?
Machine learning, big data and massive processing power can find patterns in ways previous data visualization or processing platforms could not. Look at what Palantir is doing for the U.S. intelligence community for one example, or how they’re helping to detect Medicare fraud for another. Given enough data, intelligence and power, corporations and government can connect dots in ways that previously existed only in science fiction. Many of them, however, are still struggling to do so.
What is more likely to effect a change in organisational data collection of personal information: public interest, government regulation or something else?
Both of those factors will have an effect. Data retention laws could play a major role, as could regulation that comes out of them. It’s not clear how well those entrusted with the public interest understand the issues and implications here, in terms of the risks and substantial rewards that exist with big data. Any action by government or other entities should weigh the potential benefits of data for the public good against potential harms.