Big data privacy challenge requires due process response, says paper

Big data doesn't just use data: it creates data, often of a highly personal nature, and that poses a challenge to today's privacy protections, argues a paper by a Microsoft researcher and a law academic.

Analytics run over large data sets composed of many individually meaningless bits, none of which is itself personally identifiable information, can manufacture highly revealing information, note the paper's authors, Kate Crawford of Microsoft Research and Jason Schultz of the New York University School of Law.

Famously, the New York Times reported in 2012 that retailer Target can determine from purchasing patterns which of its customers are pregnant, in one case leading the store to send coupons for maternity clothing and nursery furniture to a Minneapolis teenager who hadn't yet told her family she was pregnant.

"Target had never collected data showing that any particular female customer was pregnant, a fact that most people would almost assuredly consider to be very personal and intimate information. Instead, it predicted it," paper authors write.

Current privacy regimes such as the Fair Information Practice Principles, which rest on giving consumers notice and control over personally identifiable information, can't account for big data processing, especially since analysis often draws unpredicted results from large data sets.

"When a pregnant teenager is shopping for vitamins, could she predict that any particular visit or purchase would trigger a retailer's algorithms to flag her as a pregnant customer? At what point would it have been appropriate to give notice and request her consent?"

What's needed, the authors argue, is procedural data due process that would regulate the fairness of the analytical process itself, rather than the collection, use or disclosure of data before decisions are made.

Procedural due process is an appropriate remedy because big data predictions have the potential to stigmatize individuals, a recognized threshold for its invocation, they say. Some uses of big data would be difficult to fold into individualized due process proceedings, since individuals who are never notified of job openings, favorable insurance offers or the like may not be aware of their analytics-driven exclusion. But when individuals are aware that big data analytics informs an outcome, "individualized due process approaches will seem most appropriate."

In the case of "more opaque predictive problems, such as a real estate offer one never sees because big data might have judged one unworthy," authors say a structural due process driven by public agency oversight and auditing might be better.

Individuals should be able to invoke a due process hearing once analytics generates personally identifiable information about them, the authors say. A variety of notice obligations could be employed, at minimum giving individuals a mechanism to access an audit trail of the data used in the predictive process.

"For example, if a company were to license search query data from Google and Bing in order to predict which job applicants would be best suited for a particular position, it would have to disclose to all applicants who apply that it uses search queries for predictive analytics related to their candidacy." In cases where security or proprietary concerns arise, or in structural situations, adjudication could be the role of a neutral data arbitrator, authors add.

A great myth about big data, the authors say, is that its conclusions are free from bias; due process would also serve as a framework for ensuring greater fairness in predictive analytics.

For more:
- go to the paper's download page on SSRN

Related Articles:
Big Data has implications for national security, says DARPA
Massive data analysis fraught with challenges, says National Research Council
EU agency warns of voluntary surveillance society