Operationalizing Privacy in a Big Data Context

Ann Racuya-Robbins

Operationalizing privacy is largely about understanding the nature of the data you intend to analyze. That understanding draws on intuition, ethics, and some ICT technical knowledge, and an average adult has the ability to understand and make decisions on these matters.

Once the nature of the data is understood intuitively, ethically, and in a general technical way, privacy requirements for Big Data can be delineated in alignment with national standards, laws, and regulations, followed by the further specification of technical details that satisfy those requirements in deployment. Importantly, because this technical knowledge is grounded in intuitive and ethical parameters, the relationships between the privacy parameters and the technical details can remain clear to the average person, and they should.

Privacy Risk Assessment, Management, Prevention and Mitigation

Goal: Privacy Preserving Information Systems using Big Data, Big Data Analytics

Scoping the Privacy Context – A Question and Answer Tree.

Applies both pre-Big Data processing/analytics and post-Big Data processing/analytics.

QUESTION: The most important question is: Does your prospective data set(s) contain personal data*?


ANSWER: Yes.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Metadata Personal Data Tag

FOLLOWUP ANSWER: Provenance Report from Data Vendor or Agency

FOLLOWUP QUESTION 2: Can you verify? Reproduce?




ANSWER: No.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Data Vendor Reports no Personal Data.

FOLLOWUP QUESTION 2: Can you verify?

FOLLOWUP ANSWER: No. Data Vendor maintains proprietary status of data.

ANSWER: I don’t know.

FOLLOWUP QUESTION 1: How can you find out?


QUESTION: Does your data set contain “raw” data?

QUESTION: How large is your Data Set(s) Cluster?

< 100 GB

< 1.5 TB

< 100 TB


QUESTION: Will more than one data set be linked and analyzed?

QUESTION: What is the anticipated rate of arrival of the data? At what velocity will the Data Set(s) Cluster be processed/analyzed?

QUESTION: Is the data irregular and of multiple data types?

QUESTION: Will the processing/analytics be used for real-time decision-making?

More QUESTIONS to be determined.
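The scoping questions above form a decision tree, and one way to make that tree operational is to encode it as a data structure that tooling can walk. The sketch below is a minimal, hypothetical encoding in Python; the node layout and helper names are illustrative assumptions, not part of any standard.

```python
# Minimal sketch of the scoping question tree as nested dicts.
# The node layout and the subset of questions shown are illustrative.

SCOPING_TREE = {
    "question": "Does your prospective data set(s) contain personal data?",
    "answers": {
        "yes": {
            "question": "How do you know?",
            "answers": {
                "metadata personal data tag": None,
                "provenance report from data vendor or agency": None,
            },
        },
        "no": {
            "question": "Can you verify?",
            "answers": {
                "no - vendor maintains proprietary status of data": None,
            },
        },
        "i don't know": {
            "question": "How can you find out?",
            "answers": {},
        },
    },
}

def walk(tree, answers):
    """Follow a sequence of answers down the tree; return the last
    question reached (stopping at leaves or unrecognized answers)."""
    node = tree
    for a in answers:
        nxt = node["answers"].get(a)
        if nxt is None:
            break
        node = nxt
    return node["question"]
```

For example, `walk(SCOPING_TREE, ["yes"])` lands on the follow-up question "How do you know?". Extending the tree with the size, velocity, and linkage questions above would follow the same pattern.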



Definitions TBD

Personal Data/Information

Data Actions


Personal Data Metadata Tags

General Personal Data = PD-G

Very Sensitive Personal Data = PD-VS
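The two tag values above could be attached to data sets as metadata so that downstream processing can detect personal data automatically. The following is a hedged sketch assuming a simple dict-based metadata record; the helper name and record fields are hypothetical.

```python
# Hypothetical helper for attaching the personal-data metadata tags
# (PD-G, PD-VS) described above to a data set's metadata record.

PERSONAL_DATA_TAGS = {
    "general": "PD-G",         # General Personal Data
    "very_sensitive": "PD-VS", # Very Sensitive Personal Data
}

def tag_dataset(metadata, sensitivity):
    """Return a copy of the metadata dict with the personal-data tag set."""
    if sensitivity not in PERSONAL_DATA_TAGS:
        raise ValueError(f"unknown sensitivity level: {sensitivity}")
    tagged = dict(metadata)
    tagged["personal_data_tag"] = PERSONAL_DATA_TAGS[sensitivity]
    return tagged
```

A record tagged this way (e.g. `tag_dataset({"name": "claims_2024"}, "very_sensitive")`) carries `"personal_data_tag": "PD-VS"`, which the scoping tree's "Metadata Personal Data Tag" answer could then rely on.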


Privacy Rights Risks, Harms and Mitigations (Controls)

Rights TBD



Appropriation: Personal information is used in ways that deny a person self-determination or fair value exchange.

Breach of Trust: Breach of an implicit or explicit trusted relationship, including a breach of a confidential relationship.

Distortion: The use or dissemination of inaccurate or misleadingly incomplete personal information

Exclusion: Denial of knowledge about or access to personal data. Includes denial of service.

Induced Disclosure: Pressure to divulge information.

Insecurity: Exposure to future harm, including tangible harms such as identity theft or stalking.

Loss of Liberty: Improper exposure to arrest or detainment.

Power Imbalance: Acquisition of personal information about a person that creates an inappropriate power imbalance, or takes unfair advantage of or abuses a power imbalance between the acquirer and the person.

Stigmatization: Personal information is linked to an actual identity in such a way as to create a stigma.

Surveillance: Collection or use of personal information, including tracking or monitoring, that can restrict free speech and/or other permissible activities.

Unanticipated Revelation: Non-contextual use of data reveals or exposes a person, or facets of a person, in unexpected ways.
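For use in automated risk assessment or a risk register, the catalog of harms above could be represented as an enumeration. The sketch below assumes Python's standard `enum` module; the short identifiers are illustrative, not standardized codes.

```python
from enum import Enum

class PrivacyHarm(Enum):
    # Descriptions summarize the catalog above; identifiers are illustrative.
    APPROPRIATION = "Use denying self-determination or fair value exchange"
    BREACH_OF_TRUST = "Breach of an implicit or explicit trusted relationship"
    DISTORTION = "Inaccurate or misleadingly incomplete personal information"
    EXCLUSION = "Denial of knowledge about or access to personal data"
    INDUCED_DISCLOSURE = "Pressure to divulge information"
    INSECURITY = "Exposure to future harm such as identity theft or stalking"
    LOSS_OF_LIBERTY = "Improper exposure to arrest or detainment"
    POWER_IMBALANCE = "Inappropriate power imbalance over the person"
    STIGMATIZATION = "Information linked to identity so as to create a stigma"
    SURVEILLANCE = "Tracking or monitoring restricting permissible activities"
    UNANTICIPATED_REVELATION = "Non-contextual exposure in unexpected ways"

def harms_for_risk_register():
    """Return (code, description) pairs for seeding a risk register."""
    return [(h.name, h.value) for h in PrivacyHarm]
```

Each identified risk in an assessment could then be tagged with one or more `PrivacyHarm` members, and mitigations (controls) mapped against them.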

To Be Defined:

Data Inference





Data Subjects Intellectual Property


Preventions, Mitigations, Controls


Big Data Guidelines Repository at WIPO or


Human Attributes Arise from Human Capabilities

Image by Ann Racuya-Robbins Copyright 2012
Social Cooperation and Privacy


“Hi Ann – could you please help me understand better by giving a few specific examples of human capabilities, and the human attributes that arise from those capabilities? The description you’ve provided is a bit too abstract for me to get my head around it.”

Thank you for your question, Andrew.

Human capabilities are sometimes described as functions. More generally, human capabilities refer to the things a person can do, the ways a person can act.

For example, speaking (speech) is a human capability. When, by what means, for how long, at what pitch and volume a person speaks, where a person spoke from, whether a person used sign language… these are human attributes that arise from the human capability to speak. Because human capabilities are dynamic and expanding, the human attributes that arise from them are dynamic and expanding as well. In cyberspace and online environments, human capabilities and the human attributes they create constitute a dynamic and expanding kind of information.
To protect this human capability, for example, American democracy created a right to free speech (with some provisos) that covers more or less all the human attributes that arise from speaking. For this reason we do not have a right to speak that is limited to a device; for example, we do not have a human right to speak limited to speaking on a telephone. Such a limit would constrain and discourage the dynamic and expanding human function of speech. If a third party takes the human attributes created by a human capability and uses them to make money, we would consider that an appropriation and a violation of copyright.

In cyberspace, online environments and information systems we draw on privacy provisions to protect the human capability and human attributes of speech.

In America there is general agreement (consensus if you will) that limiting the right to speak or appropriating speech erodes social cooperation in a society.


Ann Racuya-Robbins