Operationalizing Privacy in a Big Data Context

Ann Racuya-Robbins

Operationalizing privacy is largely about understanding the nature of the data you are interested in analyzing. Understanding the nature of the data involves intuition, ethics and some technical knowledge of information and communications technology (ICT). An average adult has the ability to understand and make decisions on these matters.

Once the nature of the data is understood intuitively, ethically and in a general technical way, privacy requirements for Big Data can be delineated in alignment with national standards, laws and regulations. This is followed by further specification of the technical details that meet those privacy requirements in deployment. Importantly, because this technical knowledge rests on intuitive and ethical parameters, the average person can understand the relationships between the privacy parameters and the technical details, and those relationships should remain clear.

Privacy Risk Assessment, Management, Prevention and Mitigation

Goal: Privacy-Preserving Information Systems Using Big Data and Big Data Analytics

Scoping the Privacy Context – A Question and Answer Tree.

Use before Big Data processing/analytics and again after Big Data processing/analytics.

QUESTION: The most important question is: Does your prospective data set (or sets) contain personal data*?

ANSWER: Yes.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Metadata Personal Data Tag

FOLLOWUP ANSWER: Provenance Report from Data Vendor or Agency

FOLLOWUP QUESTION 2: Can you verify? Reproduce?

ANSWER: No.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Data Vendor Reports no Personal Data.

FOLLOWUP QUESTION 2: Can you verify?

FOLLOWUP ANSWER: No. The data vendor maintains proprietary status of the data.

ANSWER: I don’t know.

FOLLOWUP QUESTION 1: How can you find out?
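
The personal-data branch of the tree above can be held as data, so that the same follow-up questions are asked consistently before and after processing. The following is a minimal Python sketch under that assumption; the names `Node`, `PERSONAL_DATA` and `next_questions` are hypothetical and are not a format defined in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One question in the tree and the follow-ups each answer triggers."""
    question: str
    follow_ups: dict[str, list[str]] = field(default_factory=dict)


# Evidence named in the tree: a metadata personal data tag or a provenance
# report from a data vendor or agency ("yes"); a vendor report of no
# personal data, which may not be verifiable ("no").
PERSONAL_DATA = Node(
    question="Does your prospective data set (or sets) contain personal data?",
    follow_ups={
        "yes": ["How do you know?", "Can you verify? Reproduce?"],
        "no": ["How do you know?", "Can you verify?"],
        "i don't know": ["How can you find out?"],
    },
)


def next_questions(node: Node, answer: str) -> list[str]:
    """Return the follow-up questions for an answer (case-insensitive)."""
    # Normalize the typographic apostrophe so "I don’t know" also matches.
    key = answer.strip().lower().replace("\u2019", "'")
    if key not in node.follow_ups:
        raise ValueError(f"unrecognized answer: {answer!r}")
    return node.follow_ups[key]


if __name__ == "__main__":
    for q in next_questions(PERSONAL_DATA, "No"):
        print(q)
```

Keeping the tree as data rather than as branching code means new questions (the document notes more are to be determined) can be added without changing the lookup function.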


QUESTION: Does your data set contain “raw” data?

QUESTION: How large is your Data Set(s) Cluster?

< 100 GB

< 1.5 TB

< 100 TB
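
A small helper can place a data set cluster into the size tiers listed above. This Python sketch assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the questionnaire does not specify, and reports sizes at or above 100 TB as falling outside the listed tiers; `size_tier` is a hypothetical name.

```python
# Assumed decimal units; the questionnaire does not say decimal or binary.
GB = 10**9
TB = 10**12

# Exclusive upper bounds and their labels, smallest tier first.
SIZE_TIERS = [
    (100 * GB, "< 100 GB"),
    (1500 * GB, "< 1.5 TB"),
    (100 * TB, "< 100 TB"),
]


def size_tier(total_bytes: int) -> str:
    """Return the first tier whose upper bound exceeds total_bytes."""
    if total_bytes < 0:
        raise ValueError("size cannot be negative")
    for upper, label in SIZE_TIERS:
        if total_bytes < upper:
            return label
    # The questionnaire lists no tier at or above 100 TB.
    return ">= 100 TB (no tier defined)"


if __name__ == "__main__":
    print(size_tier(2 * TB))
```

For a cluster of several data sets, one reasonable (but assumed) reading is to pass the summed size, e.g. `size_tier(sum(sizes))`, since linking the sets brings them into one analysis.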


QUESTION: Will more than one data set be linked and analyzed?

QUESTION: What is the anticipated rate of arrival of the data? At what velocity will the Data Set(s) Cluster be processed/analyzed?

QUESTION: Is the data irregular and of multiple data types?

QUESTION: Will the processing/analytics be used for real-time decision-making?

More QUESTIONS to be determined.
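
The remaining scoping questions can be recorded in one structure so that unanswered items stay visible. In this Python sketch, `ScopingAnswers` and `unanswered` are hypothetical names, and `None` is assumed to mean "not yet answered", kept distinct from an explicit "no" (`False`).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScopingAnswers:
    """Answers to the scoping questions; None means not yet answered."""
    contains_raw_data: Optional[bool] = None
    links_multiple_data_sets: Optional[bool] = None
    irregular_multiple_types: Optional[bool] = None
    real_time_decisions: Optional[bool] = None
    # Free-text answer to the arrival-rate / processing-velocity question.
    arrival_rate_and_velocity: Optional[str] = None


def unanswered(answers: ScopingAnswers) -> list[str]:
    """Names of the scoping questions that still lack an answer."""
    return [f.name for f in fields(answers) if getattr(answers, f.name) is None]


if __name__ == "__main__":
    a = ScopingAnswers(contains_raw_data=True, real_time_decisions=False)
    print(unanswered(a))
```

Questions added later can become new fields with a `None` default, so existing records remain valid and simply show the new question as open.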



Definitions TBD

Personal Data/Information

Data Actions
Personal Data Metadata Tags

General Personal Data = PD-G

Very Sensitive Personal Data = PD-VS
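
These tags can be attached field by field, and a data set can then carry the most sensitive tag found on any of its fields; this is one way to supply the "Metadata Personal Data Tag" answer in the question tree. The Python sketch below assumes that PD-VS ranks above PD-G (as its name suggests) and that an untagged field holds no personal data; `PDTag`, `data_set_tag` and the sample field names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class PDTag(IntEnum):
    """Personal data metadata tags; a higher value means more sensitive."""
    PD_G = 1   # General Personal Data (PD-G)
    PD_VS = 2  # Very Sensitive Personal Data (PD-VS)

    @property
    def label(self) -> str:
        # Enum member names cannot contain hyphens, so convert back.
        return self.name.replace("_", "-")


def data_set_tag(field_tags: dict[str, Optional[PDTag]]) -> Optional[PDTag]:
    """Most sensitive tag on any field, or None if no field carries a tag."""
    tags = [t for t in field_tags.values() if t is not None]
    return max(tags) if tags else None


# Hypothetical schema: the field names and their tags are illustrative only.
schema = {
    "zip_code": PDTag.PD_G,
    "diagnosis": PDTag.PD_VS,
    "page_views": None,
}

if __name__ == "__main__":
    tag = data_set_tag(schema)
    print(tag.label if tag else "no personal data tag")  # PD-VS
```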


Privacy Rights Risks, Harms and Mitigations (Controls)

Rights TBD



Appropriation: Personal information is used in ways that deny a person self-determination or fair value exchange.

Breach of Trust: Breach of an implicit or explicit trusted relationship, including breach of a confidential relationship.

Distortion: The use or dissemination of inaccurate or misleadingly incomplete personal information.

Exclusion: Denial of knowledge about or access to personal data. Includes denial of service.

Induced Disclosure: Pressure to divulge information.

Insecurity: Exposure to future harm, including tangible harms such as identity theft or stalking.

Loss of Liberty: Improper exposure to arrest or detainment.

Power Imbalance: Acquisition of personal information about a person that creates an inappropriate power imbalance, or that takes unfair advantage of or abuses a power imbalance between the acquirer and the person.

Stigmatization: Personal information is linked to an actual identity in such a way as to create a stigma.

Surveillance: Collection or use of personal information, including tracking or monitoring, that can restrict free speech and/or other permissible activities.

Unanticipated Revelation: Non-contextual use of data reveals or exposes a person, or facets of a person, in unexpected ways.
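
Because these harms form a fixed vocabulary, they can be encoded once and reused when recording risks. The Python sketch below assumes a simple risk register in which each entry names a data set, one harm from the list above and any controls chosen for it; `RiskEntry`, `uncontrolled` and the sample entries are hypothetical, since rights and controls are still to be defined here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Harm(Enum):
    """The privacy harms listed above, keyed by their heading."""
    APPROPRIATION = "Appropriation"
    BREACH_OF_TRUST = "Breach of Trust"
    DISTORTION = "Distortion"
    EXCLUSION = "Exclusion"
    INDUCED_DISCLOSURE = "Induced Disclosure"
    INSECURITY = "Insecurity"
    LOSS_OF_LIBERTY = "Loss of Liberty"
    POWER_IMBALANCE = "Power Imbalance"
    STIGMATIZATION = "Stigmatization"
    SURVEILLANCE = "Surveillance"
    UNANTICIPATED_REVELATION = "Unanticipated Revelation"


@dataclass
class RiskEntry:
    """One row of a hypothetical privacy risk register."""
    data_set: str
    harm: Harm
    description: str
    # Preventions, mitigations or controls; empty means none chosen yet.
    controls: list[str] = field(default_factory=list)


def uncontrolled(register: list[RiskEntry]) -> list[RiskEntry]:
    """Entries that do not yet have any control recorded."""
    return [entry for entry in register if not entry.controls]


# Hypothetical example entries.
register = [
    RiskEntry("clickstream", Harm.SURVEILLANCE,
              "Tracking page views could restrict permissible activity."),
    RiskEntry("clickstream", Harm.UNANTICIPATED_REVELATION,
              "Linking with another data set could expose facets of a person.",
              controls=["review before linking data sets"]),
]

if __name__ == "__main__":
    for entry in uncontrolled(register):
        print(entry.harm.value)  # Surveillance
```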

To Be Defined:

Data Inference

Data Subjects' Intellectual Property


Preventions, Mitigations, Controls


Big Data Guidelines Repository at WIPO or


Characteristics of Trust in a Time of Big Data

Implications for Life in a Time of Big Data
Goals, Methods and Models, Dilemmas and Opportunities
Ann Racuya-Robbins
February 29, 2016 (Spring 2016)
1. Big Data Goals for Life — Survival?
Today the world's store of human life has grown greatly. It is not clear that any other form of life has increased as rapidly, except perhaps the microbes and other life that cohabit on and in human life. This increase has brought with it many concurrent and emergent problems and opportunities for life, not only human life but all life. These problems and opportunities have simultaneously exposed the limits of our creative capabilities in understanding human survival and the survival of life. Some of us have yelled fire, and millions of people and their technology are looking for answers and understanding. Generally speaking, this development is a good thing; on some level every life wants to survive and even to flourish and thrive. The question, and the context, then becomes: Is our collective effort of gathering knowledge (data and information) for the survival of life?
For now it is important neither to be distracted by nor to make too much of the differences in terminology here among data, information and knowledge, as if, in our case, data were something fundamentally different from information and knowledge. It is not. It may be reasonable to point out that data and information are kinds of knowledge and/or contexts of knowledge, without implying that these contextual differences are greater than the common ground of knowledge. We could call our subject Big Knowledge or Big Information; for now, Big Data may suffice. Later there will be time and effort applied to pinning down the technological details of our project.
What makes data, knowledge or information Big? A hundred years hence?
What makes data Big Data? This is a second motive for our work here. To be sure, one cause is simply the increase in human population. This increase has in turn increased the volume of knowledge derived from collected data, and volume is the first characteristic identified in the NBDPWG Volume One Definitions. Because the data, information and knowledge come largely from, and in association with, life, they are full of variety, another characteristic of Big Data. Life is at every instance various and significant, unique and changeable. Variety is a form of knowledge that changes over time, and knowledge of life that changes over time can form a picture, a life pattern. Highly detailed life patterns that change over time identify individual lives and, in some aspects, are those lives. Because of the volume and variety of knowledge from data, there is both an apparent and a real need for speed and velocity to understand this volume and variety. This need is both an intuitive and a practical pressure placed on technology to manage Bigness. Of course, bigness is a relative and changeable term; more on this later. For today it might be more precise to say that human life is trying to find a strategy and a technology for bringing together, in an intelligible way, differences in the speed and velocity of knowledge creation.

For Whom
For What
For When

2. Living Methods and Models
The Role of Thinking
The Role of Reflection
The Role of Metaphor and Mapping
The Role of Security
The Role of Privacy

3. Dilemmas and Opportunities for Life
Concurrency, Simultaneity, Parallelism and the Scientific Method
Is it obsolete as an organizing principle?
What history? From when?
Orchestration and Orchestrator
Governance and Government