Operationalizing Privacy in a Big Data Context

Operationalizing Privacy in the Big Data Context

Ann Racuya-Robbins

Operationalizing privacy is largely about understanding the nature of the data you are interested in analyzing. Understanding of the nature of the data involves intuition, ethics and some ICT technical knowledge. An average adult person has the ability to understand and make decisions on these matters.

Once the nature of the data is understood intuitively, ethically and in a general technical way privacy requirements for Big Data can be delineated in alignment with national standards, laws and regulations followed by the further specification of technical details that meet privacy requirements in deployment. Importantly since this technical knowledge is based on intuitive and ethical parameters the average person can understand the relationships between the privacy parameters and the technical and they should remain clear.

Privacy Risk Assessment, Management, Prevention and Mitigation

Goal: Privacy Preserving Information Systems using Big Data, Big Data Analytics

Scoping the Privacy Context – A Question and Answer Tree.

Pre-Big Data Processing/Analytics  also a Post-Big Data Processing/Analytics

QUESTION: The most important question is: Does your prospective data set(s) contain personal data*?

ANSWER: Yes.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Metadata Personal Data Tag

FOLLOWUP ANSWER: Provenance Report from Data Vendor or Agency

FOLLOWUP QUESTION 2: Can you verify? Reproduce?

FOLLOWUP ANSWER:

 

ANSWER: No.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Data Vendor Reports no Personal Data.

FOLLOWUP QUESTION 2: Can you verify?

FOLLOWUP ANSWER: No. Data Vendors maintains Proprietary  Status of Data

ANSWER: I don’t know.

FOLLOWUP QUESTION 1: How can you find out?

 

QUESTION: Does your data set contain “raw” data?

QUESTION: How large is your Data Set(s) Cluster?

< 100 gig

< 1.5 TB

< 100 TB

etc

QUESTION: Will more than one data set be linked and analyzed.

QUESTION: What is the anticipated rate of arrival of the data? At what velocity will the Data Set(s) Cluster be processed/analyzed?

QUESTION: Is the data irregular and of multiple data types?

QUESTION: Will the processing/analytics be used for real-time decision-making?

More QUESTIONS to be determined.

 

*Appendix

Definitions TBD

Personal Data/Information

Data Actions

             Collection

Processing

Use

Logging

Disclosure

Generation

Retention

Transformation

Transfer

Inference

Extrusion

Pollution

Manageability

 

Personal Data Metadata Tags

General Personal Data = PD-G

Very Sensitive Personal Data = PD-VS

 

Privacy Rights Risks, Harms and Mitigations (Controls)

Rights TBD

 

Harms

Appropriation: Personal information is used in ways that deny a person self-determination or fair value exchange.

Breach of Trust: Breach of implicit or explicit trusted relationship, including a breach of a confidential relationship

Distortion: The use or dissemination of inaccurate or misleadingly incomplete personal information

Exclusion: Denial of knowledge about or access to personal data. Includes denial of service.

Induced Disclosure: Pressure to divulge information.

Insecurity: Exposure to future harm, including tangible harms such as identity theft, stalking.

Loss of Liberty: Improper exposure to arrest or detainment.

Power Imbalance: Acquisition of personal information about person which creates an inappropriate power imbalance, or takes unfair advantage of or abuses a power imbalance between acquirer and the person.

Stigmatization: Personal information is linked to an actual identity in such a way as to create a stigma.

Surveillance: Collection or use, including tracking or monitoring of personal information that can create a restriction on free speech and/or other permissible activities.

Unanticipated Revelation: Non-contextual use of data reveals or exposes person or facets of a person in unexpected ways.

To Be Defined:

Data Inference

Extrusion

Pollution

Bias

Discrimination

Data Subjects Intellectual Property

 

Preventions, Mitigations, Controls

 

Big Data Guidelines Repository at WIPO or

 

Privacy in a time of Big Data

Privacy in a Time of Big Data

Ann Racuya-Robbins

The emergence and existence of Big Data technologies and techniques have scoped the challenge of insuring privacy in contemporary life. It is fair to say that there is an inverse relationship, roughly speaking, between big data and privacy. That is as data scales up privacy challenges become more grave. The factors pressuring big data to scale, to get bigger, faster… are powerful including a hoped for competitive edge and speed and cost reduction of analytics to acquire these competitive edges under the name patterns. While the term patterns has gained currency in the field the term’s meaning is not so well understood. It is important to state clearly that the patterns that are sought are themselves data that contain or create an advantage. Understanding is itself an advantage.  By advantage is meant largely a competitive commercial monetary advantage by a third party other than the data subject.

Privacy is a subject of individual life and living.

Privacy is an expression of biologic specificity. Privacy properly ensured and governed preserves innovation, creativity and living development. In this way privacy is a key ingredient of survival and successful maturation.  The pervasion of data that has thrown open the loss of privacy carried in computer and ICT infrastructures is a relatively new phenomenon. The concern for privacy is a recognition of the broadening value of all individual life.  A recognition of the dignity and richness of every life. A recognition that individual life is not rightly an object or property of another.

Privacy cannot be reduced to personal information i.e. name, address and/or other factuals. PII is an obsolete moniker for our subject.

Let us stipulate that we will in this first instance be referring to living individual adults. Living has many stages and forms that must to be addressed later.

Privacy is—living individual’s control over and freedom and refuge from data collection, capture, extraction, surveillance, analytics, predictions, excessive persuasive practices and communication of the living individual’s life, including external or internal bodily functions, creations, conditions, behavior, social, political, familial and intimate interaction including mental, neural and microbial functioning—unless sanctioned by civil and criminal law and when sanctioned only under protocols where the ways and means of collection, capture, extraction, surveillance, analytics and communication including new methods to emerge are governed by appropriate social cooperation principles and safeguards embedded in ICT infrastructures and architectures overseen by democratic courts and civil and community organizations and individuals peers charged with insuring proper conduct.

Living individuals own the data generated by or from their lives. Should revenues be generated from the collection, capture, extraction, surveillance, analytics and communication of the living individual’s data the majority of revenue generated from the living individual’s life belong to the living individual. Data ownership, provenance, curation, governance as well as the consequences of violations of privacy practices must be encapsulated in or within the data, be auditable and travel in encrypted form with the data. Where possible block chain techniques shall be employed as well as counterfactual strategies (processes) in engineering privacy.

Provenance is an accounting of the history of data in an ICT setting.

 

Next Steps

Define further Data Governance, Data Provenance, Data Curation, Data Valuation. Integrate the principles and practices outlined above into an archetypal Privacy Use Case(s) and articulate the Privacy Use Case as it proceeds through the reference architecture.

 

 

 

 

Data Ownership in the Big Data Context

Who own the data in the Big Data context? What does ownership in the Big Data context mean?

 

Data Ownership –

Ann Racuya-Robbins 2016 06 10

Data ownership means that the data subject owns the majority of the revenues generated from data that emanates/ed from or was built upon the data subject’s data. A data subject is a living being. This kind of ownership would mean that the data subject has the authority over decisions including development and disposition of the data subject’s data.

Also see the Individuals Trust Frame Work