Operationalizing Privacy in a Big Data Context

Ann Racuya-Robbins

Operationalizing privacy is largely about understanding the nature of the data you intend to analyze. Understanding the nature of the data involves intuition, ethics, and some ICT technical knowledge, and an average adult has the ability to understand and make decisions on these matters.

Once the nature of the data is understood intuitively, ethically, and in a general technical way, privacy requirements for Big Data can be delineated in alignment with national standards, laws, and regulations, followed by the further specification of technical details that meet those privacy requirements in deployment. Importantly, since this technical knowledge is based on intuitive and ethical parameters, the average person can understand the relationships between the privacy parameters and the technical details, and those relationships should remain clear.

Privacy Risk Assessment, Management, Prevention and Mitigation

Goal: Privacy Preserving Information Systems using Big Data, Big Data Analytics

Scoping the Privacy Context – A Question and Answer Tree.

To be applied pre-Big Data processing/analytics and also post-Big Data processing/analytics.

QUESTION: The most important question is: does your prospective data set(s) contain personal data (see Definitions below)?

ANSWER: Yes.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Metadata Personal Data Tag.

FOLLOWUP ANSWER: Provenance Report from Data Vendor or Agency.

FOLLOWUP QUESTION 2: Can you verify? Reproduce?

ANSWER: No.

FOLLOWUP QUESTION 1: How do you know?

FOLLOWUP ANSWER: Data Vendor reports no personal data.

FOLLOWUP QUESTION 2: Can you verify?

FOLLOWUP ANSWER: No. Data Vendor maintains proprietary status of the data.

ANSWER: I don't know.

FOLLOWUP QUESTION 1: How can you find out?


QUESTION: Does your data set contain “raw” data?

QUESTION: How large is your Data Set(s) Cluster?

< 100 GB

< 1.5 TB

< 100 TB


QUESTION: Will more than one data set be linked and analyzed?

QUESTION: What is the anticipated rate of arrival of the data? At what velocity will the Data Set(s) Cluster be processed/analyzed?

QUESTION: Is the data irregular and of multiple data types?

QUESTION: Will the processing/analytics be used for real-time decision-making?

More QUESTIONS to be determined.
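The scoping questions above could be captured as a machine-readable checklist so that unanswered questions block analytics from proceeding. The sketch below is illustrative only; the class and function names (ScopingQuestion, unresolved) are assumptions, not part of any standard.

```python
# Illustrative sketch: the scoping Q&A tree as a machine-readable checklist.
from dataclasses import dataclass, field


@dataclass
class ScopingQuestion:
    text: str
    followups: list = field(default_factory=list)  # e.g. "How do you know?"


SCOPING_TREE = [
    ScopingQuestion(
        "Does your prospective data set(s) contain personal data?",
        followups=["How do you know?", "Can you verify? Reproduce?"],
    ),
    ScopingQuestion('Does your data set contain "raw" data?'),
    ScopingQuestion("How large is your Data Set(s) Cluster?"),
    ScopingQuestion("Will more than one data set be linked and analyzed?"),
    ScopingQuestion("What is the anticipated rate of arrival of the data?"),
    ScopingQuestion("Is the data irregular and of multiple data types?"),
    ScopingQuestion("Will the processing/analytics be used for real-time decision-making?"),
]


def unresolved(answers: dict) -> list:
    """Return the scoping questions that still lack a recorded answer."""
    return [q.text for q in SCOPING_TREE if q.text not in answers]
```

A deployment would refuse to begin Big Data processing while `unresolved(answers)` is non-empty, making the pre-processing gate explicit.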



Definitions TBD

Personal Data/Information

Data Actions

Personal Data Metadata Tags

General Personal Data = PD-G

Very Sensitive Personal Data = PD-VS
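The two tags above imply a simple gate: a data set's metadata either carries a recognized personal-data tag or it does not, and untagged data should send the analyst back to the scoping questions. A minimal sketch, assuming a metadata dictionary with an illustrative `personal_data_tag` field (the field name and function are assumptions; only the PD-G/PD-VS vocabulary comes from the text):

```python
# Illustrative sketch: classify a data set by its personal-data metadata tag.
PD_TAGS = {
    "PD-G": "General Personal Data",
    "PD-VS": "Very Sensitive Personal Data",
}


def check_dataset(metadata: dict) -> str:
    """Return the personal-data classification recorded in the metadata."""
    tag = metadata.get("personal_data_tag")
    if tag is None:
        # No tag: the scoping Q&A must be answered before analytics proceed.
        return "unknown: answer the scoping questions before analytics"
    if tag not in PD_TAGS:
        raise ValueError(f"unrecognized personal-data tag: {tag}")
    return PD_TAGS[tag]
```

Because the tag travels in the metadata, the same check can run both pre- and post-processing, matching the two-sided scoping described above.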


Privacy Rights Risks, Harms and Mitigations (Controls)

Rights TBD



Appropriation: Personal information is used in ways that deny a person self-determination or fair value exchange.

Breach of Trust: Breach of an implicit or explicit trusted relationship, including breach of a confidential relationship.

Distortion: The use or dissemination of inaccurate or misleadingly incomplete personal information.

Exclusion: Denial of knowledge about or access to personal data. Includes denial of service.

Induced Disclosure: Pressure to divulge information.

Insecurity: Exposure to future harm, including tangible harms such as identity theft or stalking.

Loss of Liberty: Improper exposure to arrest or detainment.

Power Imbalance: Acquisition of personal information about a person that creates an inappropriate power imbalance, or takes unfair advantage of or abuses a power imbalance between the acquirer and the person.

Stigmatization: Personal information is linked to an actual identity in such a way as to create a stigma.

Surveillance: Collection or use, including tracking or monitoring of personal information that can create a restriction on free speech and/or other permissible activities.

Unanticipated Revelation: Non-contextual use of data reveals or exposes person or facets of a person in unexpected ways.

To Be Defined:

Data Inference

Data Subjects' Intellectual Property


Preventions, Mitigations, Controls


Big Data Guidelines Repository at WIPO or


Privacy in a Time of Big Data

Ann Racuya-Robbins

The emergence and existence of Big Data technologies and techniques have scoped the challenge of ensuring privacy in contemporary life. It is fair to say that there is, roughly speaking, an inverse relationship between big data and privacy: as data scales up, privacy challenges become graver. The factors pressuring big data to scale, to get bigger and faster, are powerful, including a hoped-for competitive edge and the speed and cost reduction of the analytics used to acquire such edges under the name of patterns. While the term patterns has gained currency in the field, its meaning is not well understood. It is important to state clearly that the patterns sought are themselves data that contain or create an advantage. Understanding is itself an advantage. By advantage is meant largely a competitive commercial monetary advantage for a third party other than the data subject.

Privacy is a subject of individual life and living.

Privacy is an expression of biologic specificity. Privacy, properly ensured and governed, preserves innovation, creativity, and living development. In this way privacy is a key ingredient of survival and successful maturation. The pervasiveness of data carried in computer and ICT infrastructures, which has thrown open the loss of privacy, is a relatively new phenomenon. The concern for privacy is a recognition of the broadening value of all individual life. A recognition of the dignity and richness of every life. A recognition that individual life is not rightly an object or property of another.

Privacy cannot be reduced to personal information, i.e. name, address, and/or other factual identifiers. PII is an obsolete moniker for our subject.

Let us stipulate that in this first instance we will be referring to living individual adults. Living has many stages and forms that must be addressed later.

Privacy is a living individual's control over, and freedom and refuge from, data collection, capture, extraction, surveillance, analytics, predictions, excessive persuasive practices, and communication of the living individual's life, including external or internal bodily functions, creations, conditions, behavior, and social, political, familial, and intimate interaction, including mental, neural, and microbial functioning, unless sanctioned by civil and criminal law, and when sanctioned, only under protocols where the ways and means of collection, capture, extraction, surveillance, analytics, and communication, including new methods yet to emerge, are governed by appropriate social cooperation principles and safeguards embedded in ICT infrastructures and architectures, overseen by democratic courts and by civil and community organizations and individuals' peers charged with ensuring proper conduct.

Living individuals own the data generated by or from their lives. Should revenues be generated from the collection, capture, extraction, surveillance, analytics, and communication of a living individual's data, the majority of the revenue generated from that individual's life belongs to the individual. Data ownership, provenance, curation, and governance, as well as the consequences of violations of privacy practices, must be encapsulated in or within the data, be auditable, and travel in encrypted form with the data. Where possible, blockchain techniques shall be employed, as well as counterfactual strategies (processes), in engineering privacy.

Provenance is an accounting of the history of data in an ICT setting.
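An auditable history that travels with the data can be sketched as a hash-chained provenance record, in the spirit of the blockchain techniques suggested above. This is a minimal illustration under stated assumptions: the entry fields (`actor`, `action`, `prev`, `hash`) and function names are invented for the example, and a real deployment would add encryption, signatures, and timestamps.

```python
# Illustrative sketch: a tamper-evident, hash-chained provenance record.
import hashlib
import json


def append_provenance(chain: list, actor: str, action: str) -> list:
    """Append a provenance entry linked by hash to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"actor": actor, "action": action, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return chain + [entry]


def verify_provenance(chain: list) -> bool:
    """Recompute every hash to detect tampering anywhere in the history."""
    prev = "0" * 64
    for entry in chain:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True
```

Because each entry commits to the one before it, altering any past entry breaks verification for the whole chain, which is what makes the record auditable.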


Next Steps

Define further Data Governance, Data Provenance, Data Curation, Data Valuation. Integrate the principles and practices outlined above into an archetypal Privacy Use Case(s) and articulate the Privacy Use Case as it proceeds through the reference architecture.





Individual Human Well Being in an Era of Intangible Dominance and Platform Economics

I am now beginning to wonder if we are chasing a false choice or dichotomy. Is the choice really between privacy based on individual rights and privacy based on individuals' belief that they have been harmed? Is broader better or worse? If we are making a choice based on a fear that commercial interests won't participate in the IDESG IDEF, what is that fear based on? Outside the digital divide, if we take the highly automated (high-velocity) route to identity management, the issues and choices will likely become invisible to the individual. Risk management may help. As I understand this privacy approach, it is based on both rights and harms. From the perspective of the individual human being, do not harms require a higher burden of proof (in time and money) than rights? Without portability there are currently no remedies, so is portability not an essential piece of our requirements?

Human Rights

The sense that is emerging is that we need a conjunction of the rights and harms language along with a portability requirement.

Social Cooperation and Privacy

Personal information is a dynamic and expanding kind of information emergent from human capabilities

Image by Ann Racuya-Robbins Copyright 2012
Social Cooperation and Privacy

Social Cooperation, Human Capabilities and Privacy

An excerpt

Ann Racuya-Robbins

November 2, 2014

What is personal information and why does it matter for privacy and social cooperation?

Personal information is a dynamic and expanding kind of information emergent from human capabilities and life experience. Personal information belongs to each person in the same way that human rights and dignities do, and each person creates his or her own personal information. When human capabilities are protected and encouraged, they can continue to emerge and grow, which continues to expand the personal information available. Without protection, the personal information of human capabilities will be exploited by the strong against the weak, and the inequities in our current societies will be replicated. Generally speaking, the protection of human capabilities is the raison d'etre of privacy. Human life and capabilities are the essential source of wealth in the world. Privacy of the personal information of human capabilities is distinct in many ways from the security of personal information, but I won't go further into that here.

Because the personal information of human capabilities is valuable in so many ways, including commercially, attempts will continue to be made to separate people from their personal information. One recent strategy has been to fracture a human life's personal information into bits, attributes, such as a person's gender, hair color, weight, eye color, or height, with the rationale that this fracturing separates the attribute from the person, making that piece of personal information available to be exploited without the person of origin benefiting. Similarly, there is an attempt to make the “things” of the Internet of Things separate from the person, in part to exploit the device information and name it as something other than personal.

Personal information from human capabilities is, as I said, an emergent and expanding domain. Human attributes and personal information belong to the same domain of human capabilities. Through analysis, many inferences from human attributes and/or personal information can be discovered, revealed, and acted upon. There is no perceptible end today to human capabilities for good. There has emerged over the last 75 years a very clear and palpable end to the human capabilities for destruction. It is fair to say that it is in all of our best interests to reduce the capabilities for destruction and protect the capabilities for good.

I also agree with much of Bob Pinheiro’s definition below as a beginning point for personal information.

“So instead I’ll use the term “personal information” to refer to two kinds of information that I believe we’re concerned about: (a) the information that people specifically provide about themselves, as well as information that people directly generate about themselves, and (b) information or “intelligence” about people that others infer based on our observable online activities and behaviors. Included under (a) would be every piece of information that people specifically provide about themselves to service providers, social media, etc., including PII such as name, address, etc. Also included would be our calling and messaging histories, and browsing histories, as recorded on our personal devices. Examples of (b) would be the credit scores that credit bureaus develop about us based on our financially-related activities, intelligence about our preferences, likes, dislikes, etc that third parties develop and sell based on our online activities, and of course the metadata that the NSA develops based on our telephone records.” Bob Pinheiro

Personal Information in Commercial Transactions

Transactions in cyberspace or on the internet are unique in many ways. Commonly, for example, the entity that owns or holds a website's data on a server has a built-in advantage over the visitor to the site. In commercial transactions the visitor typically exchanges at least one item of monetary value just by visiting a location: his or her personal IP address and/or a referring site's IP address. These are personal attributes and information of commercial value, “the new money” as Anil John describes it.

However, I see no reason why addressing these fundamental challenges is out of scope or needs to wait. After all, one of the advantages of the private sector is the ability to create and operate under contract law.

I recommend the completion of a Memorandum of Social Cooperation that spells out the fairness of the relationship in clear and understandable language and protects all parties in a commercial relationship, including the equitable resolution of the distribution of the monetary value of personal information. It should be the first step in all online transactions.



The Human Trust Experience in an Era of Big Data

Consumer, Manager, Domain Expert Proposal
Subtopic: Unmet Big Data requirements

Ann Racuya-Robbins Image
tHTRX Logo graphic

1. Title
The Human Trust Experience (HTX) in an Era of Big Data

2. Point of Contact (Name, affiliation, email address, phone)
Ann Racuya-Robbins
World Knowledge Bank: Human Trust Experience Initiative

3. Working Group URL

4. Proposed panel topic: Unmet Big Data requirements

5. Abstract
The Human Trust Experience Initiative's mission is to use Big Data to explore and lay the groundwork for understanding the parameters, characteristics, attributes, information architecture, and reference and interaction models of the human trust experience in motion and at rest. Central premises of this work to be evaluated and interpreted are that:
• The human trust experience is foundational to Privacy, to the uptake of ICT innovation, education and the challenges of democratic governance.
• The human trust experience is a central component of all human labor and to individual and community well-being and survival.
• The human trust experience can be a measure and standard by which we understand and prioritize problem solving.

6. Working Group summary
• Create the human trust experience use case.
• Create the human trust experience context.
• Create a semiotics and information architecture of the human trust experience.
• Facilitate through CMS conversation about the tHTRX in a Big Data context.

7. Number of participants, date working group began, frequency of meetings
December 2013

8. Target Audience
Individuals, Consumers and Producers of Big Data, Businesses, Government

9. Current initiatives
The Human Trust Experience Initiative

10. Specific Big Data Challenges:
Value, Valuation, Contextual Veracity, Identity, Pseudonymity, Anonymity, Privacy, Vetting, Contextual Vetting

11. Urgent research needs

12. Related Projects or Artifacts The Human Trust Experience: Informed Valuation Project

13. Big Data metrics (describe your data to make a Big impression)
Search, discovery, revelation, creation and analysis of the human trust experience from cyberspace data.

14. Keywords
human trust experience, value, valuation, informed valuation, informed contextual value, informed contextual valuation, contextual veracity, identity, pseudonymity, anonymity, privacy, risk management

Best Practices for Human Attributes

How to Move towards Trustworthy ground with Human Attributes

Human attributes, all the aspects of a life, in online transaction environments should progress towards the creation of Standards for the attribute lifecycle. Such Standards should include how to respect, care for, and creatively treat those attributes. I think this is the right direction.
I think there should be a base Standard of assurance that will allow for the greatest range of transactions by the greatest number of participants. More on this later. Such a base standard of assurance should be agreeable to all stakeholders, including individuals. This will require individuals to better understand the monetization of human attributes and the crucial, complex meaning of human attributes.
To move towards and achieve Standards for the attribute lifecycle, a central challenge and dilemma must be undertaken: to transparently articulate the relationship between Personally Identifiable Information (PII), attributes over a lifecycle, and attributes that create PII through aggregation, provenance, or other time-related processes. We must acknowledge that PII and attributes are, more or less, on a continuum. The truth needs to be told that privacy requirements are not meaningful without taking on this challenge. I have some suggestions for standards in this area that I would like to put forward at the proper time.
Here lie many perils and much promise.

Underserved Initiative Receives Approval

July 24th—Un and Underserved People’s Identity approach receives IDESG Privacy Committee Review approval. This means that wider adoption of such an approach is more feasible.

Process Flow Diagram

Un and Underserved Use Case Process Flow

One of the indices of the human trust experience is whether, and the extent to which, a person or organization creates work that serves others and interests separate from and greater than themselves. My view is that, particularly today, economic development should wherever possible be designed to serve the underserved first. To this end I have worked on a systemic approach (referred to at the IDESG as a “Use Case”) to identity management and privacy for Un and Underserved People for the Identity Ecosystem Steering Group (IDESG). In essence this approach allows individuals who have been left out of the existing online environment to piggyback into interactions and transactions online through identities received through Federally Insured bank accounts. This can be done remotely as well as in person. There are many advantages to this approach, which will be reported on in the coming months. The good news is that on July 24th this approach received IDESG Privacy Committee Review approval. This means that wider adoption of such an approach is more feasible.

For complete information see https://www.idecosystem.org/wiki/Un_and_Underserved_People_Use_Case

Use Case Description
“Use Case Purpose”: Un and Underserved People Enter the IDESG Identity Ecosystem.

Un and Underserved refers to people who do not have, have lost, or have inadequate digital identities to enable them to participate in the secure and resilient, cost-effective and easy-to-use, privacy-enhancing and voluntary interoperable online Identity Ecosystem envisioned by NSTIC and the IDESG. Currently there are both barriers to and opportunities for the Un and Underserved to enter the IDESG Identity Ecosystem. Such barriers may include limited financial means, physical disadvantage or challenge, language differences, and loss of employment, to name but a few. Such opportunities may include new products and services to remove these barriers and innovations in serving this community, as well as greater social cohesion and internet-wide cyber-security. Importantly, many of the Un and Underserved are also financially un- and underserved. Today 68 million American adults are un- or underbanked. More than 2.5 billion adults around the world are unbanked.

The goal of this use case is to leverage existing programs and services, for example the FDIC “Safe Account” program, to allow the Un and Underserved to use their “Safe Account” bank account enrollment process as a means of obtaining a digital identity and entering the IDESG Identity Ecosystem. Being un- and underserved is not a new problem but one that has had a long (perhaps going back to the beginnings of money and then banking) and often intractable set of complexities. The efficiencies of cyberspace (the internet) provide an historic opportunity to bridge this gap.

Scenario (Example): Julia, a prospective underserved financial services customer, wants to open a bank account as well as obtain a digital identity for use in the IDESG Identity Ecosystem. Julia learns of an FDIC “Safe Account” type of account at her local community center, which allows her to apply for an account and subsequently obtain a digital identity. Julia applies for and gets an FDIC “Safe Account” through an FDIC-insured bank or equivalent financial institution compliant with 31 CFR 1020.220 – Customer identification programs (CIP) for banks, savings associations, credit unions, and certain non-Federally regulated banks – or another acceptable customer identification program. The enrollment vetting process for a “Safe Account” serves the vetting requirements for Julia to obtain her digital identity. After a period of successful Safe Account practices, Julia uses her Safe Account history and digital identity to apply for an FCCX credential or other governmental credential for accessing government services. Julia receives the government credential and uses it to apply for other online services and products, including more financial services. Julia is able, step by step, to build access to a wide range of products and services she will need and use as she provides for her family and builds her entrepreneurial life as a clothes designer and pattern maker.

Goals Summary: Julia will be able to obtain a digital credential with the qualifications used to obtain her Safe Account. Julia will be able to manage her finances in a secure and insured or protected environment where she can increase her income through entrepreneurship, improving the quality of life for herself and her son, the economic activity in her neighborhood through her purchases, and tax receipts to her city and state. Julia will be able to interact with some government and non-profit services, improving confidence in government and non-profit institutions and in financial institutions, including banking. The financial institutions, non-profit organizations, government agencies, and healthcare providers will be able to increase the number of their customers/participants. Through this use case a broad range of stakeholders are brought together to share risks and rewards in creating an online Identity Ecosystem Framework where economic opportunity, productivity, and human well-being are harmonized.


  • Un and Underserved People
  • Financial Institutions
  • Non-profit Organizations
  • Government
  • Insurance entities
  • Any Relying Party or Service Provider in the IDESG Identity Ecosystem that complies with the NSTIC principles and has a Trustmark Accreditation
  • Alternative Financial Services

Assumptions: The Un and Underserved Person applies in person at the Financial Institution or uses an acceptable electronic means of application, including for example Treasury's OCIP, which has brought together the FSSCC, DHS, and NIST to create a Cooperative Research and Development Agreement on identity proofing and has identified new methods for satisfying the “know your customer” requirements of financial institutions. The Financial Institution must be an FDIC-insured bank or equivalent. The digital identity meets the needs of relying parties.

Process Flow

Process Flow Diagram

This use case is unique in that the person, Julia and her son in this case, exists outside an online Identity Ecosystem. Entering the Identity Ecosystem is a kind of state change, so to speak, for Julia. The other stakeholders are already inside the Identity Ecosystem. The process of entering the ecosystem should be done with care by all stakeholders.

Success Scenario: Julia is able to enroll in a Safe Account that provides her with a digital identity useful in the ID Ecosystem for products and services and for federal, state, and local governments. Julia can also apply for and potentially receive other digital identities from other ID Ecosystem providers, enlarging the range of products and online services, including financial services, that she can access.

Constructing Human Anonymity

Can Human Anonymity be Constructed?

Are Human Identity and Human Anonymity compatible in online or Internet interactions, transactions?

To begin to wrestle with these questions take a look at the complex and challenging process recommended as a best practice by the Washington Post for constructing imperfect anonymity online.


SecureDrop – The Washington Post.

Why is human anonymity so hard to construct online, and how is this difficulty central to the human trust experience? Does the elaborate process the Washington Post has created indicate that someone is intending to tell the truth as they know it? Or that they are intending to mislead or lie about something? One thing I take away from the Washington Post SecureDrop process is that both the sender and the receiver of the message anticipate and intend that the message not be distributed, even though the sender and receiver don't know each other. This is a very high standard of privacy: my mouth to your ear. Further, it seems clear that both sides, without even knowing the content of the message, anticipate or want to make it possible for the importance of the message to be inferred as well as maintained. This is not throwaway language but language that both the human sender and the human receiver must have confidence, to a very high degree of certainty, will be transmitted completely with its original content. This is a powerfully human experience of trust, mistrust, risk, and even intimacy. From this view the stakes for the human trust experience are very high.

I will be building a privacy use case for online anonymity from an exploration of online anonymity.

Constructing anonymity is like navigating a wild American river. Photograph in motion:
Animas River, Colorado, USA

Human Trust Experience and Data Actions

Recently I attended a Privacy Workshop hosted by NIST. One of the insights that emerged is the difference between security language and privacy language. For example, while a phrase like “data actions” may be useful and meaningful from the security engineering perspective, from the perspective of human beings the term is quite empty. Identity is emergent, tender, personal, lying in the field of emotions and of life and death. Identity is alive. We should not be impatient that such an important subject is hard, maybe beyond our ability at present to speak to. Privacy too is new and unformed.

I sensed that by the end of the NIST Privacy Workshop there was an awareness of the raw and vast scope of the problem.

When “data actions” means inferring what a human being is going to do or think next and monetizing that, generating revenue for a third party, or releasing the date of your brother's recent death for monetary purposes, the emotional danger of these “actions” emerges.

Context is a wonderful tool to help us. But some things carry across context. I think we should look for our humanity in every context and accept nothing less.