It seems to me that much of contemporary governance is built on the premise that statistical and computational methods are unbiased evidentiary approaches to informing many aspects of standards development, governance, policy and enforcement. In a democracy where differences exist this is particularly important, because the best practice is to resolve differences through discussion and unbiased evidentiary information, where people are encouraged to come together voluntarily and even to compromise. Big Data erodes, if not undermines, the fairness of the “unbiased evidentiary” basis of statistical and computational approaches. The lack of privacy in the Big Data setting is one expression of this. This lack of privacy is a problem for both individuals and entities. It is really a collective problem of our time. Introducing noise into the computation has, taken overall, more or less as many drawbacks as benefits.
This is the conundrum I have been wrestling with, and on which I hope to shed some light in the Implications for Life in a Time of Big Data whitepaper I am working on in the NIST Big Data Public Working Group.
Implications for Life in a Time of Big Data
Goals, Methods and Models, Dilemmas and Opportunities
February 29, 2016 (Spring 2016)
1. Big Data Goals for Life — Survival?
Today the world store of human life has grown greatly. It is not clear that any other form of life has increased as rapidly, except perhaps the microbes and other life that cohabit on and in human life. This increase has brought with it many concurrent and emergent problems and opportunities for life, not only human but all life. These problems and opportunities have simultaneously exposed the limits of our creative capabilities in understanding human survival and the survival of life. Some of us have yelled fire, and millions of people and their technology are looking for answers and understanding. Generally speaking this development is a good thing; on some level every life wants to survive and even flourish and thrive. The question, and the context, then becomes: is our collective effort of gathering knowledge (data and information) in service of the survival of life?
For now it is important not to be distracted by, nor to make too much of, the differences in terminology among data, information and knowledge, as if, in our case, data were something fundamentally different from information and knowledge. It is not. It may be reasonable to point out that data and information are kinds of knowledge and/or contexts of knowledge without implying that these contextual differences are greater than the common ground of knowledge. We could claim our subject to be Big Knowledge or Big Information. For now Big Data may suffice. Later there will be time and effort applied to pinning down the technological details of our project.
What makes data, knowledge or information Big? A hundred years hence?
What makes data Big Data? This is a second motive for our work here. To be sure, one cause is simply the increase in the human population. This increase has created an increase in the volume of knowledge from data collected. Volume is the first characteristic identified in the NBDPWG Volume One Definitions. Because the data, information and knowledge come largely from and in association with life, they are full of variety, another characteristic of Big Data. Life is at every instance various and significant, unique and changeable. Variety is a form of knowledge that changes over time. Knowledge of life that changes over time can be a picture, a life pattern. Highly detailed life patterns that change over time identify, and in some aspects are, individual lives. Because of the volume and variety of knowledge from data there is both an apparent and a real need for speed and velocity to understand this volume and variety. This need is both an intuitive and a practical pressure being placed on technology to manage Bigness. Of course bigness is a relative and changeable term. More on this later. For today it might be more precise to say that human life is trying to find a strategy and technology for bringing together, in an intelligible way, differences in the speed and velocity of knowledge creation.
2. Living Methods and Models
The Role of Thinking
The Role of Reflection
The Role of Metaphor
The Role of Security
The Role of Privacy
3. Dilemmas and Opportunities for Life
Concurrency, Simultaneity, Parallelism and the Scientific Method
Is it obsolete as an organizing principle?
What history? From when?
Orchestration and Orchestrator
Governance and Government
Appendix B: Terms and Definitions
Big Data consists of extensive datasets (primarily in the characteristics of volume, variety, velocity, and/or variability) that result in new and unprecedented amounts and kinds of value, primarily economic and social, and that require a governed scalable architecture for the efficient and fair storage, manipulation, analysis and realization of this new value, in order to increase the capability of living, individual social good and the well-being of society as a whole.
The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the governed scalability needed for the efficient processing and fair realization of the value inherent in extensive datasets.
Big Data engineering is based on technical paradigms that tend to ignore or remain silent on the societal consequences of Big Data. This is why governance is needed: to guide the technical paradigms toward advanced techniques that not only harness independent resources for building scalable data systems, but also assure the just and fair realization of the societal value inherent in those datasets. Big Data engineering so guided will use advanced techniques that harness the value in independent resources for building governable and governed scalable data systems, so that when the characteristics of the datasets require new architectures for efficient and fair storage, manipulation and analysis, such architectures will also enable the fair realization of value for the capability of living, individual social good, and the well-being of society as a whole.
Data governance is part of an evolving and dynamic rule set for realizing the societal and economic value from datasets. Data governance involves, but is not limited to, risk management and administering or formalizing discipline (e.g., behavior patterns) around the management of data. Data governance is a reflection of the choices made among normative and competing values and ideals, such as efficiency (economic efficiency), autonomy (individual personal autonomy), distributive justice, corrective justice between the parties, fairness and the like, where parity or equality in bargaining power between the parties is the foremost aspiration.
Value refers to the inherent wealth, economic and social, embedded in any dataset, which must be governed in order to realize that wealth for all members of the society.
Big Data Governance
However large and complex Big Data ultimately becomes in terms of data volume, velocity, variety and variability, Big Data Governance will in some important conceptual and actual dimensions be much larger. Data Governance will need to persist across the data lifecycle: at rest, in motion, and in incomplete stages and transactions, all the while serving the privacy and security of the young and the old, individuals as companies and companies as companies, so as to be an emergent force for good. It will need to ensure economy and innovation, and enable freedom of action and individual and public welfare. It will need to rely on standards governing things we do not yet know while integrating the human element from our humanity with strange new interoperability capability. Data Governance will require new kinds and possibilities of perception, yet accept that our current techniques are notoriously slow. For example, even as of today we have not yet scoped in data types.
The reason we, so many of us, are gathering our energies and the multiplexity of our perspectives is that we know Big Data without Big Data Governance will be less likely to be a force for good. It may come to be said that the best use of Big Data is Big Data Governance.
What concept or concepts are powerful enough to organize, cohere and form an actionable way forward? Are we brave enough to push forward a few concepts for our discussion? Some think data provenance, curation and conformance are the way forward. I agree with those who think this ground deserves a fifth V: Value.