A few weeks ago, Anne-Sophie Mertens, founder of Vouloir Dire (an online booking company for FSL interpreters), shared an article on LinkedIn about dataism, a term we did not know. As data professionals who use Artificial Intelligence daily, we were surprised to discover a trend that places data above humans.
The concept of dataism
The term dataism was first used in February 2013 in an article published in the New York Times, written by David Brooks. In this opinion piece, Brooks explains that computers and algorithms are far superior to humans at detecting weak signals and at observing and modelling events without judgment bias.
In 2016, Yuval Noah Harari developed the concept further, explaining that "Dataism declares that the universe consists of data flows, and the value of any phenomenon or entity is determined by its contribution to data processing". He goes on to describe individuals as units of computation, with society organized so as to optimize its overall computing capacity and the propagation of information within it.
Of course, this concept and this description are above all a philosophical stance, not absolute reality. Other such paradigms exist: the selfish-gene theory, Stoicism, and so on. Dataism is a paradigm, and it can in no way be an absolute doctrine.
The problem is that, in fact, some people are genuinely convinced that the worshipped Big Data dominates the world, and that we are at the dawn of a dystopian universe in which the machine is more intelligent, more powerful, and more autonomous than humans. After all, isn't the machine superior to humans at Go or at Jeopardy!? Didn't we see Facebook's AIs invent their own language, incomprehensible to humans? And above all, haven't we seen a Google AI create its own AI, superior to all similar AIs developed by humans? Well, not really. In our opinion, these claims reflect a profound misunderstanding of how AI is developed and of what AIs are and are not capable of.
What is called Artificial Intelligence is not an intelligence comparable to human intelligence. It is simply a mathematical rule, sometimes extremely complex, programmed to perform a calculation and running on a computer of variable power. AlphaGo, which beat the best human player at Go, is only a program tuned to be good at Go. It cannot do anything else. The so-called "intelligence" is only the ability to analyze a Go board and place one's stones optimally to maximize one's score. The AI created by another AI? Nothing really magical: the parent AI tries to optimize the parameters (architecture, weights, functions, etc.) of its child AI so that it returns the smallest possible error. In practice, the parent AI produces a matrix, a table of parameters, that is used to create the child AI. The child AI has an objective defined a priori, in this case image recognition.
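To make the point concrete, here is what a "trained AI" looks like once you strip away the mystique: a fixed set of numbers and a rule that applies them. This is a minimal sketch in Python; the weights and inputs are invented for illustration, not taken from any real system.

```python
# A "trained AI" stripped to its essence: learned parameters plus a rule.
# The coefficients below are made up; in a real system they come out of
# the training procedure described later in this article.

weights = [0.4, -1.2, 0.7]   # one coefficient per input feature
bias = 0.1

def predict(features):
    """Apply the fixed mathematical rule: a weighted sum plus a bias."""
    score = bias
    for w, x in zip(weights, features):
        score += w * x
    return score

# The model can do this one computation and nothing else. Ask it
# anything outside its rule and it has no answer: it is a formula,
# not an intelligence.
print(round(predict([1.0, 0.5, 2.0]), 2))   # 0.4 - 0.6 + 1.4 + 0.1 = 1.3
```

Even the most complex deep networks are this same idea at scale: many more parameters, a more elaborate rule, but still a fixed calculation.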
At DataTailors, we use Artificial Intelligence daily to help companies make better decisions. How does this actually work for us, who create AIs?
Well, first, we have to define a goal for the AI we create: predicting a sale, scoring a churn probability, recognizing an object in an image, detecting an anomaly in a data series, etc. This step requires human expertise, for a simple reason: while AI can model and predict almost anything given sufficient training, AI itself is unable to determine what is or is not relevant for running a business. The objective is chosen by people who have detailed knowledge of their sector, particularly on a social and human level.
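In practice, "defining a goal" means translating the business question into an objective function that training will later minimize. A minimal sketch, assuming a churn-prediction task with hand-written example labels and scores (both invented for illustration):

```python
# The business goal "predict churn" becomes a mathematical objective:
# minimize the average squared gap between predictions and known outcomes.
# Labels and predictions here are invented for illustration.

def mean_squared_error(predictions, labels):
    """Average of (prediction - label)^2 over all examples."""
    total = sum((p - y) ** 2 for p, y in zip(predictions, labels))
    return total / len(labels)

labels = [1, 0, 0, 1]               # 1 = customer churned, 0 = stayed
predictions = [0.9, 0.2, 0.1, 0.6]  # scores produced by some model

# (0.1^2 + 0.2^2 + 0.1^2 + 0.4^2) / 4 = 0.055
print(round(mean_squared_error(predictions, labels), 3))
```

The machine will happily minimize whatever function we hand it; deciding that churn, rather than something else, is worth predicting is the part only the humans can do.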
In a second step, we choose the data that the machine will consider to achieve its objective. For example, imagine that we have to predict avalanche risk. There are several ways to approach the problem. We can take into consideration weather and avalanche history over the same period in past years. We can also ask an avalanche specialist, who will tell us that avalanche risk depends not only on the weather but also on the structure of the snowpack, the humidity of the snow, the slope, etc. This expertise is crucial because no AI can define, a priori, what data it needs to achieve its objective. Worse: one could very well predict avalanche risk from the stock market indices of CAC 40 companies. If asked, the AI will find weak mathematical relationships between these stock indices and avalanche risk. The problem is that these relationships are neither relevant nor reliable; they are absurd. In this case, the AI is useless.
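The "stock indices predict avalanches" trap is easy to reproduce: fit a straight line between two series of pure noise and the fit will still dutifully return a coefficient. The data below is random by construction, so any relationship found is meaningless by design.

```python
import random

random.seed(42)

# Two series with no causal link whatsoever: a fake "stock index"
# and fake "avalanche risk" values, both pure noise.
stock_index = [random.uniform(-1, 1) for _ in range(30)]
avalanche_risk = [random.uniform(0, 1) for _ in range(30)]

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

slope = fit_slope(stock_index, avalanche_risk)
# The fit returns a non-zero coefficient: a "weak mathematical
# relationship" that is pure coincidence and predicts nothing.
print(slope)
```

The algorithm has no way of knowing the relationship is spurious; only the human who chose (or failed to choose) the right data can know that.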
In a third step, we choose the algorithm we will use to create this AI. There are plenty of them, depending on the objective to be achieved. The choice of algorithm is also eminently human. There are of course general rules for choosing an algorithm according to the type of objective, but very often the Data Scientist will ultimately rely on their deep knowledge of the mathematical and algebraic processes that govern the candidate algorithms. In addition, the chosen algorithm will depend partly on the data selected to train it.
Then, we pre-process the training data set to put it in a format the chosen algorithm can understand. In fact, 95% of an AI's success depends on the human's ability to address the algorithm in the most appropriate way. An algorithm is silly by nature and will do what it can with the data we give it. A well-known principle governs AI development: GIGO (Garbage In, Garbage Out). Poor data will produce a poorly performing AI. Data pre-processing also relies on human expertise, which determines which pre-processing is relevant. For example, for a series of continuous values ranging from 0 to 100 (such as temperatures), is it better to rescale these values between -1 and 1 or to bin them into classes (0-20 °C, 20-40 °C, etc.)? The decision depends on the objective to be achieved, the importance we want to give these data, and the a priori knowledge we have of their relationship with the objective.
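The two pre-processing options just mentioned can be sketched in a few lines. Both transforms are standard; which one serves the objective better is exactly the human judgment call described above. The temperature values are invented for illustration.

```python
def rescale(values, low=0.0, high=100.0):
    """Linearly map values from [low, high] onto [-1, 1]."""
    return [2 * (v - low) / (high - low) - 1 for v in values]

def bin_into_classes(values, width=20):
    """Assign each value to a class label such as '0-20', '20-40', ..."""
    labels = []
    for v in values:
        start = int(v // width) * width
        labels.append(f"{start}-{start + width}")
    return labels

temperatures = [5.0, 18.0, 37.5, 80.0]

# Rescaling keeps the fine-grained differences: 5.0 -> -0.9, 80.0 -> 0.6
print([round(v, 2) for v in rescale(temperatures)])

# Binning deliberately throws detail away: 5.0 and 18.0 land in '0-20'
print(bin_into_classes(temperatures))
```

Rescaling preserves every distinction between values; binning collapses them into coarse categories. Neither is "correct" in itself, which is precisely why the choice cannot be delegated to the algorithm.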
At the model-fitting stage, the machine is far superior to the human. Why? Because the machine has to solve a mathematical optimization problem that requires performing millions of calculations without error, with very high precision. The machine can do that very well, and very fast; that is, after all, why humans built it. Our task at this stage is to choose the methods for optimizing the parameters and to make sure that the AI trains properly. But most of the work is done by the machine.
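What the machine does at this stage is, at heart, an optimization loop: adjust the parameters a little, measure the error, repeat many times. A toy version with a single parameter and a squared-error objective (data and learning rate invented for illustration):

```python
# Toy model fitting: find the parameter w that minimizes the mean
# squared error between w * x and the observed targets.

xs = [1.0, 2.0, 3.0]
ys = [2.1, 4.2, 5.9]   # roughly y = 2 * x

w = 0.0                # initial guess
learning_rate = 0.01

for _ in range(1000):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Take a small step downhill.
    w -= learning_rate * grad

print(round(w, 3))   # converges near 2.014, the least-squares optimum
```

A thousand tiny, exact arithmetic steps: tedious and error-prone for a human, trivial for a machine. Real training runs do the same thing with millions of parameters and billions of steps, which is exactly where the machine's superiority lies.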
Finally, we develop an interface that allows the human, our client, to communicate with the AI. Note that the AI does not address the human. The AI has no intention. The human decides to question the AI; otherwise, the AI is just an inert mathematical model sitting deep inside a computer.
So, what about dataism? Well, it is above all a philosophical position, a paradigm that we must confront with concrete reality. It is true that the machine is superior to humans at detecting weak signals, modelling, computing statistics, etc. In fact, the machine is superior to humans at calculation; that is why a computer is called a computer. AIs are prodigious creations that make it possible to carry out ever more complex tasks: driving cars, recognizing faces, holding almost natural conversations. Nevertheless, they are nothing but mathematical models created precisely to perform the task assigned to them.
Big Data, Deep Learning, and AI are above all fabulously ingenious creations of men and women. They were invented to answer questions and to achieve objectives defined by humans. It is premature to place machines, and data, above humans. Of course, it is sometimes unnerving to see Google recommend that we leave 15 minutes early because it has seen in our calendar that we have a meeting somewhere other than usual. But we must not forget that, at the very beginning, a human created this AI with precisely this objective: to tell the user when to leave, according to the places written in their calendar. And the AI does it well, because Google has extremely brilliant people creating these AIs. But it remains a story of men and women.