“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

This quote from Duke University professor Dan Ariely sums it up well. Big Data is very trendy, but apart from data professionals, few people really know what it is.

We regularly hear in meetings that “so-and-so is doing Big Data”, which makes little sense: Big Data is simply a concept that designates a large volume of data. In other words, Big Data is just… big data.

Intuitive definition

What is Big Data?

Well, actually, it depends on the context. We will generally consider that we are dealing with Big Data when the size of the data to be processed exceeds the RAM capacity of the computer on which it is processed. This happens, for example, if you are working on a computer with 4 GB of RAM and you need to process a 10 GB text file. We would consider this Big Data because you will have to implement specific strategies and technical solutions to process this text file.
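To make this concrete, here is a minimal sketch in Python of the kind of strategy this implies: reading the file in fixed-size chunks so that only a small piece of it sits in memory at any moment (the file name and chunk size are hypothetical, for illustration only).

```python
# Count the lines of a file much larger than RAM by streaming it
# chunk by chunk instead of loading it entirely into memory.
def count_lines(path, chunk_size=64 * 1024 * 1024):  # 64 MB per chunk
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # only one chunk is in memory at a time
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

print(count_lines("big_file.txt"))  # "big_file.txt" is a hypothetical 10 GB file
```

The same idea, processing data piece by piece rather than all at once, is what distributed Big Data tools apply at a much larger scale, across many machines.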

Less intuitive definition

In reality, Big Data refers to the whole set of tools that allow us to address three problems:

  1. Data volume: this is what we have just seen above. The amount of data to be processed is the determining factor.
  2. Variety of data. Let’s take the example of our 10 GB text file. Maybe it contains only first and last names, in which case it is quite simple to process. More often, however, such files mix heterogeneous data: dates, usage times, weather conditions, links to images, etc. This kind of data is typically stored in NoSQL databases, which are designed for flexible, semi-structured records (see the sketch just after this list).
  3. Processing velocity. This is the speed at which the data must be processed. For example, when you watch videos on YouTube, YouTube recommends other videos to watch. To do so, it uses an algorithm that analyzes, in a split second, the videos you have already watched in order to present you with relevant content. This analysis is nearly instantaneous, and this velocity again requires very specific techniques.
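As a rough illustration of the Variety point, here is what one record of such a heterogeneous file could look like as a JSON-style document, the kind of flexible, schema-free record that NoSQL document stores are designed to hold (all field names and values are invented for the example).

```python
import json

# A hypothetical heterogeneous record: names, a date, a duration,
# weather information and a link to an image, all mixed together.
record = {
    "first_name": "Jane",
    "last_name": "Doe",
    "date": "2019-03-14",
    "usage_time_minutes": 42,
    "weather": {"condition": "rainy", "temperature_c": 11},
    "image_url": "https://example.com/photo.jpg",
}

# Document-oriented NoSQL databases store records in exactly this
# kind of flexible form, with no rigid table schema imposed upfront.
print(json.dumps(record, indent=2))
```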

Big Data therefore meets these 3 criteria: Volume, Variety, Velocity. These are the 3 Vs of Big Data. To these criteria, two more Vs are now added: Veracity (the reliability of the data obtained) and Value (what the collected data is worth on a strategic and economic level).

The 5 Vs of Big Data: Volume, Variety, Velocity, Veracity, and Value.

Why is Big Data important for companies?

Let us look at the problem the other way around. Why would so much effort and money have gone into developing Big Data storage, processing, and analysis technologies if they had no use?
Big Data serves roughly three purposes, which may overlap:

  1. Offering a better service
  2. Making better decisions
  3. Reducing risks

Applied to a company, these three objectives boil down to either earning more money or avoiding losing it. How?

Better service: Big Data makes it possible, for example, to mathematically profile customers. When a store offers a loyalty card that is scanned at each checkout, the store can associate a purchasing behavior with each loyalty card identifier. This behavior, or buying profile, then allows the store to send targeted promotional offers to the card owner. This profiling makes it possible, on the one hand, to serve customers better by not flooding them with offers that do not interest them; on the other hand, it increases the customer’s average basket (the average amount of money spent at each visit to the store).
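A minimal sketch of this kind of profiling, assuming a hypothetical table of checkout receipts where each row is one store visit identified by a loyalty card ID (pandas is used here purely for convenience):

```python
import pandas as pd

# Hypothetical checkout data: one row per store visit, per loyalty card.
receipts = pd.DataFrame({
    "card_id": ["A1", "A1", "B2", "B2", "B2", "C3"],
    "amount":  [23.5, 41.0, 12.0, 18.5, 15.0, 67.0],
})

# Average basket per loyalty card: the mean amount spent per visit.
average_basket = receipts.groupby("card_id")["amount"].mean()
print(average_basket)
```

In a real store the same grouping would run over millions of receipts and many more attributes (product categories, time of day, etc.), but the principle is the same.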

Making better decisions: Bad decisions kill companies: a bad investment at the wrong time, poor anticipation, etc. These bad decisions stem from uncertainty. Big Data makes it possible to greatly reduce this uncertainty and to avoid deciding on gut feeling, based on an erroneous perception of reality. In our portfolio, we take the example of a company that rents self-service bicycles. The company’s management team must decide whether to invest 100k € in the purchase of 100 new bicycles or to invest that money in something else. This decision is crucial: if a user arrives at a station and no bike is available, he will arrive late for work, will lose trust in the bike rental company, and the company will lose a customer. Using Big Data, we can estimate with reasonable confidence that the volume of rented bikes will drop in the coming weeks, and therefore that it is not necessary to invest in new bikes immediately. The company can then invest the money in something more profitable: marketing, etc.
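As a deliberately simplified sketch of that estimate (the weekly figures below are invented), even a straight-line trend fitted on recent rental counts already indicates whether demand is heading up or down:

```python
import numpy as np

# Hypothetical weekly rental counts for the last 8 weeks.
weekly_rentals = np.array([980, 960, 955, 930, 910, 905, 880, 860])
weeks = np.arange(len(weekly_rentals))

# Fit a straight line: the slope is the estimated change in rentals per week.
slope, intercept = np.polyfit(weeks, weekly_rentals, 1)
next_week_estimate = slope * len(weekly_rentals) + intercept

print(f"Trend: {slope:+.1f} rentals per week, next week ~ {next_week_estimate:.0f}")
# A clearly negative slope suggests demand is dropping, so buying
# 100 new bikes right now is probably not the best use of the money.
```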

Reducing risks: We are talking about risks for both companies and people. In production plants, every minute of production line downtime costs a lot of money, and these interruptions are often due to breakdowns. Big Data processing tools, including a category of analysis called Deep Learning, can predict the probability that a failure will occur on a particular machine, based on the information reported by that machine’s sensors. Even if these sensors return values that are within the norm, Deep Learning tools are able to say: “Be careful, this pattern of values is unusual; the machine has a 70% chance of failing in the coming week.” Technicians can then inspect the machine in question and prevent the breakdown.
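A very rough sketch of the idea, using a scaled logistic regression as a stand-in for the much heavier Deep Learning models mentioned above (all sensor values and labels are invented):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical sensor history (temperature in °C, vibration level), labelled:
# 1 = a breakdown occurred within the following week, 0 = it did not.
X = np.array([
    [60, 0.2], [62, 0.3], [61, 0.2], [65, 0.4],   # weeks with no incident
    [63, 0.9], [66, 1.1], [64, 1.0], [67, 1.2],   # weeks preceding a breakdown
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Train a simple probabilistic classifier on the labelled history.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# A new reading whose values are individually "within the norm"
# but whose combination matches the pre-breakdown pattern.
new_reading = np.array([[64, 1.05]])
failure_probability = model.predict_proba(new_reading)[0, 1]
print(f"Estimated probability of failure in the coming week: {failure_probability:.0%}")
```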

But in my company, I don’t have any data!

Really? You don’t have a customer file? No list of invoices? All companies have data that can be used. VSEs and SMEs often do not yet have a data analysis culture, even though they already hold valuable data. Companies that implement a data analysis strategy gain a serious advantage over their competitors because they offer better services, make better strategic decisions, and take fewer risks. Big Data is today’s revolution, just as industrial mechanization and robotization were in the past.

In short

Big Data is a concept that encompasses heterogeneous, voluminous data whose processing offers a strategic advantage. In the age of the digital revolution, all companies have data from which they can benefit. The analysis of this data requires advanced technical skills that have given rise to new jobs, which we will discuss in a future post.