Real-time data analysis: towards the society of tomorrow
2 October 2019 | Written by Thomas Ducato
We generate and disseminate a large amount of information, but it needs to be processed more quickly and effectively. We talked about it with Michele Ridi of Radicalbit.
Every day we generate about 3 quintillion bytes, a number followed by 18 zeros that gives an idea of the enormous amount of information we produce. This is the figure that emerged from a recent study by Cefriel, a company owned by universities, businesses, and public administrations that carries out digital innovation and training projects. Almost 90% of the available data was created in the last two or three years, offering new opportunities to citizens and companies. To be exploited and to represent a real advantage, however, the data must be analyzed quickly and effectively: for this reason, an approach called event stream processing, the analysis of data in real time, is spreading more and more.
Real-time analysis. There are two main technological approaches to data analysis. The first is batch data processing, by far the most widespread, which consists of processing large amounts of data once they have been stored on a storage system or file system. The second is event stream processing, the ability to operate on continuous (potentially infinite) data flows directly while the data is in transit, without the need to move it from the source in order to work on it. The second approach, which is currently growing strongly, can offer an important competitive advantage in terms of speed and efficiency, and it represents the future.
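To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is only an illustration: the function names and the sample readings are assumptions made for this example and do not come from Radicalbit or from any specific streaming product. The batch function needs the whole data set to be stored before it can answer, while the streaming function updates its answer as each event passes and keeps nothing but a running count and mean.

```python
from typing import Iterable, Iterator, List


def batch_average(stored: List[float]) -> float:
    """Batch style: every reading is already stored, then processed at once."""
    return sum(stored) / len(stored)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: refresh the result on each event, without storing events."""
    count = 0
    mean = 0.0
    for value in events:
        count += 1
        # Incremental update: shift the mean towards the new value.
        mean += (value - mean) / count
        yield mean


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_average(readings)                    # one answer, at the end
stream_results = list(streaming_average(iter(readings)))  # one answer per event
```

Both approaches reach the same final value, but the streaming version has a usable answer after every single event, which is the property the article refers to as real time.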
Towards continuous intelligence. Continuous intelligence applies machine learning models or artificial intelligence algorithms to historical and real-time data in order to extract constantly updated insights for each event. It is, therefore, a new analytical process, capable of crossing different data streams to explore their relationships continuously, allowing systems to learn without interruption and to make their assessments and interpretations of each interaction ever more accurate.
Radicalbit. The team of researchers, data scientists, and engineers at Radicalbit, a software house founded in 2015, is focused on streaming. They have mastered the technologies and specialized in event stream processing through the design and development of a platform capable of managing the entire data life cycle and integrating machine learning and artificial intelligence algorithms. We interviewed Michele Ridi, CMO of Radicalbit, to clarify the data collection and analysis process and to reason about impacts, risks, and prospects for the near future.
Today it is said that data is the new oil, but when did our information become so important?
In the last decade, but actually even earlier, we began to leave a trace of everything we do, as consumers and as companies alike. It is as if each of us were a modern Tom Thumb, leaving crumbs as he passes. Those who understood this ahead of time, Google above all, gained a huge competitive advantage: they began to provide services that initially seemed incredible, and we did not understand how they were making money from them. Just think of what happened with email: before Google entered the field, having an email address was a paid service. Google started to offer free registration: it did not do so out of kindness but in exchange for something, namely information about our habits. That is how it began to make money from advertisers interested in reaching profiled customers.
Google is definitely in good company…
At that point, the revolution that led to the digitalization of processes began. The further we go, the more all-encompassing this approach becomes: any operation, whether on the consumer side or within a business process, leaves traces, collected by sensors and applications or left by us more or less consciously, which are translated into information that can be analyzed and used.
How is the process structured, and how does this analysis take place?
Data has become a fundamental and enabling element for the great projects of the future. What makes all this possible is the collection, storage, and analysis of this information. The third aspect is the one we deal with at Radicalbit, through the development of software capable of applying artificial intelligence. Once artificial intelligence algorithms are applied to the information, one can begin to reason on it. In general, analytics are divided into four families: descriptive analytics, which tell what is happening at a specific moment; diagnostic analytics, which also explain why something is happening; predictive analytics, which give an idea of what could happen; and prescriptive analytics, which, in light of a forecast, also suggest how to intervene.
Clearly, the focus today is on the last two. Can you give us some examples?
As regards predictive analytics, an example could be Amazon’s warehouse management, which, thanks to a patented system, ships goods around the world based on the probability of purchase. Amazon does not keep every product in every warehouse in the world, but with this system it manages to deliver orders in a very short time. As for the prescriptive approach, the best example is the self-driving car: based on experience, the car is able to take a turn, and every new driving session improves its knowledge. It processes information coming from cameras and sensors, detects the presence of a curve, and tackles it on the basis of what it “lived” in the past.
In data analysis, there are two different approaches. Can you explain them to us?
With the first approach to be developed, partly due to the lack of adequate tools, the data was collected in a data lake, a sort of well in which the information, still raw, is stored and then analyzed. Today, on the other hand, technologies capable of doing this work in real time are establishing themselves on the market: the data does not have to be moved to a data lake to be processed, because processing takes place directly while the information is in transit. These streaming technologies improve performance, making analysis faster and more effective, guaranteeing results in real time, and enabling applications in many sectors, from sports telemetry to cybersecurity. If we think of smart cities or smart homes, it is clear that this new approach is fundamental: we cannot afford to receive the information late. For example, I need to know where the bus is in real time, or I need the lights to come on exactly when I enter a room.
What are the applications?
They range from the most annoying things, from a commercial point of view, such as seeing, while scrolling on social media, an advertisement for a motorway service area just before passing it on the highway, up to more noble and interesting things like those already mentioned for smart cities. In the future, it could be the city itself that becomes a huge information display.
And for the companies?
Immediate identification of fraud or of anomalies in production processes, predictive maintenance, and optimization of the supply chain are just a few examples of projects enabled by streaming data analysis. This approach, therefore, involves the entire value chain and can have a decisive impact on the business model of an organization: we will have machines able to give us indications and advice in real time, in the best cases for every area of the company. The benefits, however, also extend to citizens: first of all, I want to mention the use in the medical field for predictive diagnoses, in support of the doctor’s own, or for the management of patients. This approach will enable a real revolution, which will lead to the emergence of continuous intelligence.
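As an illustration of what "immediate identification of anomalies" can look like on a stream, the following Python sketch flags a value as soon as it arrives if it lies far from the recent values. This is a deliberately simple rolling z-score check, not a description of Radicalbit's platform or of any production fraud system; the function name, the window size, the threshold, and the sample amounts are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    events: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_anomaly) for each event, using only the last `window` values."""
    recent: deque = deque(maxlen=window)
    for value in events:
        is_anomaly = False
        # Only judge a value once the window holds enough history.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                is_anomaly = True
        yield value, is_anomaly
        # The value joins the window afterwards, so it is never compared with itself.
        recent.append(value)


# Hypothetical payment amounts, used only for illustration.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 500.0, 21.0]
flagged = [value for value, is_anomaly in flag_anomalies(amounts) if is_anomaly]
```

The decision is taken while the event is passing, with a fixed amount of memory, instead of after the data has been collected and stored. A real system would use richer models and would probably keep flagged values out of the reference window.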
Many opportunities, then. But what are the most obvious dangers and risks?
Some dangers are already present and have been demonstrated: having shared a great deal of our information has already made us potential targets for campaigns that are not always friendly or transparent, as in the case of Cambridge Analytica. That type of risk is there and will remain, partly because legislation has not yet made significant moves. The other big risk I see is that the big information managers, such as Microsoft, Apple, and Amazon, could use our data in a harmful way. In the end, they are not democratic states but private companies with their own interests. We must trust that they use this data as the law requires, in a context where the differences between European regulations and those of the rest of the world are still profound. Sometimes the tools to defend ourselves are there, but nobody pays the attention that all this deserves.