Whether computing is seen as a science or an art often depends on rather subjective reasons. To me, it's a science because you need to study hard subjects such as algebra and calculus before getting into the inner workings of algorithms. But it also has many of the features that lead any area of human knowledge to be considered an art: imagination, abstraction, decomposition, visualization, and many more.
Last week I attended a recruitment event at a large Industrial Engineering school. One student came over and said, honestly but with a hidden, bitter reproach: "You computer engineers have become preeminent because of data; it used to be industrial engineers who had the glory, when industry was on top." Leaving aside the bitterness of the remark, which I understand, he was right that data nowadays drives not only companies' decisions but also individuals, through apps, portals and news.
None of this is new; anyone could make this argument. And it is completely true. Governments and influencers have always been fed by reports, figures, and trends. Many years ago, getting a report was difficult, expensive, and slow. Today, we suffer from data indigestion. Data is everywhere, far more than any one person or any one company could ever digest.
The key question is which data is the right data to consider. Let's clarify some relevant data categories:
- Big Data is about volume: huge amounts of data. It is now technically and economically feasible to tackle this sort of data with new cloud technologies.
- Fast Data is about data that flows at high speed, either because it is generated at high speed or because there are so many sources that processing becomes daunting. We now also have good architectures and technologies to deal with fast data.
- Complex Data is about data that is unstructured, such as a web page or a tweet, or that is not so easy to manage, such as car metrics in a proprietary format.
I would also add another classification for data: known vs. unknown data. This leads to the amazing topic of “how to deal with the unknowns”, an area of active research in fields like philosophy, astronomy, mathematics, intelligence, and computing, among others. How do we know if we have unknowns?
In my experience, some companies have a vague idea of some of their unknowns in terms of data. Other companies simply ignore the topic ("we've got everything!") or humbly accept that there might be something "out there". However, no one can deny the existence of Dark Data: data that exists but, for a combination of reasons, remains silently hidden and has no further use beyond its original purpose.
I founded Datumize because, while working for corporate customers, I came across vivid examples of Dark Data: data that was being used for a very specific purpose but that, given a second life, would have yielded an important benefit to the company.
The first example of Dark Data I discovered, many years ago, was related to a resource allocation system at a large healthcare customer: think of allocating an operating room, or checking the availability of a certain drug. It happened (and it still happens!) that the same resource was checked again and again over a long period of time. Did anyone ever notice? Nope. Read operations, that is, what you do in an application when you are just "checking" and nothing is being "saved", are rarely logged. In the end, the whole organization could have requested a certain drug more than a thousand times and nothing would have happened. Dark Data, in this case, was the collection of inventory queries. Had that Dark Data been available, a simple report could have revealed the most requested out-of-stock items. Someone could then have worked on fixing the shortages, with positive results.
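To make the idea concrete, here is a minimal Python sketch of the kind of report described above. It assumes, hypothetically, that the read operations have somehow been captured as simple records holding a resource identifier and an availability flag; `ReadEvent` and `most_requested_unavailable` are illustrative names, not part of any real system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEvent:
    """One captured 'check' against the inventory (hypothetical record format)."""
    resource_id: str
    available: bool


def most_requested_unavailable(events, top_n=10):
    """Return the top_n resources most often checked while unavailable,
    as (resource_id, count) pairs sorted by count, highest first."""
    misses = Counter(e.resource_id for e in events if not e.available)
    return misses.most_common(top_n)


if __name__ == "__main__":
    # Illustrative, made-up log of read operations.
    log = [
        ReadEvent("drug-A", False),
        ReadEvent("drug-A", False),
        ReadEvent("room-3", True),
        ReadEvent("drug-B", False),
        ReadEvent("drug-A", False),
    ]
    print(most_requested_unavailable(log, top_n=2))
    # [('drug-A', 3), ('drug-B', 1)]
```

The counting itself is trivial; the hard part is capturing the read operations in the first place, precisely because they are rarely logged.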
Dark Data represents the next potential revolution in terms of data. Our ability to navigate these unknowns will determine how much value we get in return. The question I usually ask my customers is not whether they have Dark Data (they certainly do), but whether they would like to gain an advantage over their competitors based on that data. So, in the end, the question is not which data category (big, fast, complex, dark) you have, but which one will yield the most beneficial results for your business.