[Figure: global data growth. Source: IBM]
Decision makers must understand the impact of data uncertainty on their decisions and should look for ways to make that impact explicit. This is not new, and it does not depend on whether the data comes from a big data source. Data uncertainty has been around ever since the first optimisation model was created. In practice this uncertainty is often simplified away by using a single measure, for example the minimum, maximum or average. The consequences of that simplification are manifold, as Sam Savage explains in The Flaw of Averages. Without explicitly taking the uncertainty in (big) data into account, the outcomes of optimisation models built on that data are no better than a wild guess. Given the high level of uncertainty in big data, accounting for data uncertainty explicitly becomes even more important. Luckily, Operations Research offers various ways to incorporate this uncertainty into the modelling, turning a wild guess into an informed decision. Some well-known approaches are what-if analysis, fuzzy logic, robust optimisation and simulation.
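As a minimal sketch of the Flaw of Averages, consider a simple newsvendor-style decision evaluated by simulation. The numbers (unit cost, price, uniform demand) are illustrative assumptions, not taken from the article; the point is only that the profit computed at the *average* demand overstates the *average* profit under uncertain demand.

```python
import random

# Hypothetical example (assumed numbers): buy at 6, sell at 10,
# demand uniformly distributed between 50 and 150 units.
COST, PRICE = 6.0, 10.0
ORDER_QTY = 100  # plan built on the single "average" demand of 100

def profit(order_qty, demand):
    """Profit for one demand realisation: sell what you can, scrap the rest."""
    return PRICE * min(order_qty, demand) - COST * order_qty

random.seed(42)
demands = [random.uniform(50, 150) for _ in range(100_000)]

profit_at_average = profit(ORDER_QTY, 100)  # plug the average into the model
average_profit = sum(profit(ORDER_QTY, d) for d in demands) / len(demands)

print(f"Profit at average demand  : {profit_at_average:7.1f}")  # 400.0
print(f"Average profit (simulated): {average_profit:7.1f}")     # roughly 275
```

Replacing the uncertain demand by its average hides the downside of unsold stock, which is exactly the kind of bias that simulation, robust optimisation and related techniques are designed to expose.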
Big data is neither objective, truthful nor credible; it is a creation of human design and therefore biased. Numbers get their meaning because we draw inferences from them. Biases in the data collection, data analysis and modelling stages present considerable risks to decision quality, and are as important to the big-data equation as the numbers themselves. Decision makers must be aware of this uncertainty and understand how it will impact their decisions.