Thursday, 13 October 2016

The Error in Predictive Analytics

We are all well aware of the predictive analytics capabilities of companies like Netflix, Amazon and Google. Netflix predicts the next film you are going to watch, Amazon shortens delivery times by predicting what you are going to buy next, and Google even lets you use its algorithms to build your own prediction models. Following the predictive successes of Netflix, Google and Amazon, companies in telecom, finance, insurance and retail have started to use predictive analytics models and have developed the analytical capabilities to improve their business. Predictive analytics can be applied to a wide range of business questions and has been a key technique in search, advertising and recommendations. Many of today's applications of predictive analytics are in the commercial arena, focusing on predicting customer behaviour. Other sectors are taking their first steps: organisations in healthcare, industry and utilities are investigating what value predictive analytics can bring. In these first steps much can be learned from the experience the front-running industries have in building and using predictive models. However, care must be taken, as the context in which predictive analytics has been used so far is quite different from the new application areas, especially when it comes to the impact of prediction errors.

Leveraging the data

It goes without saying that, besides its infinite shelf space, the success of Amazon comes from its recommendation engine. The same holds for Netflix. According to McKinsey, 35 percent of what consumers purchase on Amazon and 75 percent of what they watch on Netflix comes from algorithmic product recommendations. Recommendation engines work well because a lot of data is available on customers, products and transactions, especially online. This abundance of data is why there are so many predictive analytics initiatives in sales & marketing. The main objective of these initiatives is to predict customer behaviour: which customer is likely to churn or to buy a specific product or service, which ads will be clicked on, or which marketing channel to use to reach a certain type of customer. In these types of applications predictive models are created using either statistical techniques (like regression, probit or logit) or machine learning techniques (like random forests or deep learning). With the insights gained from these predictive models many organisations have been able to increase their revenues.
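To make the logit option mentioned above concrete, here is a minimal Python sketch of a churn model: a one-feature logistic regression fitted by plain gradient descent. The feature (months since last purchase), the data and the learning-rate settings are hypothetical; a real project would use a statistics or machine learning library and many more customer attributes.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, lr=0.1, epochs=5000):
    """Fit p(churn) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: months since last purchase, and whether the customer churned (1) or stayed (0).
months = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
churned = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(months, churned)
for m in (2, 8):
    print(f"{m} months inactive -> p(churn) = {sigmoid(b0 + b1 * m):.2f}")
```

As written, the sketch returns only a point estimate per customer; a statistics package fitting the same logit by maximum likelihood would also report standard errors for b0 and b1.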

Predictions always contain errors!

Predictive analytics has many applications; the examples mentioned above are just the tip of the iceberg. Many of them will add value, but it remains important to stress that the outcome of a prediction model will always contain an error. Decision makers need to know how big that error is. To illustrate, when using historic data to predict the future, you assume that the future will have the same dynamics as the past, an assumption that history has proven to be dangerous. The 2008 financial crisis is proof of that. Even though there is no shortage of data nowadays, there will be factors that influence the phenomenon you are predicting (like churn) that are not included in your data. Also, the data itself will contain errors, as measurements always include some kind of error. Last but not least, models are always an abstraction of reality and can't contain every detail, so something is always left out. All of this will impact the accuracy and precision of your predictive model. Decision makers should be aware of these errors and the impact they may have on their decisions.

When statistical techniques are used to build a predictive model, the model error can be estimated; it is usually provided in the form of confidence intervals. Any statistical package will provide them, helping you assess the model quality and its prediction errors. In the past few years other techniques have become popular for building predictive models, for example deep learning and random forests. Although these techniques are powerful and able to produce accurate predictive models, they do not provide confidence intervals (or error bars) for their predictions out of the box. So there is no direct way of telling how accurate or precise the predictions are. In marketing and sales this may be less of an issue: the consequence might be that you call the wrong people or show an ad to the wrong audience. In other applications, however, the consequences can be more severe. You might remember the offensive auto-tagging by Flickr, which labelled images of people with tags like “ape” or “animal”, or the racial bias in predictive policing algorithms.
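As an illustration of the kind of error estimate a statistical model provides, the sketch below fits a simple linear regression by ordinary least squares and computes a 95% prediction interval for a new observation using the textbook formula ŷ0 ± t · s · sqrt(1 + 1/n + (x0 - x̄)² / Sxx). The data are hypothetical, and the Student t critical value for 8 degrees of freedom (2.306) is hard-coded to keep the example dependency-free.

```python
import math


def ols_prediction_interval(xs, ys, x0, t_crit):
    """Fit y = a + b*x by ordinary least squares and return (prediction, lower, upper)
    for a new observation at x0, using the classical prediction-interval formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom (two estimated parameters).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y0 = a + b * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0, y0 - half_width, y0 + half_width


# Hypothetical data: 10 observations, so n - 2 = 8 degrees of freedom.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
T_975_DF8 = 2.306  # two-sided 95% Student t critical value for 8 degrees of freedom

pred, lo, hi = ols_prediction_interval(xs, ys, 11, T_975_DF8)
print(f"prediction at x=11: {pred:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Note that the interval widens as x0 moves away from the mean of the training data, which is exactly the kind of information a bare point estimate hides.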


Where is the error bar?

The point I would like to make is that when adopting predictive modelling, you should have a way of estimating the error in your predictions, in terms of both accuracy and precision. In statistics this is common practice and helps improve models and decision making. Models constructed with machine learning techniques usually only provide point estimates (for example, the probability of churn for a customer is some percentage), which give little insight into the accuracy or precision of the prediction. When using machine learning it is possible to construct error estimates (see for example the research of Michael I. Jordan), but it is not common practice yet. Many analytics practitioners are not even aware of the possibility. Especially now that predictive modelling is being used in environments where errors can have a large impact, this should be top of mind for both the analytics professional and the decision maker. Just imagine your doctor concluding that your liver needs to be taken out because his or her predictive model estimates a high probability of a very nasty disease. Wouldn't your first question be how certain he or she is about that prediction? So, my advice to decision makers: only use the outcomes of predictive models if accuracy and precision measures are provided. If they are not there, ask for them. Without them, a decision based on these predictions comes close to a blind leap of faith.
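One generic way to attach error bars to a model that only returns point estimates is the bootstrap: refit the model on many resampled copies of the training data and look at the spread of its predictions. The sketch below applies a percentile bootstrap to a hypothetical k-nearest-neighbours "black box". It is a plain bootstrap, not the specific methods from the research cited above, and the interval it produces reflects only the uncertainty caused by the finite training sample, not the noise in a single new observation.

```python
import random


def knn_predict(train, x0, k=3):
    """Hypothetical 'black-box' model: average target of the k training points nearest to x0.
    Like many machine learning models, it returns a point estimate only."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bootstrap_interval(train, x0, model, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap: refit the model on resampled training sets and
    return (point estimate, lower, upper) for the prediction at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(train) for _ in train]  # resample with replacement, same size
        preds.append(model(sample, x0))
    preds.sort()
    lower = preds[int((alpha / 2) * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return model(train, x0), lower, upper


# Hypothetical training data: y is roughly 2x plus noise.
data_rng = random.Random(0)
train = [(x, 2.0 * x + data_rng.gauss(0, 1.5)) for x in range(20)]

point, lo, hi = bootstrap_interval(train, 10.0, knn_predict)
print(f"prediction at x=10: {point:.2f}, 95% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```

The same wrapper works for any model function with the signature model(train, x0), which is why the bootstrap is a popular first step when a technique offers no built-in error bars.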

Wednesday, 3 August 2016

Airport Security, can more be done with less?

One of the main news items of the past few days is the increased level of security at Amsterdam Schiphol Airport and the additional delays it has caused both incoming and outgoing travellers. Extra security checks are being conducted on the roads around the airport, and additional checks are also being performed inside the airport. Security was increased after the authorities received reports of a possible threat. We are at the peak of the holiday season, when around 170,000 passengers per day arrive, depart or transfer at Schiphol Airport. With these numbers of people, the authorities naturally want to do their utmost to keep us safe, as always. This intensified security puts the military police (MP) and security officers under stress, however, as more needs to be done with the same number of people. It will be difficult for them to keep up the increased number of checks for long, and additional resources, for example from the military, will be required. The question is: does security really improve with these additional checks, or could a more differentiated approach offer more security (lower risk) with less effort?

How has airport security evolved?

If I take a plane to my holiday destination, I need to take off my coat, my shoes and my belt, get my laptop and other electronic equipment out of my bag, separate the chargers and batteries, hand in my excess liquids, empty my pockets, and step through a security scanner. This takes time, and with an increasing number of passengers, waiting times will increase. We all know these measures are necessary to keep us safe, but they don't make for a very enjoyable start to a trip abroad. Each measure was adopted to prevent the same attack from happening again, which has resulted in the current rule-based system of security checks. Over the years the number of security measures has increased enormously (see for example the timeline on the TSA website), making security a resource-heavy activity that can't be continued in the same way in the near future. A smarter way is needed.

Risk Based Screening

At present most airports apply the same security measures to all passengers, a one-size-fits-all approach. This means that low-risk passengers are subject to the same checks as high-risk passengers, and that changes to the security checks can have an enormous impact on resource requirements. Introducing an additional one-minute check by a security officer for all passengers at Schiphol requires 354 additional security officers to check 170,000 passengers. A smarter way would be to apply different measures to different passenger types: high-risk measures to high-risk passengers and low-risk measures to low-risk passengers. This risk-based approach is at the foundation of SURE! (Smart Unpredictable Risk Based Entry), a concept introduced by the NCTV (the National Coordinator for Security and Counterterrorism). Consider this: what is more threatening, a non-threat passenger with banned items (a pocket knife, a water bottle) or a threat passenger with bad intentions (and no banned items)? I guess you will agree that the latter is more threatening, and this is exactly what risk-based screening focuses on. A key component of risk-based security is deciding which security measures to apply to which passenger, taking into account that attackers will adapt their plans when additional security measures are introduced.
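The figure of 354 officers can be reproduced with simple arithmetic, assuming each officer works one eight-hour shift per day and spends all of it on the extra check (the shift length is an assumption, not stated in the text):

```python
PASSENGERS_PER_DAY = 170_000
EXTRA_MINUTES_PER_PASSENGER = 1
SHIFT_HOURS = 8  # assumption: one eight-hour shift per officer per day, all spent on the extra check

extra_hours = PASSENGERS_PER_DAY * EXTRA_MINUTES_PER_PASSENGER / 60  # officer-hours per day
officers_needed = extra_hours / SHIFT_HOURS

print(f"{extra_hours:.0f} extra officer-hours per day -> about {officers_needed:.0f} officers")
# prints: 2833 extra officer-hours per day -> about 354 officers
```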

Operations Research helps safeguard us

The concept of risk-based screening makes sense, as scarce resources like security officers, MPs and scanners are utilised better. In the one-size-fits-all approach a lot of these resources are used to screen low-risk passengers, and as a consequence fewer resources are available for detecting high-risk passengers. Still, even with risk-based screening, trade-offs must be made, as resources will remain scarce. Also, decisions need to be made in an uncertain and continuously changing environment, with little, false or no information. Sound familiar? This is exactly the area where Operations Research shines. Decision making under uncertainty can be supported by, for example, simulation, Bayesian belief networks, Markov decision models and control theory models. Using game-theoretic concepts, the behaviour of attackers can be modelled and incorporated, leading to the identification of new and robust countermeasures. Queuing theory and waiting line models can be used to analyse various security checkpoint configurations (for example centralised versus decentralised, and yes, centralised is better!), including the required staffing. This will help airports develop efficient and effective security checks that limit the impact on passengers while achieving the highest possible risk reduction. These are just a few examples of where OR can help; there are many more.
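A minimal queueing sketch of why a centralised checkpoint can outperform decentralised ones: it compares c independent single-lane M/M/1 queues with one shared M/M/c queue (Erlang C formula) at the same total load. Poisson arrivals, exponentially distributed service times and the numbers (10 lanes, 1 passenger per minute per lane, 9 arrivals per minute) are all hypothetical modelling assumptions; a real airport study would use measured arrival and service patterns, typically combined with simulation.

```python
import math


def erlang_c_wait(lam, mu, c):
    """Mean waiting time in the queue of an M/M/c system (Erlang C formula).
    lam: total arrival rate, mu: service rate per lane, c: number of lanes (int)."""
    a = lam / mu  # offered load in Erlangs
    if a >= c:
        raise ValueError("unstable system: arrival rate exceeds total capacity")
    top = a ** c / math.factorial(c) * c / (c - a)
    p_wait = top / (sum(a ** k / math.factorial(k) for k in range(c)) + top)
    return p_wait / (c * mu - lam)


def separate_queues_wait(lam, mu, c):
    """Mean waiting time when arrivals are split evenly over c independent M/M/1 lanes."""
    lam_lane = lam / c
    if lam_lane >= mu:
        raise ValueError("unstable lane: arrival rate exceeds lane capacity")
    return lam_lane / (mu * (mu - lam_lane))


lam, mu, lanes = 9.0, 1.0, 10  # hypothetical: 9 passengers/min arrive, each lane serves 1/min
print(f"separate lanes:   {separate_queues_wait(lam, mu, lanes):.2f} min average wait")
print(f"one shared queue: {erlang_c_wait(lam, mu, lanes):.2f} min average wait")
```

With these numbers the separate lanes give a mean wait of 9 minutes, while the pooled queue gives roughly 0.67 minutes: the same staff and scanners, but an idle lane can serve anyone who is waiting.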

Some of the concepts of risk-based security checks resulting from the SURE! programme have already been put into practice. Schiphol is working towards centralised security and recently opened the security checkpoint of the future for passengers travelling within Europe. It's good to know that the decision-making rigour comes from Operations Research, resulting in effective, efficient and passenger-friendly security checks.

Thursday, 21 July 2016

Towards Prescriptive Asset Maintenance

Every utility deploys capital assets to serve its customers. During the asset life cycle, an asset manager must repeatedly make complex decisions with the objective of minimising asset life cycle cost while maintaining high availability and reliability of the assets and networks. Avoiding unexpected outages, managing risk and maintaining assets before they fail are critical goals for improving customer satisfaction. To better manage asset and network performance, utilities are starting to adopt a data-driven approach. With analytics they expect to lower asset life cycle cost while maintaining high availability and reliability of their networks. Using actual performance data, asset condition models are created that provide insight into how assets deteriorate over time and what the driving factors of deterioration are. With these insights, forecasts can be made of future asset and network performance. These models are useful, but on their own they cannot effectively support the asset manager in designing a robust and cost-effective maintenance strategy.

Asset condition models allow assets to be ranked by their expected time to failure. Within utilities it is common practice to use this ranking to decide which assets to maintain: starting with the assets with the shortest time to failure, assets are selected for maintenance until the available maintenance budget is exhausted. This prioritisation approach ensures that the assets most prone to failure are selected for maintenance; however, it will not deliver the maintenance strategy with the highest overall risk reduction. The approach also can't effectively handle constraints other than the budget, for example constraints on manpower availability, precedence constraints between maintenance projects, or required materials and equipment. Therefore a better way to determine a maintenance strategy is required, one that takes all these decision dimensions into account. More advanced analytical methods, like mathematical optimisation (prescriptive analytics), will provide the asset manager with the required decision support.
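The small Python sketch below contrasts the ranking rule with simply checking every subset of projects. The four projects, their costs, times to failure and risk reductions are hypothetical, and the ranking rule is assumed to skip a project that no longer fits in the remaining budget.

```python
from itertools import combinations

# Hypothetical projects: (name, cost, expected time to failure in years, risk reduction)
PROJECTS = [
    ("A", 80, 1, 50),
    ("B", 50, 2, 45),
    ("C", 40, 3, 40),
    ("D", 30, 4, 35),
]
BUDGET = 100


def rank_by_time_to_failure(projects, budget):
    """Ranking rule: fund the assets with the shortest time to failure first,
    skipping any project that no longer fits in the remaining budget."""
    chosen, remaining = [], budget
    for name, cost, _ttf, _risk in sorted(projects, key=lambda p: p[2]):
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


def best_subset(projects, budget):
    """Enumerate all 2**n subsets and keep the affordable one with the highest risk reduction."""
    best, best_risk = (), 0
    for r in range(len(projects) + 1):
        for subset in combinations(projects, r):
            if sum(p[1] for p in subset) <= budget:
                risk = sum(p[3] for p in subset)
                if risk > best_risk:
                    best, best_risk = subset, risk
    return [p[0] for p in best], best_risk


def total_risk(names, projects):
    """Total risk reduction of the named projects."""
    return sum(p[3] for p in projects if p[0] in names)


ranked = rank_by_time_to_failure(PROJECTS, BUDGET)
optimal, optimal_risk = best_subset(PROJECTS, BUDGET)
print("ranking:", ranked, "risk reduction", total_risk(ranked, PROJECTS))
print("best subset:", optimal, "risk reduction", optimal_risk)
```

Here the ranking spends 80 of the budget of 100 on the asset closest to failure (risk reduction 50), while funding B and C instead yields a risk reduction of 85.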

To find the best maintenance strategy, the asset manager could, instead of making a ranking, list all possible subsets of maintenance projects that are within budget and calculate the total risk reduction of each subset. The best subset to select would be the one with the highest overall risk reduction (or any other measure). This way of selecting projects also allows additional constraints, like required manpower, required equipment or spare parts, and time-dependent budget limits, to be taken into account: subsets that do not fulfil these requirements are simply left out. Subsets could also be constructed so that mandatory maintenance projects are always included. With a small number of projects this way of selecting projects is feasible: 10 projects lead to 1,024 (= 2^10) possible subsets. With large numbers, however, it is not: a set of 100 potential projects leads to about 1.27 × 10^30 (= 2^100) possible subsets, which would take far too long, if it is possible at all, to construct and evaluate. This is exactly where mathematical optimisation proves its value, because it allows you to implicitly construct and evaluate all feasible subsets of projects, fulfilling not only the budget constraint but any other constraint that needs to be included. Selecting the best subset is achieved using an objective function that expresses how you value each subset. Using mathematical optimisation ensures that the best possible solution will be found. Mathematical optimisation has proven its value many times in many industries, including utilities, and in disciplines like maintenance. MidWest ISO, for example, uses optimisation techniques to continuously balance energy production with energy consumption, including the distribution of electricity in its networks. Other asset-heavy industries like petrochemicals use optimisation modelling to identify cost-effective, reliable and safe maintenance strategies.
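To show that the best subset can be found without listing every subset, the sketch below solves the budget-only version of the problem (a 0/1 knapsack) exactly with dynamic programming, in O(n × budget) steps. This is a swapped-in textbook technique chosen because it fits in a few lines; practical maintenance planning with manpower, precedence and equipment constraints would typically use a mixed-integer programming solver instead. Costs, risk reductions and the budget are hypothetical integers.

```python
import random


def knapsack_best(costs, values, budget):
    """Exact 0/1 knapsack by dynamic programming over integer budgets.
    Returns (best total value, sorted indices of the chosen projects).
    Runs in O(n * budget) steps instead of enumerating all 2**n subsets."""
    n = len(costs)
    best = [0] * (budget + 1)  # best[b]: highest value with total cost <= b, using projects seen so far
    keep = [[False] * (budget + 1) for _ in range(n)]  # keep[i][b]: project i improved best[b]
    for i in range(n):
        # Iterate budgets downwards so each project is counted at most once.
        for b in range(budget, costs[i] - 1, -1):
            candidate = best[b - costs[i]] + values[i]
            if candidate > best[b]:
                best[b] = candidate
                keep[i][b] = True
    # Walk back through the decisions to recover the chosen projects.
    chosen, b = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][b]:
            chosen.append(i)
            b -= costs[i]
    return best[budget], sorted(chosen)


print(f"10 projects: {2 ** 10} subsets; 100 projects: {2 ** 100:.2e} subsets")

# Hypothetical portfolio of 100 candidate projects with integer costs and risk reductions.
rng = random.Random(3)
costs = [rng.randint(10, 100) for _ in range(100)]
values = [rng.randint(1, 50) for _ in range(100)]
value, chosen = knapsack_best(costs, values, 1_000)
print(f"best risk reduction {value} from {len(chosen)} projects, cost {sum(costs[i] for i in chosen)}")
```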

To improve their asset maintenance strategies, the best next step for utilities is to adopt mathematical optimisation. It allows them to leverage the insights from their asset condition models and turn these insights into value-adding maintenance decisions. Compared to their current rule-based selection of maintenance projects, in which they can only evaluate a limited number of alternatives, they can improve significantly, as mathematical optimisation lets them implicitly evaluate trillions of (possibly all) alternative maintenance strategies within seconds. Although “rules of thumb”, “politics” and “intuition” will always provide a solution that is “good”, mathematical optimisation ensures that The Best solution will be found.

Tuesday, 19 July 2016

Big Data Headaches

Data-driven decision making has proven to be key to improving organisational performance. This stimulates organisations to gather data, analyse it and use decision support models to improve the speed and quality of their decision making. With the rapid decline in the cost of both storage and computing power, there are nearly no limits to what you can store or analyse. As a result, organisations have started building data lakes and have invested in big data analytics platforms to store and analyse as much data as possible. This is especially true in the consumer goods and services sector, where big data technology can be transformative as it enables a very granular analysis of human activity (down to the personal level). With these granular insights companies can personalise their offerings, potentially increasing revenue by selling additional products or services. This allows new business models to emerge and is changing the way of doing business completely. As the potential of all this data is huge, many organisations are investing in big data technology, expecting plug-and-play inference to support their decision making. Big data practice, however, is something different and is full of rude awakenings and headaches.

That big data technology can create value is proven by the fact that companies like Google, Facebook and Amazon exist and do well. Surveys from Gartner and IDC show that the number of companies adopting big data technology is increasing fast. Many of them want to use this technology to improve their business and start using it in an exploratory manner. When asked about the results they get from their analyses, many respond that they have difficulty getting results due to data issues; others report difficulty getting insights that go beyond preaching to the choir. Some even report disappointment, as their outcomes turn out to be wrong when put into practice. The lack of experienced analytical talent is often mentioned as a reason for this, but there is more to it. Although big data has the potential to be transformative, it also comes with fundamental challenges which, when not acknowledged, can cause unrealistic expectations and disappointing results. Some of these challenges are even unsolvable at this time.

Even if there is a lot of data, it can’t be used properly

To illustrate some of these fundamental challenges, let's take the example of an online retailer. The retailer has data on its customers and uses it to identify generic customer preferences. Based on the identified preferences, offers are generated and customers targeted. The retailer wants to increase revenue and starts to collect more data at the level of the individual customer. It wants to use the additional data to create personalised offerings (the right product, at the right time, for the right customer, at the right price) and to make predictions about future preferences (so it can restructure its product portfolio continuously). To do so, the retailer needs to find out what the preferences of its customers are and what drives their buying behaviour. This requires constructing and testing hypotheses based on the customer attributes gathered. In the old situation the number of available attributes (like address, gender, past transactions) was small, so only a small number of hypotheses (for example “women living in a certain part of the city are inclined to buy a specific brand of white wine”) needed to be tested to cover all possible combinations. With the increase in the number of attributes, however, the number of attribute combinations to be investigated increases exponentially. If in the old situation the retailer had 10 attributes per customer, a total of 1,024 (= 2^10) possible combinations needed to be evaluated. When the number of attributes increases to, say, 500 (which in practice is still quite small), the number of possible combinations of attributes increases to about 3.27 × 10^150 (= 2^500). This exponential growth causes computational issues, as it becomes impossible to test all possible hypotheses even with the fastest available computers. The practical way around this is to significantly reduce the number of attributes taken into account.
This will leave much of the data unused and many possible combinations of attributes untested, thereby reducing the potential to improve. It might also cause many big data analysis results to be too obvious.
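The combinatorial growth described above is easy to check. The sketch counts attribute subsets as 2^n and, purely for illustration, assumes a hypothetical rate of one billion hypothesis tests per second to show how long an exhaustive search would take.

```python
TESTS_PER_SECOND = 1e9  # hypothetical: one billion hypothesis tests per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def n_attribute_combinations(n_attributes: int) -> int:
    """Number of subsets of n attributes (including the empty set): 2**n."""
    return 2 ** n_attributes


for n in (10, 50, 500):
    count = n_attribute_combinations(n)
    years = count / TESTS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{n:>3} attributes: {count:.2e} combinations, {years:.1e} years at the assumed rate")
```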

The larger the data set, the stronger the noise

There is another problem with analysing large amounts of data. As the size of the data set increases, all kinds of patterns will be found, but most of them are going to be just noise. Recent research has proven that as data sets grow larger, they must contain arbitrary correlations. These correlations appear due to the size, not the nature, of the data, which indicates that most of the correlations will be spurious. Without proper practical testing of the findings, this could cause you to act upon a phantom correlation. Testing all the detected patterns in practice is impossible, as the number of detected correlations will increase exponentially with the data set size. So even though you have more data available, you're worse off, as too much information behaves like very little information. Besides the increase in arbitrary correlations in big data sets, testing the huge number of possible hypotheses is also going to be a problem. To illustrate, using a significance level of 0.05, testing 50 hypotheses on the same data will give at least one significant result with a 92% chance, even if none of the effects is real (assuming the tests are independent).

$$P(\text{at least one significant result}) = 1 - P(\text{no significant results}) = 1 - (1 - 0.05)^{50} \approx 92\%$$

This implies that we will find an increasing number of statistically significant results due to chance alone. As a result the number of false positives will rise, potentially causing you to act upon phantom findings. Note that this is not only a big data issue but a small data issue as well: in the example above we already need to test 1,024 hypotheses with only 10 attributes.
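The 92% figure can be checked both exactly and by simulation. The sketch assumes that all 50 null hypotheses are true and the tests are independent, in which case each p-value is uniformly distributed on [0, 1]; the number of trials and the seed are arbitrary choices.

```python
import random

ALPHA = 0.05
N_TESTS = 50

# Exact probability of at least one false positive when all 50 null hypotheses
# are true and the tests are independent.
exact = 1 - (1 - ALPHA) ** N_TESTS


def simulate(n_trials=20_000, seed=1):
    """Draw 50 uniform p-values per trial (true nulls) and return the fraction of
    trials in which at least one p-value falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < ALPHA for _ in range(N_TESTS))
        for _ in range(n_trials)
    )
    return hits / n_trials


print(f"exact: {exact:.3f}, simulated: {simulate():.3f}")
```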

Data driven decision making has nothing to do with the size of your data

So, should the above challenges stop you from adopting data-driven decision making? No, but be aware that it requires more than just some hardware and a lot of data. Sure, with a lot of data and enough computing power, significant patterns will be detected, even if you can't identify all the patterns in the data. However, few of these patterns will be of any interest, as spurious patterns will vastly outnumber the meaningful ones. Therefore, as the size of the available data grows, the skill level for analysing the data needs to grow too. In my opinion, data and technology (even a lot of it) are no substitute for brains. The smart way to deal with big data is to extract and analyse the key information embedded in “mountains of data” and to ignore most of it. You could say that you first need to trim down the haystack to better locate the needle. What remains are collections of small amounts of data that can be analysed much better. This approach will prevent you from getting a big headache from your big data initiatives and will improve both the speed and the quality of data-driven decision making within your organisation.

Friday, 29 April 2016

Is Analytics losing its competitive edge?

Since Tom Davenport wrote his ground-breaking HBR article Competing on Analytics in 2006, a lot has changed in how we think about data and analytics and their impact on decision making. In the past 10 years the amount of data has skyrocketed due to new technological developments like the Internet of Things. Also, data storage costs have plummeted, so we no longer need to choose whether or not to store the data. Analytics technology has become readily available. Open source platforms like KNIME and R have lowered the adoption thresholds, giving everyone access to state-of-the-art analytical methods. To monitor the impact of these developments on the way organisations use data and analytics, MIT Sloan Management Review sends out a survey on a regular basis. They recently published their latest findings in Beyond the Hype: The Hard Work Behind Analytics Success. One of the key findings is that analytics seems to be losing its competitive edge.

Analytics has become table stakes

Comparing their survey results over several years, MIT Sloan reports a decrease over the past 2 years in the number of organisations that gained a competitive advantage from using analytics. This is an interesting finding, especially now that organisations seem set to leverage the investments they have made in (big) data platforms, visualisation and analytics software. An obvious explanation for this decline is that more organisations are using analytics in their decision making, which lowers the competitive advantage. In other words, analytics has become table stakes: the use of analytics in decision making has become a required capability for some organisations to stay competitive, for example in the hospitality and airline industries. All companies in those industries use analytics extensively to come up with the best offer for their customers; without it they would not be able to compete. There are, however, more reasons for the reported decline in competitive advantage.

Step by step 

According to the MIT Sloan report, several of the reasons organisations have difficulty gaining a competitive edge with analytics are related to organisational focus and culture. The survey results show that this is due to a lack of senior sponsorship. Also, senior management doesn't use analytics in its strategic decision making. As a consequence there are only localised initiatives that have little impact. I see this happen in a lot of organisations. Many managers see value in using analytics in decision making but have difficulty convincing senior management to support them. There can be many reasons for that. It could be that senior management simply doesn't know what to expect from analytics and therefore avoids investing time and money in an activity with an uncertain outcome. It could also be that the outcomes of analytics models are so counterintuitive that senior management simply can't believe them. There are several ways to change this and benefit more from analytics than just in local initiatives. The key is to take a step-by-step approach, starting with the current way of decision making and gradually introducing analytics to improve it: simple steps with measurable impact. That way senior management can familiarise itself with what analytics can do and gain confidence in its outcomes. It can take some time, but each step will be an improvement and will grow the analytical competitiveness of the organisation.

Investing in People

Another main reason from the survey for having difficulty gaining an edge with analytics is that organisations don't know how to use the analytics insights. One important cause is that analytics projects are not well embedded in a business context. Driven by the ambition to use data and analytics in decision making, organisations rush into analytics projects without taking enough time to ensure that the project addresses an important enough business issue and has clear objectives, a clear scope and an implementation plan. As a result, the insights from the analytics project either state the obvious or are too far removed from what the business needs, or it is unclear what to do with the outcomes.
Another reason I often come across is that analytics projects are started from the technology perspective: “We have bought analytics software, now what can we do with it?” It should be the other way around. The required analytics software comes after understanding the business issue and the conditions under which it needs to be solved. Therefore analytics is more than buying software or hardware: people need to be trained to recognise business issues that can be solved from an analytics perspective and to choose the appropriate analytical methods and tools. The training will also result in a common understanding of the value of analytics for the organisation, which in turn will help change the current way of decision making into one that incorporates analytics insights.

So, has analytics become less of a competitive advantage? The picture I get from the above reasons is that most organisations have difficulty changing to a new and more analytical way of working. Many organisations are just starting to use analytics; the MIT Sloan survey confirms this, given the significant increase in first users (the Analytically Challenged organisations). These organisations have high expectations of what they will get from analytics, but they will need to go through organisational changes and changes in the way decisions are made before the benefits of using analytics become visible. Following a Satir-like change curve, this will at first cause a decrease in productivity, which in my opinion explains the lower competitive gain these organisations expect to get from using analytics. But this will change over time and end in a new and improved productivity level. As with any new capability or technology, you first need to learn how to walk, then run and then jump.

Sunday, 3 April 2016

The most dangerous equation in the world

Each year Generali, one of the biggest insurers in the world, analyses the claims of its car insurance customers in the Netherlands. The results of that analysis can be found on its website. In the analysis, Generali relates the number of claims to where people live and drive, the age of the driver, the age of the car and the car brand. Some of these statistics provide insights that are to be expected. For example, you expect young drivers to have the highest claim rates, as the analysis confirms. Cars in less populated areas have the lowest claim rates, which seems plausible as well. There is, however, one finding that raised my eyebrows: drivers of specific car brands have significantly higher claim rates than others, suggesting that driving a car of a certain brand makes you either a better or a worse driver. This year Mazda drivers had the highest claim rates according to the Generali analysis, for the second year in a row. Drivers of a Citroen had the lowest claim rate, making them the safest drivers of 2015 according to Generali. So, is there truth in this finding, and should you therefore avoid Mazda drivers or at least not buy a Mazda yourself? Generali's statistics suggest you should, don't they?

Putting it in perspective

Let's take a closer look at the numbers. The claim rates themselves don't tell much, but combining them with other data will. It would be interesting to see how the distribution of car brands compares to the number of claims per brand. Unfortunately, though understandably, Generali only reports the relative difference between a brand's claim rate and the average claim rate. However, Generali claims that its findings apply to all drivers in the Netherlands, so it's fair to assume that the distribution of car brands in its car insurance portfolio is similar to the overall distribution of car brands in the Netherlands. Using data gathered by the Netherlands Vehicle Authority (RDW), and selecting only the brands reported by Generali, we find that 7,545,266 vehicles were registered in the Netherlands in 2015, with the following relative distribution over brands. Clearly Volkswagen, Opel and Peugeot are the biggest brands, while Skoda, Mitsubishi and Mazda are the smallest.

Does driving a Mazda make you the worst driver?

By plotting the population size per car brand against the relative claim performance per car brand, an interesting pattern appears: the spread in claim performance increases as the population size decreases. So smaller car brands have a bigger spread in claim performance than bigger brands. This is the result of a not so well-known statistical law, De Moivre’s equation, which gives the standard deviation of the sampling distribution of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where $\sigma$ is the standard deviation of the population and $n$ is the sample size. Howard Wainer named this equation the most dangerous equation in the world, because so few people are aware of it and, as a consequence, many have made faulty decisions with serious impact. Looking at the formula, we see that the standard deviation of the mean is inversely proportional to the square root of the sample size. As a consequence, car brands with a smaller number of cars in the Netherlands will show a larger variation in relative claim performance than bigger brands. To illustrate, a small brand with no claims will have the best claim performance in one year, while a small number of claims will make it the worst performing brand the next year. For the bigger brands this is not an issue. Note that the brand with the best claim performance last year was Skoda, while this year Skoda was among the worst performers: De Moivre’s equation in action.
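To see De Moivre’s equation at work without Generali’s data, here is a minimal Python sketch. It assumes, hypothetically, that every car has the same 5% chance of a claim regardless of brand, and it simulates 30 years for three made-up brand sizes. Because no brand is safer than another in this setup, any spread in the observed claim rates is pure sampling noise. The brand sizes, the claim probability and the function names are illustrative assumptions, not figures from Generali or the RDW.

```python
import math
import random
import statistics


def yearly_claim_rates(n_cars: int, claim_prob: float, years: int,
                       rng: random.Random) -> list[float]:
    """Observed claim rate per year for one brand, assuming each of its n_cars
    independently files a claim with the same probability claim_prob."""
    rates = []
    for _ in range(years):
        claims = sum(1 for _ in range(n_cars) if rng.random() < claim_prob)
        rates.append(claims / n_cars)
    return rates


def de_moivre_sd(n_cars: int, claim_prob: float) -> float:
    """sigma / sqrt(n), where sigma = sqrt(p * (1 - p)) is the standard
    deviation of a single car's yes/no claim outcome."""
    sigma = math.sqrt(claim_prob * (1 - claim_prob))
    return sigma / math.sqrt(n_cars)


CLAIM_PROB = 0.05  # hypothetical claim probability, identical for every brand
YEARS = 30         # number of simulated years
BRANDS = {"small brand": 100, "medium brand": 1_000, "large brand": 10_000}  # made-up sizes

rng = random.Random(2016)  # fixed seed so the run is reproducible
results: dict[str, list[float]] = {}
for name, n_cars in BRANDS.items():
    results[name] = yearly_claim_rates(n_cars, CLAIM_PROB, YEARS, rng)
    rates = results[name]
    print(f"{name:<13} n={n_cars:>6}  observed sd={statistics.stdev(rates):.4f}  "
          f"predicted sd={de_moivre_sd(n_cars, CLAIM_PROB):.4f}  "
          f"yearly rate range {min(rates):.3f} to {max(rates):.3f}")
```

A brand with 100 times as many cars should show roughly a tenth of the spread. The small brand’s yearly rate therefore swings far more, which makes it much more likely than the large brand to post an extreme rate, good or bad, in any given year, even though every driver in the simulation is equally careful.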

The most dangerous equation in the world

So, Generali’s claim that driving a Mazda makes you the worst driver on the road is much too strong. Whether you are a good or a bad driver depends on many things, but I doubt that the brand of your car is one of them; establishing that would require a much more detailed analysis. Hopefully Generali doesn’t take the brand of your car into account when calculating its premium levels. Chances are they will either over- or underprice it when you choose to drive one of the rarer car brands. If you want to avoid that, better choose one of the larger car brands, as they are unlikely to end up as the worst performing category. Insurance claims are not the only subject affected by De Moivre’s equation. Wainer shows with some compelling examples what the consequences of ignoring the most dangerous equation in the world can be, and why understanding variability is critical to avoid serious errors.

Sunday, 28 February 2016

Will we have an Algorithm Economy?

We all know it: data by itself doesn’t have any value, whether it is big or small, structured or unstructured, available in real time or just sitting in your data warehouse. You need to process data to create insights and then act upon them. For example, being able to accurately forecast next year’s share price of a company doesn’t bring you any value unless you decide to invest (or divest). Analysing data, creating predictions and determining the best possible action all require algorithms. Organisations are increasingly adopting algorithms to support their decision making. Gartner expects the use of algorithms to increase sharply in the next 5 years. In his 2015 keynote, Gartner SVP Peter Sondergaard envisioned that by 2020 there will be marketplaces, similar to app stores, where algorithms can be bought and sold. Organisations can buy algorithms to solve a specific problem or to create new opportunities from the exponential growth of data and the Internet of Things, or they can monetise their own algorithms by selling them to other organisations. According to Sondergaard, the Algorithm Economy will bring the App Economy to analytics.

Algorithms are ill utilised

Algorithms are not new, nor is the interest in them caused by the growing amount of data or the Internet of Things. Algorithms have been around for quite a while, some are over 3500 years old, and exist because we as humans have an interest in solving problems in an efficient and repeatable manner. Computers have sped up the development and use of algorithms, allowing us to solve bigger and more complex problems faster and enabling the analysis of vast amounts of data. Companies like Google, Facebook and Amazon have algorithms at the core of their business, and it is this capability that made them so big and influential. This is not, however, because they were the only ones with access to algorithms. Everyone can get access to state-of-the-art algorithms, as many of them are taught in university maths or computer science classes. A well-trained computer scientist or operations researcher can design and implement them for you. Some are even free: the open-source statistical package R, for example, contains the latest and most advanced machine learning algorithms. What is striking is that, even though many very advanced algorithms are easily accessible, not many companies seem to be using them. In a 2013 Gartner survey, as few as 3% of the interviewed companies reported using prescriptive analytics and 16% used predictive analytics. So, even though the algorithms are available, many companies are still not using them. Why is that?

Algorithmic decision making requires high level of analytics maturity

The explanation is simple. Having the data and technology available is not enough; it’s a necessary but not a sufficient condition for success. To be successful with algorithms, organisations need the technology, but they also need people who understand algorithms and decision makers who are willing to act upon the algorithm outcomes. Acquiring the right analytics talent requires finding people with the technical competency to design, build, assess and use algorithms; usually they have a background in operations research, mathematics or computer science. Next to that, the analysts must have the business sense to understand the business problem and the right domain knowledge. Analysts need well-developed communication skills so that the right business requirements are identified; otherwise they will find the right answer to the wrong question. Besides having the right people who understand algorithms, decision makers must be convinced that algorithms can help them make better decisions. For analytics to be more than just a one-off initiative, senior management needs to support the development of an analytical culture and facilitate algorithm-supported decision making throughout the organisation, either fully automated or in support of human decision making. They should show their trust in algorithms by using them in their own decision making, show the benefits and stimulate others to do so as well. The current low adoption of advanced analytics methods shows that the majority of organisations either are not yet mature enough or do not have the need for analytical methods.

Gartner expects that by 2018, more than half of the large organizations globally will compete using advanced analytics and proprietary algorithms, causing the disruption of entire industries. This is quite a bold prediction (even though it is not very precise). For that to happen, in my opinion, these organisations must first grow their analytics maturity. So, instead of investing only in technology, companies should invest in getting the right talent and developing an analytical culture. This will benefit them in the short term, as doing analytics right brings immediate value. It will, however, take more than the projected 2 years for more than half of the large organizations globally to compete with algorithms in such a manner that it disrupts entire industries.

A competitive edge comes from specific algorithms not the Algorithm Economy

From an economic perspective, I don’t think that there will be a huge market for algorithms. I expect demand in the algorithm marketplace to be low, as these algorithms will be general-purpose algorithms like face recognition algorithms or SVM implementations. Nice building blocks, but not the differentiator you are looking for. For your organisation to gain a competitive edge, your algorithms need to be unique and specific to your organisation’s business. You will therefore need to design and build them yourself or hire people who can do that for you. That is what Google, Facebook and Amazon did, and the same holds in other industries. Take for example pricing algorithms in the airline industry. All major airlines have their own algorithms to optimise their ticket prices, even though general-purpose pricing algorithms are available. The reason they don’t use those is that they don’t expect to gain a competitive edge from technology that is available to everyone else. So, I expect the algorithm marketplaces envisioned by Sondergaard to be rather small, containing only general-purpose algorithms. I do think that algorithms will take centre stage, as they will be a key enabler for companies to become and stay competitive in the future. For that, companies not only need the technology, but should also invest in the right talent and continue building their analytical competences and culture.

Tuesday, 5 January 2016

What’s keeping you from getting optimised?

Attention to using data and analytics in decision making is at a level never seen before. Most organisations acknowledge that data is essential for them to keep track of their performance and to analyse why observed and expected performance differ. Some even use it to look ahead into the future and prepare plans accordingly. To do this, descriptive and predictive analytics methods are used, and as a result these methods are becoming a common instrument in the toolbox of the business analyst. The results of analytics methods are incorporated in decision making processes more often, as the speed, accuracy and usability of analyses have increased considerably due to better data management practices and the increased adoption of data visualisation/analysis software. Even though the use of descriptive and predictive analytics has brought many benefits and cost savings to organisations, it is only pocket money compared to the potential that prescriptive analytics has to offer. Still, very few organisations have adopted prescriptive analytics. Results of a Gartner survey presented by Lisa Kart at the 2013 INFORMS Executive Forum showed that only 3% of the interviewed organisations used prescriptive analytics in their decision making. Although the number of organisations adopting prescriptive analytics will be rising, I don’t expect that the number has risen significantly in the past 2 to 3 years, which implies that many companies have the opportunity to unlock unused improvement potential. This post is on why they should.


An insight or a forecast isn’t actionable

A data-driven performance overview, insight or forecast can be useful information, but by itself has little value. That’s because the outcomes of descriptive or predictive analytics are not actionable. Real value is only created when insights and forecasts are used to make better decisions. This is exactly what prescriptive analytics, a.k.a. optimisation, offers. Given your objective(s), conditions and decision variables, it provides explicit recommendations to achieve the best possible outcome. The recommendation results from considering all possible solutions to your decision problem in a smart way, not just a few of them, and choosing the one that gives the best objective value while satisfying all conditions.
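To make “objectives, conditions and decision variables” concrete, here is a minimal Python sketch of a toy decision: which of a few hypothetical projects to fund within a budget, a classic 0/1 knapsack problem. Dynamic programming is just one optimisation technique, used here as a stand-in for the solvers used in practice; it accounts for every possible selection of projects without listing them one by one. The function name `best_selection` and all project numbers are illustrative assumptions.

```python
def best_selection(values, costs, budget):
    """0/1 knapsack solved by dynamic programming.

    Decision variables: x_i in {0, 1}, i.e. fund project i or not.
    Condition:          sum(costs[i] * x_i) <= budget (integer costs).
    Objective:          maximise sum(values[i] * x_i).
    Returns (best objective value, sorted indices of chosen projects)."""
    n = len(values)
    # best[i][b] = best total value using only the first i projects with budget b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, cost = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip project i-1
            if cost <= b and best[i - 1][b - cost] + value > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + value  # option 2: fund it
    # Walk back through the table to recover which projects were funded
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


PROJECT_VALUES = [60, 100, 120]  # hypothetical benefit per project
PROJECT_COSTS = [10, 20, 30]     # hypothetical cost per project
print(best_selection(PROJECT_VALUES, PROJECT_COSTS, budget=50))  # (220, [1, 2])
print(best_selection(PROJECT_VALUES, PROJECT_COSTS, budget=40))  # (180, [0, 2])
```

The output is an explicit recommendation (fund these projects), not just a number to interpret. Re-running with a different budget re-optimises immediately, and the best choice changes when the conditions change.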

Many analytics overview charts put optimisation at the top, or as the final step in a process of growing analytical maturity, suggesting that descriptive and predictive analytics are prerequisites for starting with optimisation. This is not the case. Many organisations are successful with optimisation without the ability or the need to forecast. For example, hospitals optimise the utilisation of their staff by constructing optimal shift rosters for nurses, and maximise the utilisation of their operating theatres, without the ability to forecast. Similarly, delivery firms construct routes for their delivery vans that maximise vehicle utilisation and customer service while minimising cost per km.
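The delivery example needs no forecast at all: the stops for the day are known, and the question is only in which order to visit them. Below is a minimal Python sketch that finds the shortest round trip for a van with five stops by checking every possible visiting order. The coordinates are hypothetical, distances are straight lines, and real routing also handles time windows, vehicle capacity and multiple vans, which this toy deliberately ignores.

```python
import itertools
import math

# Hypothetical depot and stop coordinates (km on a flat map); not real data.
DEPOT = (0.0, 0.0)
STOPS = {
    "A": (2.0, 3.0),
    "B": (5.0, 1.0),
    "C": (6.0, 4.0),
    "D": (1.0, 6.0),
    "E": (4.0, 7.0),
}


def route_length(order: tuple[str, ...]) -> float:
    """Straight-line km for depot -> stops in the given order -> back to depot."""
    points = [DEPOT] + [STOPS[stop] for stop in order] + [DEPOT]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def shortest_route(stops) -> tuple[str, ...]:
    """Exact optimum by checking every visiting order: 5 stops give 5! = 120
    orders, which is fine here but hopeless for a real delivery day."""
    return min(itertools.permutations(stops), key=route_length)


best = shortest_route(STOPS)
print(" -> ".join(("depot",) + best + ("depot",)), f"({route_length(best):.2f} km)")
```

Exhaustive checking is used only because the instance is tiny. It guarantees the optimum here, but practical route planners rely on far smarter algorithms.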

Prescriptive analytics translates a business decision into a mathematical model and uses optimisation algorithms or simulation to find the best answer. With a mathematical model the analysis becomes repeatable and can be redone quickly, for example re-optimising the production schedule when demand conditions change. This brings agility to an organisation and allows it to quickly adapt to new conditions. A mathematical model also captures the knowledge of specialists, which enables decision makers to use that knowledge and take action without having to be specialists themselves.

The whole is better than the sum of parts

Some mathematical optimisation problems are easy, but many of them become unsolvable as they increase in size. This is called the combinatorial explosion: the time needed to solve the problem grows exponentially with problem size. The size of the mathematical problem that can be solved therefore depends on the computing power available. Luckily the speedup in computing power is tremendous; for example, my 4-year-old iPad 2 has the same computing power as a Cray-2 supercomputer from 1985. This speedup, together with the progress in optimisation algorithms, gives us the ability to solve larger and more complex mathematical problems than 30 years ago. To illustrate, decision problems that would have taken 85 years, or ~45 million minutes, to solve with the hardware and algorithmic capabilities of 1988 can now be solved within 1 (!) minute. In the ’80s and ’90s, large decision problems had to be broken up into smaller parts and solved separately. With the progress in technology we no longer need to break up models into smaller parts, but can solve decision problems covering multiple departments in an organisation, or across organisations in supply chains, in one go. This holistic way of optimisation will most certainly lead to better decisions, as more relevant conditions are taken into account and more possible solutions are considered.
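To get a feel for the combinatorial explosion, take a delivery van leaving a fixed depot: n stops can be visited in n! orders, or n!/2 if a route and its reverse count as the same trip. The minimal Python sketch below assumes a hypothetical machine that checks one billion routes per second and estimates how long brute force would take. Strictly speaking, factorial growth is even faster than exponential. Smarter exact algorithms do far better than brute force, but for problems like this the best known ones still take exponential time in the worst case, which is why hardware and algorithmic progress both matter. The function names and the checking speed are assumptions for illustration.

```python
import math

CHECKS_PER_SECOND = 1e9  # hypothetical: a machine that evaluates a billion routes per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def number_of_routes(n_stops: int) -> int:
    """Distinct round trips from a fixed depot through n_stops stops, counting a
    route and its reverse as the same trip: n! / 2 (valid for n_stops >= 2)."""
    return math.factorial(n_stops) // 2


def brute_force_time(n_stops: int) -> str:
    """Rough wall-clock time to check every route at CHECKS_PER_SECOND."""
    seconds = number_of_routes(n_stops) / CHECKS_PER_SECOND
    if seconds < 60:
        return f"{seconds:.3g} seconds"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / 3600:.3g} hours"
    return f"{seconds / SECONDS_PER_YEAR:.3g} years"


for n in (5, 10, 15, 20, 25):
    print(f"{n:>2} stops: {number_of_routes(n):>33,} routes, about {brute_force_time(n)}")
```

Going from 15 to 25 stops turns minutes into hundreds of millions of years, so adding a few more stops, departments or supply chain partners to a model can make a naive approach hopeless.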

Optimisation delivers real value

Attention to optimisation is rising; Gartner and other analyst firms signal growing interest in this technology. Optimisation has been around for over 70 years and has proven its value many times. To illustrate, each year INFORMS organises a competition in which the best business applications of analytics compete for the Franz Edelman Award. As a former participant and winner, together with TNT Express, I can tell you it’s a tough competition where only the best analytics practitioners have a chance to win. Illustrative of the value optimisation can bring is a graph of the benefits reported by the selected finalists of the Edelman competition. Measured since 1972, total benefits exceed $223 billion! What is interesting is that the graph seems to curve upwards a bit, indicating that the reported benefits are rising. The benefits of optimisation are not only monetary: it increases the agility of an organisation, allows for continuous improvement, stimulates knowledge sharing, and leads to changes that improve health, safety, cooperation, decision making and job satisfaction.


Although prescriptive analytics is very powerful, it is no substitute for human brainpower, experience and/or judgment. In the end, a mathematical model is a simplified version of reality and can’t possibly cover all aspects of a decision. In my experience, the best results are achieved when decision makers are supported by prescriptive analytic models in their decision making. The results of the Edelman finalists prove that. As the complexity and speed of decision making grow, you need a better tool than just descriptive or predictive analytics. You need actionable insights, so access to prescriptive modelling is a must. So, what is keeping you?