Monday, 8 December 2014

The Age of Algorithms started on a diet

With the exponential growth of interest in data analytics, whether big or small, attention to the use of algorithms has risen strongly as well. An algorithm is nothing more than a step-by-step procedure for a calculation. That is also what makes it so powerful. Algorithms make our lives easier. Recommendation engines single out the product we have been looking for based on our previous purchases and those of buyers similar to us. The Nest thermostat programs itself and continually adapts to our changing life. Facebook selects the news items that are of interest to us, based on what we have been reading, liking and sharing. None of this would have been possible without the use of algorithms. Using computers, algorithms can do things close to magic, as Arthur C. Clarke’s third law predicts. Algorithms even let a computer think up new recipes when it gets bored playing Jeopardy! We are experiencing the third great wave of invention and economic disruption; we live in the Age of Algorithms.

Many people see Google as the initiator of the Age of Algorithms; 16 years ago Larry Page and Sergey Brin created their PageRank algorithm, which enables us to efficiently find what we are looking for on the web. It accelerated the growth of data and the use of algorithms to analyse it. Most of the analyses focus on detecting patterns in the digital bread crumbs we leave behind when wandering on the web. With algorithms, companies and organisations try to better understand our needs and wants so they can improve their marketing and sales strategies. Google’s PageRank, however, wasn’t the start of the use of algorithms to solve practical decision problems; the Age of Algorithms kicked off much earlier than most people think.

Shortly after the Second World War the Pentagon-based research group SCOOP was formed. SCOOP stands for Scientific Computation of Optimal Programs. The group set out to find methods for the programming problems of the Air Force. Programming problems are concerned with the efficient allocation of scarce resources to meet some desired objective, for example the determination of time-phased requirements of materials in support of a war plan. Mathematically these problems look like this:
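In its common textbook form, a linear programming problem can be sketched as below. The notation is an assumption on my part (not necessarily the one SCOOP used): x_j is the level of activity j, c_j its contribution to the objective, a_ij the amount of resource i that one unit of activity j consumes, and b_i the available amount of resource i.

```latex
\begin{aligned}
\text{maximise}\quad   & \sum_{j=1}^{n} c_j x_j \\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1,\dots,m \\
                       & x_j \ge 0, \qquad j = 1,\dots,n
\end{aligned}
```

Minimisation problems, such as a diet problem with "at least" requirements, fit the same mould after flipping signs.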

The fact is that thousands of real-life problems in business, government and the military can be formulated (or approximated) this way. A way (algorithm) to solve these linear programming problems would therefore be very useful, and that is exactly what George Dantzig, chief mathematician of SCOOP, provided. In 1947 he invented the simplex method. The impact and power of Dantzig’s invention is hard to overstate; I dare say it’s the most used algorithm today. The journal Computing in Science and Engineering listed it as one of the top 10 algorithms of the twentieth century. It’s probably on your computer as well, as it comes standard with Excel.

One of the first problems solved using the simplex method was a diet problem named after Nobel Laureate George Stigler. Dantzig wanted to test if his new method would work well on a rather “large scale” problem. You can try solving the Stigler Diet problem yourself, either by hand as Stigler did or by using the power of the simplex algorithm to find the optimum. The age of algorithms starts here (or in 1947 actually).
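To get a feel for what a diet problem looks like, here is a minimal sketch with two foods and two nutrient requirements. All numbers are made up for illustration and are not Stigler's data. The sketch also does not implement the simplex method: it simply enumerates the corner points of the feasible region, which is only workable for toy problems but relies on the same fact the simplex method exploits, namely that an optimum of a linear program can be found at a corner.

```python
from itertools import combinations

# Hypothetical toy data (not Stigler's): cost per unit of bread and milk.
cost = (2.0, 3.0)

# Each constraint reads a*x + b*y >= c, with x = units of bread, y = units of milk.
constraints = [
    (4.0, 2.0, 8.0),  # calories supplied must be at least 8
    (1.0, 3.0, 6.0),  # protein supplied must be at least 6
    (1.0, 0.0, 0.0),  # x >= 0
    (0.0, 1.0, 0.0),  # y >= 0
]


def solve_diet(cost, constraints, tol=1e-9):
    """Return (total cost, x, y) of the cheapest feasible corner point."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines do not intersect in a corner
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y >= c - tol for a, b, c in constraints):
            total = cost[0] * x + cost[1] * y
            if best is None or total < best[0]:
                best = (total, x, y)
    return best


best_cost, bread, milk = solve_diet(cost, constraints)
print(f"bread={bread:.2f}, milk={milk:.2f}, cost={best_cost:.2f}")
```

Stigler's original problem has far more foods and nutrients, which is exactly why a systematic method such as simplex was needed.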

Saturday, 22 November 2014

Let the data do the talking

Although the principles of scientific management from Frederick Taylor have long become obsolete, many parts of the theory are still important for organisations today. When was the last time you were involved in a project concerned with efficiency improvement, the elimination of waste or the identification of best practices? These are just a few topics from scientific management that are still part of industrial engineering and everyday management decision making. Key to the success of these kinds of projects is to have (or obtain) an in-depth understanding of the work processes that require improvement. You can imagine that without this, changing the process might cause you to end up with a worse performance than you started with. The default way to gather information for analysing a process is by studying business process maps, interviewing people and fact finding on the shop floor. This can however be very time consuming, and the quality and accuracy of the gathered data can be questionable. Business process maps are known for being outdated (written to pass some ISO certification step years ago), people have different views on how processes are performed, and fact finding can often cover only part of the processes under investigation. Not a very good start for a successful improvement project, wouldn’t you say? There is however a solution and it’s called data!

Many organisations today have implemented workflow management, CRM and/or ERP systems. These systems are what you could call “process aware”. Key is their capability to log events, like a new order coming in, a request to process an invoice, the rejection of an insurance claim, the admittance of a patient, etc. These systems register very detailed information on the activities that are being performed, information that can be mined to uncover the actual work processes. In process mining, the logged events related to the same case (e.g. a new customer order) are put in the sequence in which they were performed, which identifies all the activities required to process the complete case. If the event log also contains information on the performer (person, resource, etc.) of the activity and timestamps on when the activity took place, then resource usage, duration and productivity can be measured as well. So, the data does all the talking instead of the interviewees.
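The first step described above, grouping events by case and ordering them in time, can be sketched in a few lines. The event log below is hypothetical, and counting which activity directly follows which is only one simple building block of process discovery, not the full algorithm a tool like ProM applies.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp in ISO format).
event_log = [
    ("order-1", "receive order", "2014-11-03T09:00"),
    ("order-2", "receive order", "2014-11-03T09:05"),
    ("order-1", "check credit", "2014-11-03T09:30"),
    ("order-1", "ship goods", "2014-11-04T14:00"),
    ("order-2", "check credit", "2014-11-03T10:00"),
    ("order-2", "reject order", "2014-11-03T10:15"),
]


def build_traces(log):
    """Group events per case and order them by timestamp."""
    by_case = defaultdict(list)
    for case, activity, timestamp in log:
        by_case[case].append((timestamp, activity))
    # ISO timestamps of equal format sort chronologically as plain strings.
    return {case: [a for _, a in sorted(events)] for case, events in by_case.items()}


def directly_follows(traces):
    """Count how often activity b directly follows activity a within a case."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


traces = build_traces(event_log)
dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts form a so-called directly-follows graph, a rough first picture of the process as it is actually executed.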

Traditionally process mining is focussed on deriving information about the actual work process, the organisational context (who performs what), and execution properties (resource usage, duration, performance, etc.) from event logs. With the resource information from the event logs social networks can be extracted; this allows organisations to monitor how people, groups, or software/system components are working together. Next to the discovery of actual work processes, process mining can be used to test conformance with the to-be (or designed) work processes, enabling the work processes to be audited in a fast and objective manner. This can especially be of value in highly regulated businesses like banking or insurance, checking conformance with regulations like Basel III. A third area in which process mining can be of value is the extension of an existing process model with new information, for example using the information from the event log to detect the data dependencies or decision rules for a specific activity.
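As a sketch of how a social network can be derived from resource information, the snippet below counts "handovers of work": how often one performer passes a case on to another. The traces and names are hypothetical, and handover counting is just one of several possible ways to define such a network.

```python
from collections import Counter

# Hypothetical traces: per case, the time-ordered list of (activity, performer).
traces = {
    "claim-1": [("register", "anna"), ("assess", "bob"), ("pay", "carol")],
    "claim-2": [("register", "anna"), ("assess", "bob"), ("reject", "bob")],
    "claim-3": [("register", "dave"), ("assess", "bob"), ("pay", "carol")],
}


def handover_of_work(traces):
    """Count how often work on a case passes from one performer to another."""
    handovers = Counter()
    for events in traces.values():
        performers = [performer for _, performer in events]
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:  # consecutive steps by one person are no handover
                handovers[(giver, receiver)] += 1
    return handovers


network = handover_of_work(traces)
for (giver, receiver), n in sorted(network.items()):
    print(f"{giver} -> {receiver}: {n}")
```

The counts can be read as weighted edges of a graph, where heavier edges indicate more intensive work relations.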

Process Flow gynaecological oncology patients
To illustrate how process mining works I used an example data set containing the event logs of gynaecological oncology patients of a Dutch hospital. The data set contains the event logs of 627 individual patients. Using the open source data mining platform RapidMiner and the process mining package ProM from the process mining expertise centre at Eindhoven University, I created the following process flow from the data using the ILP miner (an integer linear programming based model to extract a process flow). Note that I did not use any a priori knowledge about the care process of this group of patients; it all comes from the event log data. Using the same tools and data the social network can be constructed, providing insight into who works with whom, who delegates work to whom and the intensity of these work relations. Also, using visualisations like the Dotted Chart, specific patterns can be detected in the way the patients are treated.

Dotted Chart 
Using process mining to discover work processes from event logs can be very powerful and less time consuming than the “old” way of interviewing, studying outdated process flow descriptions and fact finding expeditions. When you let the data do the talking, first results can be delivered quickly. Crucial, of course, is to have access to data. It requires skill to extract the right data from an ERP-like system, and probably a lot of data cleansing needs to be done, including checks on the completeness and validity of the data. It’s also quite easy to get swamped in data, especially when the number of log events is big and a lot of process steps are involved. In environments like hospitals a lot of unstructured processes exist, which makes it more difficult to use techniques like process mining as is; however, by first applying techniques from data mining like clustering, satisfactory results can be achieved. Compared to traditional business intelligence tools that only provide an aggregate view of the process inputs and outcomes, process mining dives inside the processes and provides the insights to give your next improvement project a head start.

Sunday, 26 October 2014

“That which is measured, improves”

Some people attribute the above law to Karl Pearson, a famous statistician and founder of mathematical statistics. Others attribute it to Peter Drucker, a well-known management consultant. The source of the above law is however not very relevant. The fact that most people believe it to be true is one of the fundamentals of modern decision making and a driving force behind the increasing use of analytical and optimisation methods to improve organisational performance. Without measurement it is difficult to assess an organisation’s performance or set measurable objectives, and decision making to achieve those goals becomes equivalent to a leap in the dark. As in any other business, measurement is important in healthcare. Not only to keep track of efficiency and effectiveness, but to measure and improve quality as well.

Each year a large Dutch newspaper, het Algemeen Dagblad, publishes a ranking of all Dutch hospitals, the AD TOP100, to identify the best Dutch hospital. It has done so for 11 years. Each year the ranking is much debated as it is said to measure the quality of care in an arbitrary way and to do nothing to support patients in deciding which hospital to go to. Because of the erratic movement of hospitals up and down the ranks (sometimes a hospital moves from a top position to the lower end of the list in consecutive years, and vice versa), critics suggest that the yardstick used by the AD is ill suited to assess quality of care. Therefore the list is dismissed as irrelevant. I think that it is too easy to dismiss the efforts of the AD journalists that way. There are useful insights to be gained from the data they gathered. Although not perfect, it can assist hospitals and policy makers in deciding on the improvement of healthcare quality, and patients in deciding which hospital to visit. Much has to do with how the results of the quality assessment are presented.

Measuring and comparing quality of hospitals is a tough job. Hospitals keep track of many indicators on the performance of their care processes and procedures, sometimes over 1,000. Which ones best provide insights on quality? The AD TOP100 uses only 0.3% of the available number of indicators. Not much, you would say. What is important to mention, however, is that the indicators hospitals keep track of are not the same for each hospital, they don’t have a shared definition and, most importantly, they are not publicly available for obvious reasons. The indicators the AD uses are, by law, publicly available and well defined. That makes the AD list verifiable. Moreover, the indicators are set by the Dutch Health Care Inspectorate, a government institute that promotes public health through effective enforcement of the quality of health services. To me this makes the list of criteria even less arbitrary. There are still many factors that make it difficult to objectively compare the performance of hospitals; for example, each hospital serves a different patient population and has certain focus areas (for example cardiology, skin cancer, etc.), which creates an unequal playing field for scoring the criteria. The AD tries to compensate for these factors; it’s however not very clear how they do that. Maybe a more sound (and analytical!) approach would be more objective, transparent and trustworthy and would reduce the debate.

"Starry Skies " of consecutive rankings
Besides the criticism on the criteria used and the scoring, much of the criticism is focussed on the rather erratic outcomes of the assessment. When reviewing the position of the hospitals over the consecutive years, the rank of a hospital can change quite a bit. Using scatter plots (they look like “starry skies”) and correlations on the consecutive ranking of hospitals the critics try to show that the outcomes are irrelevant.

And I think they are right, looking at the absolute position of a hospital in the list is not very informative to assess the quality of care it provides. When quality of care is on about the same level for all hospitals (as you might expect in the Netherlands) small changes in the overall score can lead to very different ordering. Focussing on ranks alone is therefore not very informative, as the changes in score most likely are caused by random variation, data errors, etc. 
Things change when the actual scores of the hospitals are analysed instead of their rankings; then a clearer picture arises. From the scatter plots of scores in consecutive years it is directly clear that there is a positive relationship (although still far from perfect).
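The difference between looking at ranks and looking at scores can be sketched with a small made-up example (these are not AD TOP100 figures). When all scores lie close together, a shift of half a point is enough to move a hospital from first to last place.

```python
def ranks(scores):
    """Rank 1 is the highest score; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda h: (-scores[h], h))
    return {hospital: position + 1 for position, hospital in enumerate(ordered)}


# Hypothetical quality scores of five hospitals in two consecutive years.
year1 = {"A": 70.4, "B": 70.3, "C": 70.2, "D": 70.1, "E": 70.0}
year2 = {"A": 70.0, "B": 70.5, "C": 70.1, "D": 70.6, "E": 70.3}

rank1, rank2 = ranks(year1), ranks(year2)

# Largest change in score and in rank over all hospitals.
max_score_change = max(abs(year2[h] - year1[h]) for h in year1)
max_rank_change = max(abs(rank2[h] - rank1[h]) for h in year1)

print(f"largest score change: {max_score_change:.1f} points")
print(f"largest rank change:  {max_rank_change} places")
```

In this sketch hospital A drops from first to last although its score barely moves, which is the kind of "starry sky" behaviour a rank-based comparison produces.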

Quality score in consecutive years
The plots also show that the quality for the majority of hospitals was worse in 2011 compared to 2010 (the majority of points are below the grey line); a similar conclusion holds for 2012. Things are getting better in 2013, and in 2014 quality has improved for most hospitals compared to 2013 (the majority of points are above the grey line). Although the data is the same, changing the way it is presented also changes the insight it brings. By displaying the outcomes over time, better insights are provided on the development of quality over the years. An even clearer picture arises when we create buckets for scores and count the number of hospitals in each bucket. The downward trend in 2011 and 2012 is very clear; what is interesting to know is what caused it (a change in criteria measured maybe?). The same applies to the increase in overall quality in 2014. Does what is measured indeed improve? These insights should be an incentive for both journalists and hospitals to have a deeper look at the cause of this increase.
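Bucketing scores is straightforward to reproduce. The sketch below uses invented scores and a bucket width of 10 points; both are assumptions for illustration and not the values behind the chart.

```python
from collections import Counter


def bucket_counts(scores, width=10):
    """Count scores per bucket; a bucket is labelled by its lower bound."""
    counts = Counter((int(score) // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Hypothetical (non-negative) quality scores per year, not the AD data.
scores_by_year = {
    2010: [62, 68, 71, 74, 77, 81],
    2011: [55, 58, 63, 66, 72, 75],
}

histogram = {year: bucket_counts(scores) for year, scores in scores_by_year.items()}
for year, counts in histogram.items():
    print(year, counts)
```

Comparing the bucket counts of consecutive years shows at a glance whether the whole distribution of scores has shifted up or down.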

Hospital quality score development 2010-2014
How to decide which hospital is best? Given that small changes in the overall score can change the absolute position quite drastically, a better way to measure “best” could be to use an average score. From the perspective of a patient this would make sense. Would you prefer to undergo surgery in a hospital that ranked first this year, or in a hospital that on average scored high over a period of several years? Using this as a proxy a different “winner” arises, although the 2014 number one is also in the list. The big difference is that the rank of the alternative best hospital in the 2014 list is 22, some difference! The same applies to number 4 in the list.
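A minimal sketch of the two ways of picking a "best" hospital, using invented names and scores rather than the actual AD data:

```python
from statistics import mean

# Hypothetical yearly quality scores per hospital.
scores = {
    "Hospital A": {2012: 70, 2013: 72, 2014: 90},
    "Hospital B": {2012: 85, 2013: 84, 2014: 83},
    "Hospital C": {2012: 78, 2013: 60, 2014: 80},
}

# Winner when only the most recent year counts.
best_2014 = max(scores, key=lambda h: scores[h][2014])

# Winner when the average over all available years counts.
best_average = max(scores, key=lambda h: mean(scores[h].values()))

print("best in 2014:   ", best_2014)
print("best on average:", best_average)
```

In this made-up case the two criteria crown different hospitals, mirroring the effect described above.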

Best Average Hospital 2010-2014
Besides evaluating the individual quality of the Dutch hospitals, the AD TOP100 data has much more to offer. The data gathered could be combined with data on the financial performance of hospitals, the insurers that contract them, the kind of care they offer, the demographics of the area they serve, etc. Many more insights can be gained that can be beneficial for quality and performance improvement. What about the effects of hospital mergers on quality? As an example, I looked at how the quality of care (according to the AD TOP100 data) has evolved geographically across the Netherlands between 2010 and 2014. In the maps the hospitals having a below average quality score are coloured red, those with an above average score blue.

The results show that in 2010 the majority of hospitals in the North West, South East and South West had above average scores on quality. But in 2014 this has changed significantly: only in the South West does the majority of hospitals have an above average score. Is this an effect of the aging population in both North East and South East of the Netherlands? Are more people visiting hospitals in Germany and Belgium? Nice directions for further journalistic investigations, I would say.

Hospitals with below (-) or above (+) average score per region
Measurement is crucial for informed decision making. Besides having a clear objective and being careful about what and how to measure, it is also important to think of how to analyse the data and present the outcomes in an insightful way. I think that the research desk of the AD newspaper can do a better job in presenting the results of their analysis. Also, by broadening the scope and including multiple years the impact of their work will increase, as it will become more relevant and informative to hospitals, politicians and patients. A missed opportunity is that the AD doesn’t go into details on WHY some hospitals score better than others. This would however be very insightful and has news value. It would direct the actions of hospitals to improve, or allow patients, insurers and politicians to take action. A way to achieve it would be to extend the TOP100 scoring and work towards a benchmark that not only ranks but also supports the identification of best practices. It will push the overall quality of healthcare to a higher level.

For those interested in performing or extending the analysis themselves, the data and script I used can be found on my GitHub 

Wednesday, 1 October 2014

CAP® Certified!

I just returned from taking the Certified Analytics Professional assessment which I…..(drumroll).… passed! YEAH! Nailed it! Want proof? Here is the confirmation from INFORMS:

You might think why does someone with nearly 25 years of experience in using analytics/operations research techniques to solve all kinds of business problems want to get certified? Isn’t experience enough and wouldn’t the assessment be a simple tick in the box for someone like him? Well, it isn’t. As I sat down at the computer of the Kryterion testing centre in Arnhem, I felt very uncomfortable. What if I failed?  Not only would it have dented my self-esteem, it would be a clear signal that my skills and knowledge are not up to standards.  Also I would be worried about the quality of the work I have delivered to my clients, was it really the best possible? It has been a while since I graduated from university. I have kept my knowledge up to date reading books, studying articles, visiting conferences and applying all this knowledge to real world challenges. Failing the exam would indicate that it wasn’t good enough to be top-notch.

There are several reasons for me to take the exam. First of all I was curious to know whether I would be able to pass it. Also, applying for the CAP® certification offers an independent and unbiased way to assess the quality of my capabilities. Next, it sets an example to all my younger colleagues: keeping your skills and knowledge up to date is a prerequisite for any analytics consultant, including the experienced ones. Being an analytics (and/or #orms) professional requires you to have a working knowledge of new developments in our field of expertise and to know how to apply them in practice. By contacting a CAP® certified professional, clients can be assured that they have the best available knowledge at hand and can see the difference compared to “someone handy with Excel calling himself a data scientist”.

Last but not least, as Peter Drucker states, what’s measured improves. Working towards the CAP® assessment helped improve my analytics skills. I used the CAP® study guide to get an overview of the subjects that will be tested; it includes sample test questions. The guide contains lots of resources (references to websites, books and white papers) supporting you in studying for the exam. The study guide and exam are based on an extensive job task analysis that was performed by a team consisting of people like Jack Levis (UPS), Scott Nestler (US Army), Jerry Oglesby (SAS) and Sam Savage (Stanford). It helped me identify the areas of expertise of the analytics profession for which my working knowledge was in shape, and which ones could use some refreshment. That way, I was able to improve my working knowledge of those areas by attending master classes, studying the occasional book and working alongside colleagues with more experience in that specific area. This approach resulted in a “pass” for the assessment.

Drucker is known as the creator and inventor of modern management, with measuring performance as its central theme. Nowadays, measuring alone is not enough; analytics has become the modern way of management decision making. With certified analytics professionals companies can be assured that they get the best available.

By the way, I wonder what the official notice of certification will look like; will it contain the famous blue CAP® of Barry List?  

Wednesday, 30 July 2014

When data gets Big, it becomes complicated

It goes without saying that we live in a data-rich era; data is no longer a scarce resource. The number of devices connected to the internet is growing every day, increasing the growth rate of data even further. Technology providers urge organisations to invest in IT infrastructure and software to capture all that data with the potential of generating new insights, competitive advantage and increased revenues. To capture these advantages, data needs to be transformed into actionable insights and requires organisations to develop the right analytical capabilities. However, technology and analytical capabilities alone will not be enough for organisations to benefit from all that data as changes to the business strategy will also be required. Cost of data has gone down drastically because the internet gave data an IP address, making data easily accessible and allowing for its massive growth. This reduction of cost will also impact the current way of doing business. It will cause the breakup of today’s value chains into smaller pieces, especially those impacted by internet connectivity, which increases the complexity of decision making. Therefore benefits from big data will only arise if organisations both change their strategy and invest in analytical capabilities.

Growth of data is immense

With the birth of the internet the distribution of information became much cheaper. As a consequence companies whose core business activity was in gathering and distributing information, like encyclopaedias or newspapers, were highly impacted. With access to the internet people had other means than leather bound books and paper to get informed. Moreover, information became accessible on demand, was up to date and had a much lower cost. In these early years of the internet, the Web 1.0 era, information could only travel in one direction, towards the consumer (remember downloading your first MP3?). Web 2.0 enabled bi-directional information flows. With Web 2.0 came social media (Facebook, LinkedIn, Twitter), user generated content (the fact that you are reading this blog is an example of that), crowd sourced content (like Wikipedia) and collaboration (involving customers in product developments/improvements). What Web 3.0 will be about, who knows? But data certainly will play a major part in it. The internet caused the volume of available data to grow exponentially. The amount of data with an IP address will grow to 35 zettabytes in 2020, IDC predicts. That’s a hundredfold multiplication of what is available today, an immense growth of data that will impact companies and their business models, leading to new ways of creating and capturing value.


How data impacts strategy

That digital and data with an IP address have transformed businesses today is clear. Nobody buys an encyclopaedia anymore; we just look it up on Wikipedia. But how does access to information impact a business? Suppose that you run a business. As a business owner you need to hire workers and negotiate prices for the products you sell and the resources you buy. These activities require your time and access to information on hiring rates and prices for your products and sourced raw material. Each time you require them you need to go to the market, which will result in transaction cost. A way to reduce transaction cost is to increase the scale of your activities and become a firm. In his essay The Nature of the Firm Ronald Coase points out that an increase in scale improves access to information and negotiating position. It allows for investment in processes and technology, which raises efficiency and quality, possibly giving you a competitive edge. With scale however, organisational costs increase due to the coordination of activities within the firm. Successfully balancing organisational cost and transaction cost is what running a business is all about. Scale is an important factor to reduce transaction costs and leads to vertical integration of value chains. For example, when Starbucks acquires a 593-acre coffee bean farm, it gets control over its sourcing transaction cost. Transaction cost is thus one of the driving factors for business strategy, and changes in transaction cost will require rethinking of business strategy.

Big data increases complexity

A basic economic law implies that prices drop when a resource becomes less scarce; with big data therefore it’s evident that the cost of data will drop, which lowers the transaction cost of a firm. With decreasing transaction cost, scale becomes less important, which challenges the vertically integrated value chain as the preferred business strategy. As a result value chains will fall apart, opening up opportunities for other participants. Wikipedia caused the encyclopaedia value chain to fall apart, increasing the total number of participants manifold. Another example is that of online ad display. The first ever online ad was issued in 1994 by AT&T, which took only one party, Hotwired, to publish. The chart below shows the process of online ad display as it is today. The complete process from left to right takes place in just a few milliseconds. Each of the parties has its own specific role in the chain and can only play that part because data is readily available and at low cost.

With the rise of big data, transaction cost has dropped, invalidating the traditional vertically integrated value chain. It is expected that, as in the encyclopaedia and online ad businesses, network-like value chains with an increasing number of participants will become the standard. This increase in the number of parties in a value chain has a downside. With the increasing number of participants, the number of decisions also rises, which increases the decision making complexity. This increase in complexity can only be dealt with if the quality of decision making improves. In the online ad business for example (now a multibillion dollar business), it is required to constantly measure and adjust advertising campaigns. A high return on marketing investment can only be achieved when advanced optimisation techniques are used, requiring businesses not only to invest in IT infrastructure but in analytics and optimisation capabilities as well. It shows that big data alone is not enough to be successful.

With digitisation and the internet, traditional vertically integrated value chains are challenged as the cost of data has plummeted. As a consequence organisations need to rethink their business strategy. New business models show an increased number of participants, which makes decision making more complex and requires the use of advanced analytics to capture the value enclosed in big data.

Friday, 30 May 2014

Balancing today’s decisions against tomorrow’s conditions

As organisations move up Tom Davenport’s analytics maturity curve, they encounter new challenges in using the insights from data analysis and optimisation models. Today, the majority of organisations use descriptive analytics to create insights on what has happened. The use of diagnostic analytics to understand why things have happened is also becoming more common. Moving up the curve towards predictive and prescriptive analytics is more difficult and requires the development of more advanced analytical capabilities. Gartner surveys indicate that about 13% of the companies are using predictive analytics. Predictive analytics provides these companies with the capability to identify future probabilities and trends. It also supports the discovery of relations in data not readily apparent with traditional analysis. These insights can be used, for example, to estimate future demand, which in turn supports sourcing and production decisions. Predictive analytics enables organisations to balance the decisions of today against the conditions that they face in the (uncertain) future; it allows them to become proactive instead of reactive. Turning these insights into robust decisions is however not always as straightforward as it seems.

Let’s take the example of a company that manufactures desks, tables and chairs. The desks sell at €60, the tables at €40 and the chairs at €10.  To make the furniture the company needs to source wood and two types of labour, carpentry and finishing. Costs and resource requirements for each type of furniture, including the demand, are shown in the table below.

Given the demand, a simple linear programming model will help the company to figure out that the best decision is to produce 150 desks and 125 tables. It will require the company to source 1950 feet of wood, 487.5 labour hours of carpentry and 850 labour hours of finishing. A net profit of €4,165 will result. In fact, a simple per-item profit analysis provides the answer already as producing chairs will not generate any profit. What is important to note is that in the above approach the sourcing, production and selling decisions are made in one go. In practice this might not be realistic.
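The calculation can be retraced with a short sketch. The resource costs, resource requirements and demand used below are assumptions on my part, chosen because they reproduce the figures quoted above (1950 feet of wood, 487.5 and 850 labour hours, €4,165 profit). Because sourcing is unlimited in this model, the linear program decomposes per product and the per-item margin decides everything, so no solver is needed.

```python
# Assumed data (not given in the text above): price per item, cost per unit of
# resource, resource usage per item and demand per item.
PRICE = {"desk": 60.0, "table": 40.0, "chair": 10.0}
RESOURCE_COST = {"wood": 2.0, "finishing": 4.0, "carpentry": 5.2}
USAGE = {
    "desk": {"wood": 8.0, "finishing": 4.0, "carpentry": 2.0},
    "table": {"wood": 6.0, "finishing": 2.0, "carpentry": 1.5},
    "chair": {"wood": 1.0, "finishing": 1.5, "carpentry": 0.5},
}
DEMAND = {"desk": 150, "table": 125, "chair": 300}


def unit_cost(item):
    """Cost of all resources needed to make one item."""
    return sum(USAGE[item][r] * RESOURCE_COST[r] for r in RESOURCE_COST)


# Per-item margin: selling price minus resource cost.
margin = {item: PRICE[item] - unit_cost(item) for item in PRICE}

# Produce up to demand for profitable items, nothing for loss-making ones.
plan = {item: DEMAND[item] if margin[item] > 0 else 0 for item in PRICE}

# Resources to source for this plan, and the resulting profit.
sourcing = {r: sum(USAGE[i][r] * plan[i] for i in plan) for r in RESOURCE_COST}
profit = sum(margin[item] * plan[item] for item in plan)

print("plan:    ", plan)
print("sourcing:", sourcing)
print("profit:  ", round(profit, 2))
```

Under these assumed figures a chair costs slightly more to make than it sells for, which is why it drops out of the plan.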

Using predictive analytics the company constructed the following scenarios for future demand for desks, tables and chairs with the accompanying probability of occurrence.

Given these scenarios the company wonders how this variability in demand will impact its sourcing and production decisions. How to deal with the various demand scenarios? The use of predictive analytics has created more insight, but also increased the complexity of the sourcing and production decisions. To find out what is best, the company decides to perform a sensitivity analysis on demand using the LP model with the deterministic demand scenario. The analysis shows that although the number of desks and tables produced in each demand scenario differs, no chairs will be produced in any of the scenarios. Given this observation, the company decides to go for the expected demand scenario, which is also a common way of dealing with multiple scenarios in practice. The impact of this decision becomes apparent when we look at the profit for each of the demand scenarios based on this decision. Expected profit for sure is not what was expected! In the low demand scenario there is a significant loss instead of a small profit, in the most likely scenario there is a slightly lower profit, while in the high demand scenario the upward potential doesn’t materialize. So on average the company will be worse off than expected (where did we hear that before?). The sensitivity analysis on demand didn’t provide any clue that this could happen; it is therefore flawed. Stein Wallace indicates that the key to dealing better with uncertainty in this case is a more thoughtful approach to creating the math model.
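The effect can be retraced with a small sketch that fixes the expected-demand plan (150 desks, 125 tables, no chairs) and evaluates it in each scenario. The scenario demands, probabilities and resource data below are assumptions on my part; they are chosen to be consistent with the figures quoted in this post but are not taken from it.

```python
# Assumed data (not given in the text above).
PRICE = {"desk": 60.0, "table": 40.0, "chair": 10.0}
RESOURCE_COST = {"wood": 2.0, "finishing": 4.0, "carpentry": 5.2}
USAGE = {
    "desk": {"wood": 8.0, "finishing": 4.0, "carpentry": 2.0},
    "table": {"wood": 6.0, "finishing": 2.0, "carpentry": 1.5},
    "chair": {"wood": 1.0, "finishing": 1.5, "carpentry": 0.5},
}
# Assumed scenarios: (name, probability, demand per item).
SCENARIOS = [
    ("low", 0.3, {"desk": 50, "table": 20, "chair": 200}),
    ("most likely", 0.4, {"desk": 150, "table": 110, "chair": 225}),
    ("high", 0.3, {"desk": 250, "table": 250, "chair": 500}),
]
# Plan based on expected demand, decided before demand is known.
PLAN = {"desk": 150, "table": 125, "chair": 0}


def unit_cost(item):
    """Cost of all resources needed to make one item."""
    return sum(USAGE[item][r] * RESOURCE_COST[r] for r in RESOURCE_COST)


# The cost of the plan is sunk: it is paid whatever demand turns out to be.
plan_cost = sum(unit_cost(item) * qty for item, qty in PLAN.items())


def profit_in(demand):
    """Profit of the fixed plan when only min(produced, demand) can be sold."""
    revenue = sum(PRICE[item] * min(PLAN[item], demand[item]) for item in PLAN)
    return revenue - plan_cost


profits = {name: profit_in(demand) for name, _, demand in SCENARIOS}
expected_profit = sum(prob * profit_in(demand) for _, prob, demand in SCENARIOS)

for name, value in profits.items():
    print(f"{name:12s} {value:10.2f}")
print(f"{'expected':12s} {expected_profit:10.2f}")
```

With these assumed numbers the plan loses money in the low scenario and its expected profit ends up far below the €4,165 the deterministic model promised.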

Key in developing a better model is to understand when decisions are made and how they are impacted by the uncertainty in demand. There are three possible situations.
  1.  Demand is known before the sourcing and production decision
  2.  Demand is known after the sourcing and production decision
  3.  Demand is known after the sourcing decision but before the production decision.

If demand is known before we need to decide what to produce and source, there is no uncertainty on demand and therefore the first model will provide the optimal production and sourcing decisions for every demand scenario (as shown in the above table). When we need to decide both sourcing and production before we know demand, these decisions must be weighed against all demand scenarios. The production plan and corresponding sourcing decisions in this case will be set trading off the sunk cost of producing furniture that can’t be sold with the upward potential in revenue from the high valued demand scenario.  The best decision in this case is to source and produce 50 desks and 110 tables.
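The second situation can be explored with a small brute-force search, sketched here in Python. Prices, unit costs and scenarios are again assumed values, not taken from this section. Since unsold furniture earns nothing, expected profit is piecewise linear in each quantity with kinks at the scenario demand levels, so it is enough to try zero and each scenario demand per item.

```python
from itertools import product

# Assumed inputs for illustration only.
PRICE = {"desk": 60.0, "table": 40.0, "chair": 10.0}
UNIT_COST = {"desk": 42.4, "table": 27.8, "chair": 10.6}
SCENARIOS = [  # (probability, demand per item)
    (0.3, {"desk": 50, "table": 20, "chair": 200}),
    (0.4, {"desk": 150, "table": 110, "chair": 225}),
    (0.3, {"desk": 250, "table": 250, "chair": 500}),
]


def expected_profit(plan):
    """Expected revenue over all scenarios minus the up-front production cost."""
    cost = sum(UNIT_COST[i] * q for i, q in plan.items())
    revenue = sum(
        p * sum(PRICE[i] * min(q, d[i]) for i, q in plan.items())
        for p, d in SCENARIOS
    )
    return revenue - cost


# Candidate quantities per item: zero plus every scenario demand level.
candidates = {i: sorted({0} | {d[i] for _, d in SCENARIOS}) for i in PRICE}

best = max(
    (dict(zip(PRICE, combo)) for combo in product(*(candidates[i] for i in PRICE))),
    key=expected_profit,
)
print(best, round(expected_profit(best), 2))
```

With these assumed inputs the search settles on 50 desks and 110 tables, the decision quoted above.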

An interesting situation arises when we need to source before we know demand but can adapt the production decisions after demand is known. So if there is a change in demand, the resources can be used to produce furniture for which there is demand. The optimal solution to the model clearly shows this. Compared to the second situation, in which demand is known after the sourcing and production decision, the model in this case advises acquiring more resources. Also, in the low demand scenario it suggests switching to the production of chairs, which generates additional revenue. It’s a fall-back scenario which justifies the more aggressive sourcing decision.
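One way to write down this third situation is as a two-stage model with recourse. The notation is my own and not taken from the article: x_r is the amount of resource r sourced up front at unit cost c_r, y_{is} is the number of items of type i produced in scenario s, q_i is the selling price, a_{ri} is the amount of resource r needed per item i, d_{is} is demand and p_s is the scenario probability.

```latex
\begin{aligned}
\max_{x,\,y}\quad & -\sum_{r} c_r x_r \;+\; \sum_{s} p_s \sum_{i} q_i\, y_{is} \\
\text{s.t.}\quad  & \sum_{i} a_{ri}\, y_{is} \le x_r && \text{for all } r, s \\
                  & 0 \le y_{is} \le d_{is}          && \text{for all } i, s \\
                  & x_r \ge 0                        && \text{for all } r
\end{aligned}
```

The sourcing decision x is shared by all scenarios, while each scenario gets its own production plan. That is what allows the model to fall back on chairs when demand turns out to be low.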

In practice the input of mathematical models is assumed to be accurate and deterministic. If accuracy of the data is a worry the conventional wisdom is to perform a sensitivity analysis. With the rise of predictive analytics more and more companies will start using the results of their predictive models in their decision models. As predictions are in their nature uncertain, many of these companies will turn to sensitivity analysis to analyse the impact of the uncertainty on their decisions. Most commercial solvers offer this as a standard feature, which is most convenient. However the above example shows that sensitivity analysis can be seriously flawed. Careful analysis of how uncertainty influences decisions will lead to models that better incorporate uncertainty and therefore will result in better quality decisions. This requires companies not only to invest in predictive analytics tools but in modelling skills as well.

This blog is inspired by an article by Stein Wallace on sensitivity analysis in linear programming which was published in Interfaces. If you want to experiment yourself, a download of an Excel workbook is available.

Sunday, 30 March 2014

The future of logistics lies in Analytics

In a world in which everything and everyone is connected, in which the amount of data generated is growing faster every day and in which lifecycles of products become shorter and shorter, the need to be able to make smarter decisions is rising fast. Maybe in logistics this need is felt the most, as logistics is vital in every supply chain. It requires the logistics manager to be well equipped for making tough decisions. It’s my strong belief that access to data analysis, predictive and prescriptive analytics will become crucial for every manager, and for the logistics manager in particular. For managers to be able to reap the benefits of analytics they need to invest in analytical capabilities and analytical software. Tools and capabilities however are not enough to turn data into actionable insights; decision making processes need to become data driven and analytical as well, which requires the corporate culture to change.

Many decision makers agree that the ever growing mountains of data contain huge potential value. A recent study on supply chain trends by BVL International shows that 60% of the respondents have plans to invest in data analytics in the next 5 years. From my own experience I can tell that analysing data, big or small, always adds value. In the past years I have worked with a lot of logistics companies. Each time the analysis of data guided the path to, sometimes very significant, improvements. Data analytics helps to understand the current situation, to identify bottlenecks and find ways to bypass them. Data analytics supports operational decision making in last mile distribution, but also tactical and strategic decision making. I can’t imagine determining the best frequency of delivery, finding the best location and size of distribution centres or determining an integration plan of networks without the use of data and analytics. Zooming in on the last mile distribution, it’s one of the most expensive parts of a logistics chain; it for example accounts for about 35% of the operational cost for a parcel delivery company. If it can be done smarter, this will directly improve margin. UPS estimates that saving one mile per driver per day would save them $50 million a year (see interview with Jack Levis). That’s why UPS invests heavily in gathering and analysing data to continuously improve last mile delivery operations. Their famous business rule to take as many right turns as possible is a result of the analysis of waiting and driving times of thousands of routes driven by their delivery vans.

Advanced analytics can combine various data sources, data from last year, last week, up to real time information, to forecast what is going to happen next. For example, the expected amount of freight that will arrive tonight or in the next few days in a distribution centre. With this information it can be analysed upfront whether the expected volumes can be managed in the network or whether bottlenecks will occur. In the latter case preventive actions can be undertaken, like redirecting the freight if there is insufficient transportation capacity, preventing the load from being left behind. The forecasted volume can also be used to estimate the required workforce to make sure the right people are available at the right time in the distribution centre to handle the arriving freight. By integrating real time traffic information with the location and availability of customers a more efficient last mile distribution can be achieved, reducing the number of negative stops. These are just a few examples that show the value of analytics in logistics and how it will support cost reductions and enhance customer service. Funny thing is that the data generated in the execution of the logistics processes can also create value, as it could be sold to the local government to analyse distribution flows in the city or to a market research company. This secondary use of data is something I think the logistics sector could do more with.

Data analysis is not new to logistics companies. Many of them analyse data to know the utilisation of their assets, the volumes transported and delivered on time, the miles used to achieve that including the associated cost. Many of them use dashboards to keep track of their performance indicators, but to me it’s like looking in the rearview mirror while trying to move forward. Of course analysing past performance is required, but it is not enough. Logistics companies should analyse data to provide forward looking insights using forecasts on volume, fuel price, available manpower and assets. This will provide insights on what the performance indicators are going to look like. With these insights and analytics tools companies can anticipate and optimise their operations, making better decisions. Research from Andrew McAfee and Erik Brynjolfsson of MIT shows that companies that use data and advanced analytics have a 5%-6% higher productivity and profitability compared to organisations that do not. The reported level of improvements is also what I experience in the projects I have done; sometimes even higher improvement levels can be achieved, certainly when the decision is on a tactical or strategic level.

To be able to reap the benefits from analytics, logistics companies need to change their decision making processes, which are mostly locally oriented, intuitive, and have grown from habit. This needs to change into a supply chain wide decision making process and needs to become data driven. Senior management must fully support this change but will only do that if applying analytics results in performance improvements in a repetitive and consistent manner. Making the move towards a data driven and analytical way of decision making takes time. It’s my experience that the best results are achieved not by a big bang approach but by gradually increasing the complexity of the analytics projects. This will consistently improve decision making quality, achieving better results every time. The pace at which to grow in analytical maturity is dependent on the rate at which a company can or wants to grow. This is exactly the route TNT Express took, delivering them multi-million cost savings. One of the key enablers for TNT Express, which I expect will also be the case for other logistics companies, is to invest in analytics capabilities of the company. Not to make a mathematician of every employee, but to train them to recognise optimisation opportunities and learn how to apply analytical methods.

To improve on their decision making, logistics companies need to become more analytical. The environment in which they operate requires them to do so. There is much to be gained from the data that is available, but current practice is that it is used to create a backward looking view on the company’s performance. The real value that lies enclosed in data will become available if logistics companies start to use forward looking analytical techniques (predictive analytics), providing them the insights to anticipate and optimise their decisions (prescriptive analytics). Prerequisites for success are improving the analytical capabilities of the company and, even more important, growing a corporate culture that stimulates continuous improvement, measuring performance and evaluating decisions with quantitative evidence.

This blog entry is a summary of the lecture I held at the Election of the Logistics Manager of the Year March 27th 2014

Friday, 28 February 2014

Intuition vs Analysis

Last week the acquisition of WhatsApp by Facebook was breaking news, a typical Mergers and Acquisitions decision. These types of decisions are one of a kind, need to be made in secrecy and at high speed as they can have huge impact on shareholder value. Many executives would argue that in these situations their intuition and experience is what makes the difference. In their opinion, analysis would take too much time; moreover, mathematical models can’t take into account the forces that either make or break the success of such a decision. They believe that intuition is indispensable when making business decisions. Just listen to Sir Richard Branson, Steve Jobs or Jack Welch and the compelling stories of their success in decision making. The counter side is that executive decisions are also subject to overconfidence. An executive might have a very strong feeling that a takeover, product or service will be successful, without considering the probability that a rival is already ahead in undertaking a similar action. Research from Daniel Kahneman shows that the amount of success it takes for us to become overconfident isn’t terribly large. Some executives therefore have achieved a reputation for great successes when in fact all they have done is take chances that reasonable people wouldn’t take.

The above quote is attributed to Albert Einstein. Whether he really said it isn’t relevant; it strikingly describes the current shift towards more analytical decision making, which in my opinion is a good thing. We are surrounded by apps, devices and sensors that gather data which is analysed to provide us with suggestions or offers that we are most likely to accept. It's a trend that is also entering the C-suite. However, the growing popularity of Big Data and “technically sophisticated, computationally intensive statistical approaches” has an unfortunate side effect: a “shut up and calculate the numbers” ethos, rather than one that promotes critical thinking and stimulates ideas about what the numbers actually mean. The question is where the balance should lie between intuition and analysis.

What the best decision strategy is, trusting your intuition or performing the analysis, is part of an ongoing debate. Two prominent actors in that discussion are Nobel Prize winner Daniel Kahneman and Gary Klein. Kahneman recently published the book Thinking, Fast and Slow, in which he analyses decision making and explains why we as humans are not very good at it, especially in situations with a high level of uncertainty. So maybe intuition is not a very good basis for decision making. Gary Klein, writer of The Power of Intuition, is a strong proponent of using your intuition in decision making. He indicates that we need to take our gut feeling as an important data point, but conscious and deliberate evaluation is required to see if it makes sense in the context of your decision. This suggests that intuition and analysis should go hand in hand.

The human brain, even the brain of executives, cannot oversee the vast number of alternatives that usually need to be evaluated when making a complex decision. In my experience, with common sense and intuition a solution close to the best possible one (say close to optimal) can be achieved. But it would leave money on the table and might not take into account all relevant limitations, impacting the final result. In situations where margins are small, it could make the difference between a profit and a loss. When taking a more analytical approach, using techniques from Operations Research, new and sometimes counterintuitive solutions will be found that bring in the remaining value, taking into account all relevant limitations. There are many examples in which the use of Operations Research resulted in vast improvements that were not even considered possible.

One great example of a counterintuitive and successful solution from using Operations Research comes from Patrick Blackett, one of the founders of the Operations Research field of expertise. His analysis of the loss of ships crossing the Atlantic due to attacks by German submarines resulted in a breakthrough. The British and US Navy were convinced (by intuition) that single ships had a lower chance of being discovered and attacked by German submarines than convoys. However, Blackett showed that the use of convoys would improve the survival rate of ships crossing the Atlantic dramatically. It took some effort to convince the Navy, but when his advice was followed through the loss of ships dropped drastically.

What does it take to create a good math optimisation model or perform the right analysis you might ask? I would say intuition and experience. I think the process of finding the right approach or model is similar to the way in which experienced firefighters take split second decisions as Gary Klein mentions in Intuition at Work. He finds that firefighters are able to do a rapid and unconscious situation assessment and recognition from an array of stored templates followed by the taking of appropriate action when a fit is found. In modelling it is the same thing. After selecting a promising approach, validation and calibration of the model takes place. Sometimes this leads to a rejection of the model and a new way must be identified. Otherwise the model is used to do that analysis.

So, there is a balance in intuition and analysis even in creating math models. Therefore there is no analysis versus intuition but analysis and intuition. Use analysis to verify your intuition and apply intuition to find models. Even when the time frame is tight, analysis can be used to at least verify intuition and bring new directions to consider or think about. Balancing intuition and analysis will boost the quality of decision making and train your intuitive mind.

Thursday, 9 January 2014

The analytical crime watch; why your 13-year-old shouldn’t have a high-end mobile phone

As an inhabitant of the metropolitan area of Rotterdam I am always curious to know how the city is doing, especially with respect to community safety. In Rotterdam community safety has improved in the past 12 years. Among other things, the number of incidents has gone down significantly and the resolution rates for high impact crime (Burglary, Street robbery and Robbery) have never been higher. Reasons for this decline are better deployment of the police force, the introduction of city guards and city marines (officials appointed to tackle tough safety problems in a certain part of the city), and the engagement of local people. A few weeks ago mayor Aboutaleb of Rotterdam presented a plan to further improve community safety in the city, #Veilig010. By investing €108 million in the next 4 years the plan aims to further lower the number of high impact crimes and improve the sense of security in Rotterdam.

Being a data addict, in favour of fact based decision making, I expected the #Veilig010 plan to be filled with crime statistics and figures supporting the objectives and crime fighting measures mentioned in the plan. However, that expectation proved to be wrong. When spending €108 million I expected a more rigorous approach to analysing and fighting high impact crime. To find out more about the current state of community safety I decided to do some research, using the data from the Rotterdam Open Data site, and some investigative analytics with the objective to create a clear view on high impact crime in Rotterdam.  In this blog I’ve put on my data journalist/scientist hat, to share the insights I gained from the analysis.


Quetelet, a Belgian statistician who is also responsible for developing the BMI index, already noted in the early 1800’s that crime rates have a pattern. He indicates that “the seasons in their course exercise a very marked influence: thus during summer the greatest number of crimes against persons are committed.” What is interesting is that street robbery in Rotterdam has a different pattern. It peaks in Q1 and Q4 and is low in summer. Note that the number of reported incidents in Q4 of 2012 shows a sharp decline. I’m not sure whether this is a true fact or is due to missing data. A statistical test proved the pattern of street robberies to be different from a random series or the even distribution of incidents over the year. So it’s safe to state that the seasonality in the pattern is real.
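A test against an even distribution over the year can be sketched as a chi-square goodness-of-fit test. This is one possible choice of test, written in Python rather than R, and not necessarily the one used for the analysis above. The quarterly counts are made up purely for illustration, and the small differences in quarter length are ignored.

```python
# Hypothetical quarterly counts of street robberies for one year.
# These are NOT the Rotterdam figures; they only illustrate the test.
observed = {"Q1": 420, "Q2": 310, "Q3": 290, "Q4": 400}


def chi_square_uniform(counts):
    """Chi-square statistic against an even spread of incidents over the categories."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Critical value of the chi-square distribution with 3 degrees of freedom
# (4 quarters minus 1) at the 5% significance level.
CRITICAL_5PCT_DF3 = 7.815

stat = chi_square_uniform(list(observed.values()))
reject_uniform = stat > CRITICAL_5PCT_DF3
print(f"chi-square = {stat:.2f}, reject even distribution: {reject_uniform}")
```

If the statistic exceeds the critical value, the hypothesis that incidents are spread evenly over the quarters is rejected at the 5% level.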

Besides a seasonal pattern, street robberies also show a distinct pattern over the days of the week and during the time of day. The majority of the street robberies happen after 12:00 with the highest number of incidents between 18:00 and 23:59. The table shows the distribution for 2012, but it is similar for 2011. Of all weekdays, Friday seems to be the favourite day for street robbers, making Friday between 18:00 and 23:59 the most dangerous moment to be out on the streets of Rotterdam. In analysing the distribution of high impact crimes over the weekdays for both 2011 and 2012 a (statistically significant) shift towards more offences during weekends in 2012 was found. Police in the streets at the moment at which high impact crime is most likely to occur will reduce the number of incidents or at least increase the probability of catching the offenders. Knowing the distribution of high impact crime over the year, during the week and day therefore enables more effective deployment of the available police forces.


Next I analysed the age and gender of the victims of street robbery and found that the number of incidents decreased as the age of the victim increased. I expected the elderly to be the prime victims (easy target), but the data show a different picture. Surprisingly, 32% of the victims are between 12 and 17 years old. Also, 60% of all the victims are male (all 2012 figures). The distribution of the number of victims per age category differs (statistically significantly) for males and females. Drilling down on the time of day and day of the week, most of the female victims are robbed between 18:00 and 23:59 on a Friday. For men multiple peaks in the number of street robberies are found, including high incident periods on Friday between 18:00 and 23:59 and on Saturday and Sunday between 00:00 and 05:59. The above results are useful in informing the right people about the potential danger of street robbery and what they can do to prevent it. They can also support counter measures to reduce the number of street robberies, like presence of police or city guards at the right time in areas where high risk age categories go, for example nightlife areas or schools.


After finding out the when and who, I analysed what was robbed. From the above bar chart it is directly clear that a mobile phone, a bag and a wallet are the top 3 items. In 2011, in about 16% of the incidents nothing was stolen. That figure dramatically decreased to about 5% in 2012. My guess (or hope?) is that in 2011 the registration of stolen items was not sound enough; otherwise the success rate of the robbers has gone up. That would be something to worry about. When comparing 2011 and 2012 a (statistically) significant shift towards mobile phones is found. Looking at the breakdown of brands the usual suspects come forward: Blackberry, Samsung and iPhone. In 2011, a Blackberry, at that time a popular brand, was stolen in nearly 60% of the cases. In 2012 this shifted towards iPhone and Samsung. Given that 32% of the victims are between 12 and 17 years of age, not giving in to the wish of your kids to have a high-end mobile like an iPhone or Samsung might be a good prevention measure. Better settle for an Acer, the least stolen brand in this case.


Now that we know the when, who and what, the next question that pops up is where are these street robberies occurring? The map shows all the places where street robberies have taken place. Based on this map it’s very difficult to deduce any useful insights. Some exploratory spatial data analysis of the point pattern will bring the insights we are looking for. By clustering the point pattern by postal code, the postal code areas with a high number of street robberies can be identified. The straightforward count of the number of street robberies per postal code area shows that the more densely populated areas have more street robberies, as is to be expected. But are these areas really the risky places?

Using postal code areas as a grid to cluster the point pattern allows for combining the street robbery data with other demographics, like number of inhabitants of the area, type of the area or income distribution, as this information usually is available on a postal code level. A better measure to identify risky places (hot spots) is to link the number of robberies to the number of inhabitants of the area; then a totally different picture arises. The less inhabited and more remote parts of the city come forward. From this simple but effective spatial analysis police can learn where most of the street robberies are taking place but also where the risky places are. This can support the deployment of police forces in the city to further reduce the number of street robberies and increase the sense of security in the city.
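The difference between ranking areas by raw count and by robberies per inhabitant is easy to illustrate. The sketch below uses Python and invented area labels and figures; none of them are real Rotterdam postal codes or counts.

```python
# Hypothetical areas: label -> (street robberies, inhabitants).
# Labels and numbers are invented for illustration only.
areas = {
    "area_A": (120, 15000),
    "area_B": (80, 12000),
    "area_C": (15, 600),
    "area_D": (10, 900),
}


def rate_per_1000(robberies, inhabitants):
    """Street robberies per 1,000 inhabitants."""
    return 1000 * robberies / inhabitants


# Ranking by raw count favours the densely populated areas ...
by_count = sorted(areas, key=lambda a: areas[a][0], reverse=True)
# ... while ranking by rate brings the sparsely populated areas forward.
by_rate = sorted(areas, key=lambda a: rate_per_1000(*areas[a]), reverse=True)

print("by count:", by_count)
print("by rate: ", by_rate)
```

In this made-up example the two rankings disagree: the areas with the fewest incidents have the highest rate per inhabitant.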

There are many more insights to be gained from the available data, for example by taking the surroundings of the place of the robbery into account. The above approach and visualisation of the data can help policy makers understand high impact crime better and identify the factors of interest in finding out the who, when, where, how and what questions, maybe even why. By visualizing the data, answers come forward to questions you didn’t know you had. Based on these insights fact based counter measures can be developed, resulting in a more rigorous approach to fighting high impact crime. The approach for #Veilig010 next time?

Note : Data analysis, statistical testing and visualisations were all done in R