OR at Work: This blog is about Operations Research applications in practice. I would like to share my experience and ideas with other practitioners in this field and invite them to react.

<h2>Pitfalls of Algorithmic Decision Making</h2>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0vYz9blsTIqIHRnmXqGJm_h_LlqqN8JYi4itGOfBT-gzo3px839YUcH3dGqa4Vn1qCAlftC2_tb2yG_wFCfaxhUBME2zvOcZNX4v06gJCX7TSE1m_SV1j-RBM9RJ3pvHPYj8fCp1eaRo4/s1600/action-air-shooting-aircraft-319968.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1065" data-original-width="1600" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0vYz9blsTIqIHRnmXqGJm_h_LlqqN8JYi4itGOfBT-gzo3px839YUcH3dGqa4Vn1qCAlftC2_tb2yG_wFCfaxhUBME2zvOcZNX4v06gJCX7TSE1m_SV1j-RBM9RJ3pvHPYj8fCp1eaRo4/s400/action-air-shooting-aircraft-319968.jpg" width="400" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB">While
browsing through my Feedly timeline during the weekend my attention was drown by
</span><a href="https://blogs.gartner.com/jitendra-subramanyam/algorithmic-decisions/"><span lang="EN-GB">a blog</span></a><span lang="EN-GB"> written by Jitendra Subramanyam of Gartner.
The title, <i>Pitfalls of Algorithmic
Decisions and how to handle them</i>, made me curious. As I expected, the
algorithms addressed in the article were machine learning algorithms. Subramanyam
describes how they are used to automate decisions such as a medical diagnosis,
a welfare eligibility assessment or a recruitment decision. The blog indicates that
people’s confidence in algorithmic decisions is declining due to incidents of
biases and perpetuation of discrimination by these algorithms. They are
“weapons of math destruction” as </span><a href="https://mathbabe.org/"><span lang="EN-GB">Cathy O’Neil</span></a><span lang="EN-GB"> describes in her book. As a result
more stringent regulations are proposed by policy makers to protect those
affected by these decisions. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">You can
question if the algorithms Subramanyam mentions really make decisions. They
typically make simple judgements, comparing a model outcome with some
predefined threshold. The algorithms are the result of applying machine
learning to data to create a prediction model for the phenomenon of interest. Usually
a supervised learning algorithm is used to create the model. With the resulting
prediction model, a score is calculated using the data of, for example, a job applicant.
If the score is beyond some threshold the applicant is accepted for the job. To
me this sounds more like an automation of judgement instead of decision making,
and a very bad way of doing it. All depends on the threshold in this “decision”
algorithm, does this do the decision right? It’s a much to simple approach to
decision making, especially when the decision is impactful. Can all relevant
information be captured in this single threshold value? How can we trust the data
used to determine the threshold and verify if it is still accurate and relevant
for, in this case, the applicant being evaluated? Next to that, how certain are
we about the predicted score? Usually machine learning algorithms don’t provide
error bars, making it difficult to verify the quality of the predicted value.<o:p></o:p></span></div>
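<div class="MsoNormal">
To make the pattern concrete, here is a minimal sketch in Python (the feature names, training data and threshold are all invented for illustration) of the kind of "decision" logic described above: a trained model produces a score, and a single hard-coded cut-off turns that score into an accept/reject outcome. Every concern raised above is hidden in that one number.</div>
<pre>
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is an applicant
# (years_experience, test_score); label 1 = hired in the past.
X_train = [[1, 55], [3, 70], [5, 80], [7, 90], [2, 60], [8, 95]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression().fit(X_train, y_train)

THRESHOLD = 0.5  # the single value the whole "decision" hinges on

def automated_judgement(applicant):
    """Accept if the predicted score exceeds the threshold.

    Note what is missing: no error bars on the score, and no check that
    the training data is still representative of this applicant.
    """
    score = model.predict_proba([applicant])[0][1]
    return score > THRESHOLD

print(automated_judgement([4, 75]))
</pre>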
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">All kinds
of comment can be made on the way machine learning models are created and used
in automated judgements and much can be done about it to improve it. Even if
these models improve, we are still far away from practical application of these
algorithms in decision making. Reason is that these algorithms only focus on a
single decision, while in practice decisions are usually linked to each other.
Let me give an example. Suppose you run a chain of retail stores and want to use
algorithms to automatically replenish them. An algorithm is used to predict a stock
out for a product in the stores using demand forecasts and actual sales. As
soon as the algorithm detects a potential stock out, an order is issued to the
distribution centre to replenish the store with the product. So far so good,
however unless transportation is nearly free and unlimited (drones maybe?) this
way of using algorithms to automate your store replenishment is not that smart.
As transportation costs are incurred, the decision to replenish should take
into account what other products in the same store should be replenished. Is the
total amount of products enough to issue a whole truck? If not, what other
stores could the truck attend? In that case the replenishment decision should also
incorporate the products to be replenished for the other stores. Next to what
to replenish, there is the decision on how much to replenish. This could for
example depend on the expected demand for the product in each store, the
storage capacity in the store and the revenue it could generate. If a higher
margin substitute product is available in the store, you might be even better
off by postponing the replenishment of the product as people will buy the more
expensive substitute product. As this example shows, in practice a one
dimensional trade-off is far too simple.
<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">The replenishment
decision, as many practical decisions, is complex and cannot be made in
isolation for a single product. It is depended on other decisions that need to be
considered integrally to make the best possible overall decision. Machine
Learning and AI don’t have the capability to handle this kind of
interdependence and complexity in decision making. Decision analytics (aka
Prescriptive analytics or Operations Research) however is specifically equipped
to model the interdependence between the individual decisions and can leverage
the insights from the prediction models to find the guaranteed best possible
decision to make. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<span lang="EN-GB" style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;">Want to know more about decision analytics?
Check out </span><span style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;"><a href="https://www.youtube.com/watch?v=0oMVVx81kCs"><span lang="EN-GB">this video</span></a></span><span lang="EN-GB" style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;">, see </span><span style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;"><a href="https://www.linkedin.com/in/johnpoppelaars/detail/recent-activity/posts/"><span lang="EN-GB">my blog</span></a></span><span lang="EN-GB" style="font-family: "calibri" , sans-serif; font-size: 11.0pt; line-height: 107%;"> or just get in touch. </span>@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-44023291256183995512019-08-25T17:38:00.001+02:002019-08-25T17:38:12.493+02:00Starting Up<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFvuHulB6i1q7aWpZ0t6IUNlDfgbuldFdXHcNrlVIro2g2vSAH6J1JUXOfjxaN00eanepF0bkaLhYjLPaYUleYB7oEqQqw-h9AIUVY3Ah9SMKVxFU9IowJv2JYLBLZ3GIRs4En_ij3kHB-/s1600/agenda-concept-development-7376.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1067" data-original-width="1600" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFvuHulB6i1q7aWpZ0t6IUNlDfgbuldFdXHcNrlVIro2g2vSAH6J1JUXOfjxaN00eanepF0bkaLhYjLPaYUleYB7oEqQqw-h9AIUVY3Ah9SMKVxFU9IowJv2JYLBLZ3GIRs4En_ij3kHB-/s640/agenda-concept-development-7376.jpg" width="640" /></a></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;">I have been
working as an analytics consultant for nearly 30 years, have worked in many
different industries and for organizations with varying analytics maturity. My
key focus always is to improve decision making, using data analysis or simple
heuristics if possible. Using more advanced techniques if the complexity of the
decision requires so. It is my experience that a key success factor in improving
the decision making at an organization is to embed the model that has been
developed in a decision support system to support the decision makers. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;">At the
beginning of my career the tools that I had available to create these decision
support systems were very limited, nothing more than a programming language
like FORTRAN or C and graphical libraries that allowed me to create character
based screens. Luckily the tools have evolved and become better. But still the
current state is that most of these tools require quite some training or only
support part of the analytics process. Some support the data extraction and
preparation, others focus on the math modeling and the link to the mathematical
programming solver or on the development of user interfaces. What is important
to note is that in parallel to the improvements of the tools for the analytics consultant,
the requirements from customers have also evolved. They have become more
demanding, asking for nicer user interfaces, interactive modeling, explainability
and interpretability of the model outcomes. The requirements also have become
more complex, especially with respect to IT, requiring the analytics consultant
to spent more time on IT related topics.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;">Lack of
good tools and the increasing IT demands cause that a lot of the work of a analytics
consultant involves coding. Coding to extract data from source systems and
transform it into model inputs. Coding to visualize data and model outcomes.
Coding to transform the mathematical model into a format the solver can read
and coding to retrieve the solver output and put it into a user readable form. And,
finally, coding to create a decision support system capturing all the precious
steps and support the decision making process. Coding can be fun, its biggest
advantage is that it’s flexible, you can create whatever you like. I liked
coding, but also experienced its downsides. First of all you need to be able to
code. Not every good analytics consultant is also a good programmer. And there
is the choice of what language to use to create the code, will it be python, C++
or maybe Julia? Compatibility issues may arise. Coding can be error prone
causing you to do extensive debugging. Trends in IT go fast, in order to
fulfill customer demand you need to keep up with the latest trends, like web
based development and parallel computing. I don’t know about you, but to me all
this distracts me for what I really want to do, and that is to support my
clients in solving their business issues. All this coding results in long
delivery times. From my own experience, I estimate that about 70% of the time
spent on a project goes into these activities. It causes decision makers to shy
away from analytics as they experience it as complex and too slow for solving
their business issue. Is there no solution to that?<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US;">I wouldn’t
be starting my own analytics company if I thought there wasn’t. Last year I
have come across an innovative cloud based analytics platform, that fulfills
most of the needs of an analytics consultant. It is a platform that supports me
in all the steps I usually take in a consulting assignment, starting from discussing
the problem with the client, data ingestion, data visualization, model building
& validation, all the way through to delivering a decision support app. One
of its key advantages is that it requires no coding, all modeling is done via a
visual editor, which also does consistency checks of the models and procedures
created. In working with the software I have experienced a significant speed up
in the projects I do. Up to 10 times faster than coding. With the software I
can focus on the problem to be solved, and not be distracted by IT issues.
Moreover I can easily involve my client in the modeling process which speeds up
the problem framing and modeling </span><span lang="EN-US" style="mso-ansi-language: EN-US; mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin;">phase. The software
supports agile model development, intermediate versions of the model can be
shared with little effort, which leads to fast feedback, high client
involvement and simplifies implementation of the new way of decision making.
It’s modeling at the speed of light. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US; mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-US" style="mso-ansi-language: EN-US; mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin;">I’m so convinced of
the added value of the software that I decided to build a company around it. I’ll
start my company September 1<sup>st</sup>, s<span style="background: white;">tay
tuned to learn more about the company and the analytics platform.<o:p></o:p></span></span></div>
<br />@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-72833335638472428562019-01-06T20:02:00.000+01:002019-01-07T08:34:30.301+01:00Artificial Intelligence, a critical assessment<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLbI-k1F-EmO6ot8_Sr7Pbyhr_QV03lKO35x6OvsmsGnHPBNEC19PbG6XSicDwjmO-Y906UO82D63t6vMg5CITGp6rOr8UrBEM30uBNWDkrmHww-UPtlHuSl_HFLO_6Fnu8DOjJw7NtqVr/s1600/computers_vs_humans+1875.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="282" data-original-width="578" height="312" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLbI-k1F-EmO6ot8_Sr7Pbyhr_QV03lKO35x6OvsmsGnHPBNEC19PbG6XSicDwjmO-Y906UO82D63t6vMg5CITGp6rOr8UrBEM30uBNWDkrmHww-UPtlHuSl_HFLO_6Fnu8DOjJw7NtqVr/s640/computers_vs_humans+1875.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">https://xkcd.com/1875/</td></tr>
</tbody></table>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">The current
buzz about Artificial Intelligence (AI) has many executives thinking about
whether they should adopt it. The popular press is filled with articles on the
potential value of AI, illustrating it with examples like self-driving cars, digital
assistants, <a href="https://www.forbes.com/sites/bernardmarr/2018/07/06/how-chinese-internet-giant-baidu-uses-artificial-intelligence-and-machine-learning">smart door locks</a>, human level medical diagnosis, and more. The
promise of AI is indeed big. <a href="https://www.gartner.com/newsroom/id/3872933">Gartner</a> predicted global business value for AI to reach
$1.2 trillion in 2018 rising to $4 trillion by 2022. On the other hand, there are
also concerns about AI. <a href="https://www.bbc.com/news/technology-30290540">Scientists and entrepreneurs</a> question whether we would
be able to control AI when it becomes more advanced. Also, failures like the
accident with <a href="https://www.nytimes.com/interactive/2018/03/20/us/self-driving-uber-pedestrian-killed.html">Uber’s self-driving car</a> temper the high expectations for AI. Given
all this noise about AI, both positive and negative, it is unclear what is fact
and what is fiction, making it difficult for organisations to decide their next
step. Competition is tough, being (too) late in adopting new technology could result
in lower competitiveness, loss of customers or in the long run going out of
business. On the other hand adopting the wrong or an immature technology could
lead to disappointments, write offs and similar consequences as being too late.
This blog reflects my view on the current state AI, its promises and
shortcomings, hopefully adding to a more informed discussion of what AI can and
can’t bring us.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<h3>
A note from AI history</h3>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">It’s not
the first time that AI is top of mind of executives. In the late 1980’s there
was a similar buzz around AI as today. At that time, <a href="https://en.wikipedia.org/wiki/Expert_system">expert systems</a> (a
subdomain of AI) were positioned as the solution to (m)any business problem(s).
These systems attempted to solve complex problems by reasoning through bodies
of knowledge, represented mainly as <i style="mso-bidi-font-style: normal;">if–then</i>
rules rather than through conventional computer code. It is estimated that, at
that time, about two thirds of the Fortune 1000 companies applied them in daily
business activities. The interest in applying expert systems in business processes
however was short-lived. In the early 1990’s a lot of companies abandoned the expert
systems because they failed to deliver on the (overhyped?) promises or even
made things worse. To illustrate, <a href="https://www.amazon.com/Expert-Systems-Six-Set-Technology/dp/0124438806">expert systems were named as one of the rootcauses</a> of the October 19<sup>th</sup> 1987 stock market crash. Due to a flawed
design of stock trading expert systems and the lack of proper monitoring
measures, stock traders watched in helpless shock as the “bottom dropped out of
the stock market”. A fair question to ask therefore is, given the current buzz
on AI, whether the current promises of AI are firmer compared to the 1980’s and
why.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<h3>
Why is AI successful today?</h3>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">AI has many
subdomains, of which expert systems is only one. The recent rise of interest the
is AI is mainly due to successes in another subdomain, Machine Learning, which
is subdivided in Supervised, Unsupervised and Reinforcement learning. In just a
couple of years significant progress has been made providing us with practical AI
applications such as image recognition, face recognition, speech recognition and
machine translation. Recently DeepMind’s AlphaGo algorithm accomplishing what
was expected to take <a href="https://www.businessinsider.com/ai-experts-were-way-off-on-when-a-computer-could-win-go-2016-3">at least a couple of decades longer</a>, beating the world
champion at GO. At the core of all these recent achievements lies a specific supervised
learning technique, artificial neural nets. These neural nets mimic the
structure/working of our brain and are specifically good at pattern detection
and predictions. Research on neural nets goes back to the 1940’s where first
ideas on computational learning were developed. In plain English, neural nets are
nothing more than an estimation technique to create a mapping between numerical
inputs and outputs. Depending on the network size, they may contain millions/billions
of parameters that need to be estimated using labelled data (i.e pictures with
a caption to train the network to recognise objects in the picture, or sound
bites and written text to train it to do speech to text transformations). To
estimate the parameters of the neural net an algorithm is used, called the
backpropagation algorithm. This algorithm has its roots in optimal control theory
from the 1960’s. It was rediscovered by Geoffrey Hinton in the late 1980’s,
when he successfully applied it to estimate the parameters of neural nets. At
that time only small neural nets could be estimated due to the limited computing
power of computers. Today, computing power has become strong enough (GPU computing),
also data is no longer a scarce resource which allows for much larger, multi-layered
neural nets (aka deep neural nets or deep learning) to be created and estimated.
These deep neural nets are instrumental for creating self-driving cars, learn
computers to play video games or defeat the GO world champion. Deep neural nets
are used in these applications to predict the future based on which an
algorithm can determine the best next step. As an example, a deep neural net
could be used to predict the trajectory of a pedestrian based on which the algorithm
of a self-driving car can determine whether to stop the car or continue in the
current direction. <o:p></o:p></span></div>
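<div class="MsoNormal">
To make "nothing more than an estimation technique" concrete, here is a minimal sketch in pure NumPy (toy XOR data, invented learning rate and network size) of a tiny neural net whose parameters are estimated with the backpropagation algorithm described above: a forward pass computes the input-to-output mapping, a backward pass propagates the error to every parameter.</div>
<pre>
import numpy as np

# Toy mapping: learn y = x1 XOR x2 with one hidden layer of 4 units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass: compute the mapping from inputs to outputs.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the prediction error to each parameter.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent update of all parameters.
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

# Should be close to [[0], [1], [1], [0]]; exact values depend on the seed.
print(out.round(2))
</pre>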
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">In summary,
the techniques enabling today's success of AI date back to the late 1980’s. No
major breakthroughs in theory have taken place nor were necessary to make
today’s successes possible. Present day success of AI is mainly driven by the
increase in computing power and an abundance of (labelled) data. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<h3>
What are the weaknesses of AI?</h3>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">Given the
results that have been achieved with AI (or better put Machine Learning) thus
far, there is good reason to be optimistic to what it can do for improving
business decision making now and in the future. Many applications are already
around us, and it takes little imagination to envision what could be made
possible with AI next. I do think that attention must be given to the
limitations of AI or to situations in which it might fail to work. Recently
Gary Marcus, Professor of Psychology and Neural Science at NYU, wrote several
essays pinpointing the shortcomings of present day deep learning. Other researchers
have done so as well, among them <a href="https://youtu.be/KsZI5oXBC0k">Stuart Russell</a>, <a href="https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7">Michael Jordan</a>, <a href="https://www.technologyreview.com/s/609048/the-seven-deadly-sins-of-ai-predictions/">Rodney Brooks</a>,
<a href="https://youtu.be/l-mYLq6eZPY">Pieter Abbeel</a> and in the Netherlands <a href="https://youtu.be/1rDt_sglrto">Eric Postma</a>. Although prof. Marcus received
some criticism on his critical appraisal of deep learning, many leading researchers
in AI comply with the points he raises. I recommend reading <a href="https://arxiv.org/abs/1801.00631">the essay</a> in full,
it’s not too technical. Below, I’ve summarised five of his major points of
critique on the current capabilities of deep learning:</span></div>
<div class="MsoNormal">
</div>
<ol>
<li>Deep learning is data hungry. Deep learning relies heavily on large numbers of labelled examples to reliably detect patterns in data. This implies that in situations where only limited data is available, deep learning will not be the ideal solution and other analytics techniques should be used. To illustrate, we humans can very easily detect patterns from only a limited number of examples. What, for example, would be your answer to f(5) if f(4) = 8 and f(6) = 12? Presented with this type of challenge, neural nets would be flummoxed (see the sketch after this list).</li>
<li>Deep learning is shallow. The deep in deep learning refers to the number of layers in the neural net, not to its understanding of the phenomenon it is trained to learn. Take for example DeepMind's Atari game work, which uses deep learning to train a computer to play Atari games. The results are breath-taking (see the <a href="https://youtu.be/TmPfTpjtdgg">Breakout video</a>). Many interpret the results as if the algorithm has "realized" that digging a tunnel through the wall is the most effective technique to beat the game. If that were the case, you would expect the trained algorithm to resort to that technique when given a slightly different challenge, for example by adding a wall in the middle of the game. <a href="https://arxiv.org/abs/1706.04317">Experiments</a> have shown that the trained algorithm performs very badly on the slightly changed game (see <a href="https://vimeo.com/221350956">Kansky</a>), which shows that deep learning results can be extremely superficial. Deep learning models only detect patterns in data; they don't develop understanding or knowledge that can be reapplied in other contexts.</li>
<li>Deep learning is not transparent. Most people in AI research acknowledge that neural nets are something of a black box. The millions (sometimes billions) of parameters have no clear interpretation that helps explain the outcomes of the model. This is a problem in domains like credit scoring or medical diagnosis, in which we would like to understand how a system came up with a recommendation or decision.</li>
<li>Deep learning presumes a stable world. Slight changes to the environment in which a neural net was trained will make it perform badly and require retraining. The Breakout example above already shows this. This implies that deep learning works well in stable environments like board games, but will have less success in messy environments like politics, economics or customer interactions, which are constantly changing.</li>
<li>Deep learning answers are approximations and often cannot be fully trusted. Although deep neural nets are quite good at detecting patterns, they can easily be fooled. Examples of <a href="https://www.theverge.com/2017/11/2/16597276/google-ai-image-attacks-adversarial-turtle-rifle-3d-printed">turtles being mistakenly classified as guns</a> show this vulnerability. Images can even be manipulated to "spoof" neural nets without the changes being visible to the human eye. This is a serious topic, as we are starting to rely on deep neural nets in, for example, self-driving cars, smart surveillance cams and automated classification of images. Imagine tricking a self-driving car into missing stop signs. This not only happens in computer vision applications of deep learning but in any application of deep neural nets. Although there is a lot of research into preventing this from happening, no robust solution has been found yet.</li>
</ol>
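<div class="MsoNormal">
The f(5) question from point 1 can be made concrete with a minimal scikit-learn sketch (the data is just the two points from the example; the exact neural net output varies with the random seed). A linear model, like a human, recovers f(x) = 2x from two examples; a neural net given the same two examples has far too many parameters and far too little data.</div>
<pre>
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# The two examples from point 1: f(4) = 8 and f(6) = 12.
X = np.array([[4.0], [6.0]])
y = np.array([8.0, 12.0])

# A human (or a linear model) recovers f(x) = 2x from just two points...
linear = LinearRegression().fit(X, y)
print("linear model f(5):", linear.predict([[5.0]])[0])   # 10.0

# ...while a neural net trained on the same two examples gives an
# essentially arbitrary answer that varies with the initialisation.
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(X, y)
print("neural net f(5):", net.predict([[5.0]])[0])
</pre>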
<br />
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">The above
points imply that deep learning has serious limitations. Moreover these limitations
are not special cases, but can happen in any deep learning application. Although
a lot of research is being done, sponsored by big tech companies like Apple,
Google, Facebook, Amazon and Microsoft, it is not expected that these shortcomings
will be resolved soon. Geoffrey Hinton even thinks that the current popular
backpropagation algorithm has reached its limits and new methods need to be
invented to take the AI field further.<o:p></o:p></span><br />
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/piYnd_wYlT8/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/piYnd_wYlT8?feature=player_embedded" width="320"></iframe></div>
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<h3>
AI isn't a universal solvent</h3>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">Similar to
the 1980’s, AI is positioned as if it can solve any business problem, it is even
put forward to solve well defined business problems, like planning, routing or
scheduling. This seems silly to me as throwing a prediction algorithm at a
well-defined decision problem is a waste of money and time. Better to use the
specific knowledge about the decision problem to obtain better solutions
quicker, for example using mathematical optimisation models. These models
deliver explicit recommendations, not predictions. The mathematics used to create
these models makes them transparent and easily adjustable. <span style="mso-spacerun: yes;"> </span>Using these model, solutions that comply with all
relevant business conditions will be found as these are made explicit in the
model. Analysing the impact of changes to parameters and the data of the model allows
for informed decision making. AI doesn’t provide this type of decision support.<o:p></o:p></span></div>
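<div class="MsoNormal">
A minimal sketch of what "business conditions made explicit in the model" looks like, again in Python with PuLP (a toy production planning problem; every coefficient is invented): each constraint is a named, readable statement of a business condition, and the solution is an explicit recommendation.</div>
<pre>
from pulp import LpProblem, LpVariable, LpMaximize

# Two products, two scarce resources; all numbers invented.
prob = LpProblem("production_plan", LpMaximize)
x = LpVariable("product_A", lowBound=0)
y = LpVariable("product_B", lowBound=0)

prob += 30 * x + 40 * y                          # objective: total margin
prob += 2 * x + 3 * y &lt;= 120, "machine_hours"    # explicit business condition
prob += 4 * x + 2 * y &lt;= 160, "labour_hours"     # explicit business condition

prob.solve()
print(x.value(), y.value())  # an explicit recommendation, not a prediction
</pre>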
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">What
analytics techniques to choose when solving a business problem requires sound understanding
of the business needs but also a good understanding of the applicability, strengths
and weaknesses of the analytics techniques you plan to use. This also applies
to AI. It can do wonderful things, but has serious limitations that need to be
taken into consideration. Just applying it without understanding them well enough
could lead to serious disappointments. AI is an interesting approach to some business
problems, it is not an approach to solve them all. At the core AI is a pattern
detection technique which makes it specifically suitable for creating predictions,
estimations of what could happen, especially in a data rich and stable environment.<o:p></o:p></span></div>
<h2>Data Driven Sustainability Improvement</h2>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">Last week I
met <a href="https://www.linkedin.com/in/bianca-van-walbeek/">Bianca van Walbeek</a> of </span><a href="http://klimaatgesprekken.nl/">Klimaatgesprekken</a> <span lang="EN-GB">together with several of my colleagues to kick off our
sustainability initiative. In this initiative, we will explore with Bianca’s
help, how we as a person and as company can become more sustainable. Also, we
will develop innovative ways to support our customers in achieving a more
sustainable way of doing business. Our sustainability initiative is part of our
firm wide CSR programme which has as main topics People, Society and Planet.
Last year our focus was on Society, which resulted in our participation in <a href="http://www.weekendacademie.nl/">theWeekend Academie</a> to help educate children from less prosperous neighbourhoods
(see <a href="https://www.consultancy.nl/nieuws/13736/bearingpoint-sluit-samenwerking-met-de-weekend-academie">this link</a> for more). This year we’ll focus on Planet. Stating that our
ambition is to become more sustainable is easy, turning it into an actionable
plan, following it through and achieving measurable impact will not be that
easy. It’s however not different from the types of challenges we help our
customers solve.<span style="mso-spacerun: yes;"> </span>In fact, the approach
to becoming more sustainable is similar to the approach to become more cost
efficient, more customer centric, more data driven, more digital or more innovative.
Let me illustrate.<o:p></o:p></span></div>
<div class="MsoNormal">
<span style="mso-ansi-language: EN-GB;"><span lang="EN-GB"><br /></span></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">If you want
to improve, you require a baseline and a way to monitor your progress. As Peter
Drucker stated:” What gets measured gets improved”. However, as so many things
can and get measured these days, the real challenge is to identify what metrics
best express the goals you want to achieve. With the right set of metrics and
the data to calculate them you can track your progress and get guidance in
deciding on your actions. This is also what we did. We started by using descriptive
analytics to calculate our current carbon footprint and used it to get insights
on what the key drivers of our footprint are. Analysing my own carbon
footprint, I found that a major part of it is flying abroad, using the car to
attend customer meetings and driving to our office in Amsterdam. Flying is the
big contributor, so need to think on how to reduce my impact there. Second is using
my car to attend customers meetings and to drive to the office. I found out
that driving with a low average speed, for example during rush hours, causes
more carbon emissions per km than at medium average speeds. However, high
average speeds (>100 km/h) cause the emissions per km to rise again.<span style="mso-spacerun: yes;"> </span>This suggests that avoiding rush hours, avoid
driving at high speed or taking the train will reduce my footprint. That’s
exactly what I’m going to do. <o:p></o:p></span></div>
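<div class="MsoNormal">
The descriptive-analytics step is little more than bookkeeping. Here is a minimal sketch in Python of a personal carbon baseline; note that the emission factors and trip data below are rough placeholders for illustration, not official figures.</div>
<pre>
# Illustrative descriptive analytics: a personal carbon baseline.
# Emission factors (kg CO2 per km) are placeholders, not official figures.
EMISSION_FACTORS = {"flight": 0.25, "car": 0.19, "train": 0.04}

trips = [  # (mode, km per year) - invented example data
    ("flight", 12000),
    ("car", 18000),
    ("train", 2000),
]

footprint = {mode: 0.0 for mode in EMISSION_FACTORS}
for mode, km in trips:
    footprint[mode] += km * EMISSION_FACTORS[mode]

total = sum(footprint.values())
for mode, kg in sorted(footprint.items(), key=lambda kv: -kv[1]):
    print(f"{mode:>6}: {kg:8.0f} kg CO2 ({100 * kg / total:.0f}%)")
</pre>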
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKxCjV_-mOTvpGaKh9HbXShy2WRDhPMAe1jvIo5BI9k4xkym5CSt2KlylNQbIjex32Fyu04AoPJbndWgHylWvqTfawfwPHMelSXs4XGbNUGO16B5UQsJjrOiBJ2HUhs6t5C6N5ohjbM_90/s1600/Emissions+vs+Avg+speed.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="544" data-original-width="790" height="220" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKxCjV_-mOTvpGaKh9HbXShy2WRDhPMAe1jvIo5BI9k4xkym5CSt2KlylNQbIjex32Fyu04AoPJbndWgHylWvqTfawfwPHMelSXs4XGbNUGO16B5UQsJjrOiBJ2HUhs6t5C6N5ohjbM_90/s320/Emissions+vs+Avg+speed.JPG" width="320" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">Just having
the numbers however is not sufficient to achieve our goals. To engage our
colleagues and get their support we need to inform them on what we want to
achieve and why. Also we want to keep them updated on how we are doing and what
we did with their suggestions. We will accomplish this by using digital visual
management support, like <a href="http://www.iobeya.com/en/">iObeya</a>, showing our current performance, actions
undertaken, results and expected impact of the actions we plan to take to
become more sustainable.<span style="mso-spacerun: yes;"> </span>Next to
enabling the engagement of our colleagues, iObeya will also allow us to work as
a virtual team which will reduce our need to travel and allow us to make
efficient use of our time. As you can tell from the above example, one relative
simple action, adopting iObeya, can have multiple positive impacts (footprint
reduction, travel cost reduction, efficiency increase) on our objectives and
will use up some of our available resources (It budget, it-support hours, electricity).
Usually you’re not considering just one action but multiple, each with specific
impact on your objectives and resources requirements, which will complicate
your decision making. Which subset of actions will give the best outcome and utilise
our available resources in the best way?<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">To answer
the last question, we need predictive and prescriptive analytics. First, we need
to gather data, analyse it, and use predictive analysis to model resource usage
and impact of each initiative on our sustainability metrics and how they interact.
Next, we will need a prescriptive analytics model to help us choose the best
combination of initiatives. The prescriptive analytics model off course uses the
predictive models to model the impact and resource usage of each of the individual
initiatives. Next, we need to create a learning loop in which we will measure
the impact of the initiatives we have chosen to pursue. We will use newly
gathered data to calibrate the predictive models and get better estimates for
the predicted impact and resource usage, update the prescriptive model wen new
initiatives should be considered and use the prescriptive model to re-optimise
our set of initiatives to assure that we have chosen the best possible set of
initiatives to achieve our goals. <o:p></o:p></span></div>
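<div class="MsoNormal">
The prescriptive step is essentially a multi-resource knapsack problem: pick the subset of initiatives that maximises predicted impact without exceeding the available resources. Here is a minimal sketch in Python with PuLP; the initiative names, predicted savings and resource figures are invented stand-ins for what the predictive models would supply.</div>
<pre>
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

# Hypothetical initiatives: name, predicted CO2 saving (tonnes/yr),
# IT budget needed (kEUR), support hours needed. All numbers invented.
initiatives = [
    ("virtual_collaboration", 30, 25, 200),
    ("train_policy",          45, 5,  50),
    ("ev_lease_fleet",        60, 40, 300),
    ("office_solar_panels",   25, 35, 100),
]
BUDGET, HOURS = 60, 400  # available resources

prob = LpProblem("initiative_selection", LpMaximize)
x = {name: LpVariable(name, cat=LpBinary) for (name, _, _, _) in initiatives}

# Maximise predicted impact, subject to the shared resource limits.
prob += lpSum(x[n] * saving for (n, saving, _, _) in initiatives)
prob += lpSum(x[n] * cost for (n, _, cost, _) in initiatives) &lt;= BUDGET
prob += lpSum(x[n] * hrs for (n, _, _, hrs) in initiatives) &lt;= HOURS

prob.solve()
print([n for n in x if x[n].value() == 1])  # the chosen set of initiatives
</pre>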
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQRqlrw0uZ3zKO1ozmelnRv0FjW6yoNLnDQc0LDJJmvNYEfoio5hK7wdsEZPJn3NGCN9cQecl5yhVRCqjibqi1I9_VyyBdaRYye7oXX1tsY_1HXgXI1KvphbteNoHrtUk92Z05HUAwKgYg/s1600/Improvement+Cycle.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="845" data-original-width="1042" height="259" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQRqlrw0uZ3zKO1ozmelnRv0FjW6yoNLnDQc0LDJJmvNYEfoio5hK7wdsEZPJn3NGCN9cQecl5yhVRCqjibqi1I9_VyyBdaRYye7oXX1tsY_1HXgXI1KvphbteNoHrtUk92Z05HUAwKgYg/s320/Improvement+Cycle.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Data driven improvement cycle</td></tr>
</tbody></table>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="mso-ansi-language: EN-GB;">The above
approach will work for any improvement initiative, yours as well, whether it is
improving the sustainability of your organisation, or for example making it
more innovative. Key elements of the approach are to make explicit and
measurable what counts, share objectives and progress to engage and mobilise
your people. Use predictive and prescriptive analytics to find the best set of
initiatives given limited resources and make sure to create a learning loop by continuously
gathering relevant data on the actual outcomes of your decisions and using it
to calibrate the decision support models and your decision process. <o:p></o:p></span></div>
<h2>Do You Believe in AI Fairy Tales?</h2>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqdylzndiSw2lwsl6-XY2RpzEwOumwPcJIJZyc_rKPjCcSM9HiB4MpXw7jHvJCeJY-D1g_4rmFCCj4F7n7Rbjhhc2BKHNGMRh6tsxPFdkL3TCOZjXLyQ0tylaM6Oug5UVDwyWm33ojOQPV/s1600/the_three_laws_of_robotics.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="570" data-original-width="622" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqdylzndiSw2lwsl6-XY2RpzEwOumwPcJIJZyc_rKPjCcSM9HiB4MpXw7jHvJCeJY-D1g_4rmFCCj4F7n7Rbjhhc2BKHNGMRh6tsxPFdkL3TCOZjXLyQ0tylaM6Oug5UVDwyWm33ojOQPV/s320/the_three_laws_of_robotics.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: black; display: inline !important; float: none; font-family: Lucida,Helvetica,sans-serif; font-style: normal; font-variant: small-caps; font-weight: 500; letter-spacing: normal; orphans: 2; text-align: center; text-decoration: none; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span style="font-size: xx-small;">https://xkcd.com/1613/</span></span></td></tr>
</tbody></table>
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">Automatic
speech transcription, Self-driving cars, a computer program beating the world
champion GO player and computers learning to play video games and achieving
better results than humans. Astonishing results that makes you wonder what
Artificial Intelligence (AI) can achieve now and in the future. Futurist <a href="https://singularityhub.com/2017/03/31/can-futurists-predict-the-year-of-the-singularity/">Ray Kurzweil predicts</a> that by 2029 computers will have human level intelligence and
by 2045 computers will be smarter than humans, the so called “Singularity”. Some
of us are looking forward to that, others think of it as their worst nightmare.
In 2015 several top scientists and entrepreneurs <a href="https://futureoflife.org/ai-open-letter/">called for caution over AI</a> as it
could be used to create something that cannot be controlled. Scenarios envisioned
in movies like 2001, a Space Odyssey or the Terminator in which AI turns
against humans, violating Asimov’s first law of robotics, are not the ones we’re
looking forward to. Question is if these predictions and worries about the
capabilities of AI, now or in the future, are realistic or just fairy tales.</span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><b>What is AI?</b></span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">AI is
usually defined as the science of making computers do things that require
intelligence when done by humans. To get a computer to do things it requires
software. To let a computer do smart things it needs algorithms. Today the most
common algorithms used in AI are Supervised learning, Transfer learning,
Unsupervised learning and Reinforcement learning. Note that the nowadays popular
term Deep Learning is just a form of Supervised Learning using (special forms
of ) Neural Nets. Supervised learning takes both input and output data
(labelled data) and uses algorithms to create computer models that are able to
predict the correct label for new input data. Typical applications are image
recognition, facial recognition, automatic transcription of audio, (speech to
text) and automatic translation. Supervised learning takes a lot of data, about
50,000 hours of audio are required to train a human like performing speech transcription
system. Transfer learning is similar to Supervised Learning but stores
knowledge gained while solving one problem and applying it to a different but
related problem. For example, applying knowledge gained while learning to
recognise cars to recognise trucks. Unsupervised learning doesn’t use labelled
data and tries to find patterns in data. There are little to no successful
practical applications of Unsupervised learning however. Reinforcement learning
also doesn’t use labelled data but uses feedback mechanisms to let the computer
programme “learn” how to improve its behaviour. Reinforcement learning is used
in AlphaGo (the programme that beat the GO world champion) and in teaching computers
to play video games. Reinforcement learning is even more data hungry than the
other AI techniques. Besides playing (video) games there are no practical
applications of Reinforcement learning yet.</span></span></div>
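<div style="margin: 0px 0px 10.66px;">
A minimal scikit-learn sketch of the supervised learning loop just described (the flower measurements and labels are a tiny invented sample): labelled examples go in, and a model comes out that predicts the label for new input data.</div>
<pre>
from sklearn.tree import DecisionTreeClassifier

# Labelled data: inputs (petal length, petal width) with known output labels.
X_train = [[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5]]
y_train = ["setosa", "setosa", "versicolor", "versicolor"]

# "Learning" = fitting a model that maps inputs to labels.
model = DecisionTreeClassifier().fit(X_train, y_train)

# The trained model predicts the label for new, unseen input data.
print(model.predict([[1.5, 0.3], [4.6, 1.3]]))  # ['setosa' 'versicolor']
</pre>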
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGJtOP5narXK_cjMBKxXB1YZNOqmMSk5p__gIHk4v1NUVpG4eymCa3V3Mf_6cNZF0gjMsx2Wfxm00MGx3XFs-r_9jo8GMZBT_tkzin5UYt62y0SdHyuUNaXwbxBG52R923bZnMvIYwSZu4/s1600/Supervised+Learning.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="347" data-original-width="723" height="191" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGJtOP5narXK_cjMBKxXB1YZNOqmMSk5p__gIHk4v1NUVpG4eymCa3V3Mf_6cNZF0gjMsx2Wfxm00MGx3XFs-r_9jo8GMZBT_tkzin5UYt62y0SdHyuUNaXwbxBG52R923bZnMvIYwSZu4/s400/Supervised+Learning.JPG" width="400" /></a></div>
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><br /></span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><b>What makes AI
successful?</b></span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">As <a href="http://www.andrewng.org/">Andrew Ng</a>, Coursera founder and Adjunct Professor at Stanford University indicates,
the most successful applications of AI in practice use supervised learning. He
estimates that <a href="https://www.youtube.com/watch?v=NKpuX_yzdYs">99% of the economic value created today with AI</a> is using this
approach. <span style="margin: 0px;"> </span>The AI supported optimisation
of ad placements on webpages is by far the most successful in terms of the
additional revenue it generates for its users. Very little economic value is
created with the remaining techniques, despite the high level of attention these
have had in the media. Todays “rise” of AI may have struck you as a surprise. A
couple of years ago we were not even aware of the practical usability of AI,
let alone imagined that we would have AI on our phone (Siri) or in our house
(Alexa) supporting us with everyday tasks. However, AI is nothing new, it has
been researched since the 1960’s. The current leading algorithm used to
estimate the Deep Learning neural networks, backpropagation, was popularised by
Geoffrey Hinton in 1986, but has its roots somewhere in the 1960’s. Lack of
data and computational power made the algorithm impractical. This has changed
as the availability of (labelled) data has grown tremendously and, more
importantly, computing power has increased significantly by the introduction of
GPU computing. These two factors are the key reasons for AI to be successful
today. So it’s not research driven progress, but engineering driven progress. Still,
for the best performing supervised learning applications, super computers or
High Performance Computing (HPC) systems are required because huge neural nets
need to be constructed and estimated. To illustrate, Google’s AlphaGo programme
ran on special hardware with 1202 CPUs and 176 GPUs when playing against Go
Champion Lee Sedol. Many experts, among them <a href="https://rodneybrooks.com/the-seven-deadly-sins-of-predicting-the-future-of-ai/">Rodney Brooks</a>, roboticist and AI
researcher, questions if much progress can be expected as computational power
is not expected to increase much further. Therefore, it could be that we're not
at the beginning of an AI revolution, but at the end of one.</span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><b>What can we
expect from AI in the future?</b></span></span></div>
<br />
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">Browsing
through the newspapers and other media the number of stories on the achievements
of AI and how it will impact the world is huge. Futurist predictions about what
AI will allows us to do in the future are mind boggling. Will we really be able
to upload our mind to a computer and live forever or learn Kung Fu like Neo in
the Matrix movie? Most of these predictions state that AI will increase in
power quickly assuming it is driven by an exponential law of progress, similar to
Moore’s law. This is doubtful as for AI to acquire the predicted powers it not only
requires faster computers, it also requires smarter and more capable software
and algorithms. Trouble is, research progress doesn’t follow a law or pattern and
therefore can’t be predicted. Deep Learning took 30 year to deliver value. Many
AI researchers see it as an isolated event. As Rodney Brook says there is no
“law” that dictates when the next breakthrough in AI will happen. It can be
tomorrow, but it can also take a 100 years. I think most futurists make the
same prediction mistake as many of us do. We tend to overestimate the effect of
a technology in the short run and underestimate the effect in the long run (Roy
Amara’s law). Take for example computers. When they were introduced in the 1950’s
there was widespread fear that it would take over all jobs. Now 60 years later,
most jobs are still there, new jobs have been created due to the introduction of
computers and we have applications of computers we never even imagined. </span></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVm_WzGIGKi98rpb2MSbRC0BPc-fDV3dAHJAgPjvgDcLLJsmMzEAwZ34QvcjE2NQOaFdPMtCLwtpkXnYJPULlS5FE42KAZO2qLh1G23F217d_eOXMuqkuO7eU5uVyXEP0fKgH6uaWhp6RP/s1600/matrix+movie.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="827" data-original-width="625" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVm_WzGIGKi98rpb2MSbRC0BPc-fDV3dAHJAgPjvgDcLLJsmMzEAwZ34QvcjE2NQOaFdPMtCLwtpkXnYJPULlS5FE42KAZO2qLh1G23F217d_eOXMuqkuO7eU5uVyXEP0fKgH6uaWhp6RP/s400/matrix+movie.jpg" width="301" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">https://www.warnerbros.com/matrix/photos</td></tr>
</tbody></table>
<div style="margin: 0px 0px 10.66px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><br /></span></span></div>
<span lang="EN-GB" style="font-family: "Calibri",sans-serif; font-size: 11pt; line-height: 107%; margin: 0px;">As Niels Bohr said many year ago: ”Predictions
are hard, especially if they are about the future” this also applies to
predicting how Artificial Intelligence will develop in the next years. AI today
is capable of performing very narrow tasks well, but the success is very
brittle. Change the rules of the task slightly and it needs to be retrained and
tuned all over again. For sure there will be progress, and more activities we do
will get automated. Andrew Ng has a nice rule of thumb for it, any mental activity
that takes about of second of thought from a human will get automated with AI. This
will impact jobs, but at a much slower rate than many predict. This will
provide us the time to learn how to safely design and use this technology, similar
to the way we learned to use computers. So, when we are realistic about what AI
can do in the future, there is no need to get too excited or upset, sit back and enjoy
Hollywood’s AI doomsday movies and other fairy tales about AI. If you have the
time I recommend reading some of the work AI researchers publish, for example Rodney
Brooks, Andrew Ng, John Holland or scholars like Jaron Lanier or Daniel Dennett.
<span style="margin: 0px;"> </span></span><br />
<span lang="EN-GB" style="font-family: "Calibri",sans-serif; font-size: 11pt; line-height: 107%; margin: 0px;"><span style="margin: 0px;"></span></span><b></b><i></i><u></u><sub></sub><sup></sup><strike></strike><br />@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-69111597089575048322017-11-05T14:58:00.002+01:002017-11-05T14:58:58.107+01:00Averaging Risk is Risking the Average
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZkle7WHVhOLpQzF8veIhrf0BYEwk2aRv4UxW7DxgUK9sGmCEZp5dEfj0PVgQ6f-kWKmA-5DWR8nnmFQkMiIB0VRTAE2ckEhKVZQKeEZti5uGMP09ATalyNUTN4gEIq25J0f4rP_RFLmX7/s1600/flaw+of+averages.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="373" data-original-width="576" height="207" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZkle7WHVhOLpQzF8veIhrf0BYEwk2aRv4UxW7DxgUK9sGmCEZp5dEfj0PVgQ6f-kWKmA-5DWR8nnmFQkMiIB0VRTAE2ckEhKVZQKeEZti5uGMP09ATalyNUTN4gEIq25J0f4rP_RFLmX7/s320/flaw+of+averages.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://www.danzigercartoons.com/</td></tr>
</tbody></table>
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">To assure
public safety regulatory agencies like the <a href="https://www.acm.nl/en/">ACM</a> in the Netherlands or <a href="https://www.ofgem.gov.uk/">Ofgem</a> in
the UK monitor the performance of gas and electric grid operators on area’s
like costs, safety and the quality of their networks. The regulator compares
the performance of the grid operators and decides on incentives to stimulate
improvements in these areas. Difficulty with these comparisons is that grid
operators use different definitions and/or methodologies to calculate performance,
which complicates a like for like comparison on for example asset health,
criticality or risk across the grid operators. In the UK this has led to a new
concept for risk calculations, the concept of monetised risk. In calculating
monetised risk not only the probability of failure of the asset is used, also
the probability of the consequence of a failure and its financial impact are
taken into account. The question is if this new method delivers more insightful
risk estimations to allow for a better comparison among grid operators. Also,
will it support fair risk trading among asset groups or the development of improved
risk mitigation strategies?</span></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXIQzxg88aiSf0ktlSHC3gJ3YaYtVe9B7I6fqE7O5_Nfzs2fMWzu1n8J_pflKe17dr0hHLR61hPUyHD9Uo5soLPrTCCQRXN6Paa0wklMaDpSZSQS2MrMJfm5ATpgy7750ZPNTJ_qlM2UZV/s1600/Broad+monitised+risk+map+process.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="462" data-original-width="1186" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXIQzxg88aiSf0ktlSHC3gJ3YaYtVe9B7I6fqE7O5_Nfzs2fMWzu1n8J_pflKe17dr0hHLR61hPUyHD9Uo5soLPrTCCQRXN6Paa0wklMaDpSZSQS2MrMJfm5ATpgy7750ZPNTJ_qlM2UZV/s400/Broad+monitised+risk+map+process.JPG" width="400" /></a></div>
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">The cost -
risk trade-off that grid operators need to make is complex. Costly risk
reducing adjustments to the grid need to be weighed against the rise in cost of
operating the network and therefore the rates consumers pay for using the grid.
For making the trade-off, an estimate of the probability of failure of an asset
is required. In most cases, specific analytical models are developed to
estimate these probabilities. Using pipeline characteristics like type of material,
age, and data on the environment the pipeline is in (i.e. soil type,
temperature and humidity) pipeline specific failure rate models can be created.
Results from inspections of the pipeline can be used to further calibrate the model.
Due to the increased analytics maturity of grid operators, these models are
becoming more common. Grid operators are also starting to incorporate these failure
rate models in the creation of their maintenance plans. </span></span></div>
<br />
<h4 style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;">Averaging
the Risk</span></h4>
<br />
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">As you can
probably imagine, there are many ways for constructing failure rate models.
This makes it difficult for a regulator to compare reported asset conditions
from the grid operators, as these estimates could have been based on different
assumptions and modelling techniques.<span style="margin: 0px;">
</span>That is why, in the UK at least, it was agreed between the 4 major gas
distribution networks (GDN), to standardise the approach. <span style="margin: 0px;"> </span>In short, the method can be described as
follows.</span></span></div>
<ol>
<li><span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">Identify the failure modes of each asset category/sub-group in the asset base and estimate the probability of failure for each identified failure mode. </span></span></li>
<li><span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">For each failure mode, identify the consequences of the failure, including the probability of each consequence occurring.</span></span></li>
<li><span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">For each consequence, estimate the monetary impact. </span></span></li>
<li><span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">By summing over all failure modes and consequences, a probability-weighted estimate of monetised risk for an asset category/sub-group is calculated. Summing over all asset categories/sub-groups gives the total level of monetised risk for the grid.</span></span></li>
</ol>
<span lang="EN-GB" style="margin: 0px;"><br /></span>
<br />
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">This new
standardised way of calculating risks makes the performance evaluation much easier,
it also <span style="margin: 0px;"> </span>allows for a more in-depth
comparison. See for more details on the method<a href="https://www.ofgem.gov.uk/sites/default/files/docs/2015/11/gdn_asset_health_risk_reporting_methodology_-_v2.0.pdf"> the official documentation.</a></span></span></div>
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;"><br /></span></span></div>
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">An
interesting part of this new way of reporting risk is the explicit and
standardised way of modelling asset failure, consequence of asset failure and
cost of the consequence. This is similar to how a consolidated financial
statement of a firm is created. Therefore, you could interpret it as a
consolidated risk statement. But can risks of individual assets or asset groups
be aggregated in the described way and provide a meaningful estimate of the total
actual risk? The above described approach sums the estimated (or weighted
average) risk for each asset category/sub group, so it’s an estimate of the
average risk for the complete asset base. However risk management is not about
looking at the average risk, it’s about extreme values. For those who read Sam
Savage’s The Flaw of Averages or Nassim Taleb’s Black Swan know what I’m
talking about.</span></span></div>
<br />
<h4 style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;">Risking the
Average</span></h4>
<br />
<div style="margin: 0px 0px 11px;">
<span style="font-family: Calibri;"><span lang="EN-GB" style="margin: 0px;">Risks are
characterised by extreme outcomes, not averages. To be able to analyse extreme values,
a probability distribution of the outcome you’re interested in is required.
Averaging reduces the distribution of all possible outcomes to a point estimate,
hiding the spread and likelihood of all possible outcomes. Also, averaging
risks ignores the dependence between each of the identified modes of failure or
consequence. To illustrate let’s assume that we have 5 pipelines, each with a
probability of failure of 20%. There is only one consequence (probability =1)
with a monetary impact of </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,000,000. The monetised risk per
pipeline than becomes </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">200,000 (=0,20*</span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,000,000), for the total grid it is equal to </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,000,000. If
we take dependence of the failures into account than there will be a 20%
probability of all pipes failing when these are fully correlated events. There
will be a 0,032% change of all pipes failing if they are fully independent. The
estimated financial impact than ranges from </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,000,000 in
the fully correlated case to </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,600 in the fully independent case.
That’s quite a range which isn’t visible in the monetised risk approach.</span></span></div>
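<div style="margin: 0px 0px 11px;">
A few lines of Python reproduce this dependence arithmetic. This is a sketch of the hypothetical 5-pipeline example above, not of the Ofgem method itself.</div>
<pre>
# Hypothetical 5-pipeline example: dependence changes the tail risk,
# while the monetised (average) risk stays the same.
p_fail, n_pipes, impact = 0.20, 5, 1_000_000

monetised_risk = n_pipes * p_fail * impact          # EUR 1,000,000 either way

p_all_correlated  = p_fail                          # one common-cause event: 20%
p_all_independent = p_fail ** n_pipes               # 0.2^5 = 0.032%

print(f"monetised risk:            EUR {monetised_risk:,.0f}")
print(f"P(all fail), correlated:   {p_all_correlated:.3%}")
print(f"P(all fail), independent:  {p_all_independent:.3%}")
print(f"E[impact of all-fail event], correlated:  EUR {p_all_correlated  * n_pipes * impact:,.0f}")
print(f"E[impact of all-fail event], independent: EUR {p_all_independent * n_pipes * impact:,.0f}")
</pre>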
<br />
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">Regulators
must assess risk in many different areas. Banking has been top of mind in the
past years, but industries like Pharma and Utilities also had a lot of attention.
How a regulator decides to measure and asses risk is very important. If risks
are underestimated, this could impact society (like a banking crisis, deaths
due to the admission of unsafe drugs or increase of injuries due to pipeline
failures). If risks are overestimated costly mitigation might be imposed,
again impacting society with high costs. The above example shows that the monetised
risk approach is insufficient as it estimates risk with averages, where in risk
mitigation the extreme values are much more important. What than is a better
way of aggregating these uncertainties and risks than just averaging them? </span></span></div>
<br />
<h4 style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;">Monte Carlo
Simulation</span></h4>
<br />
<div style="margin: 0px 0px 11px;">
<span style="font-family: Calibri;"><span lang="EN-GB" style="margin: 0px;">The best
way to better understand the financial impact of asset failure is to construct
a probability density function of all possible outcomes using Monte Carlo simulation
and based on that distribution make the trade-off between costs and risk. Monte
Carlo Simulation has proven its value in many industries and in this case will
provide what we need. Using the free tools of Sam Savage’s probabilitymanagement.org
the above hypothetical example of 5 pipe lines can be modelled and the
distribution of financial impact analysed. In just a few minutes the below
cumulative distribution (CDF) of the financial impact for the 5 pipelines case can
be created. Remember that the monetised risk calculation resulted in a risk
level equal to the average, </span><span lang="EN-GB" style="margin: 0px;">€</span><span lang="EN-GB" style="margin: 0px;">1,000,000. </span></span></div>
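<div style="margin: 0px 0px 11px;">
For readers without those tools, a minimal Monte Carlo sketch in Python (assuming the 5 pipelines fail independently, as in the hypothetical example) produces the same kind of CDF, including the threshold reading discussed below.</div>
<pre>
# Monte Carlo sketch of the 5 independent pipelines (hypothetical example).
import numpy as np

rng = np.random.default_rng(42)
p_fail, n_pipes, impact, n_sims = 0.20, 5, 1_000_000, 100_000

failures = rng.random((n_sims, n_pipes)) < p_fail   # Bernoulli draw per pipeline
total_impact = failures.sum(axis=1) * impact        # financial impact per scenario

monetised_risk = n_pipes * p_fail * impact          # the "average" figure: EUR 1,000,000
print(f"P(impact <  monetised risk) = {(total_impact <  monetised_risk).mean():.0%}")  # ~33%
print(f"P(impact >= monetised risk) = {(total_impact >= monetised_risk).mean():.0%}")  # ~67%
print(f"95th percentile of impact   = EUR {np.percentile(total_impact, 95):,.0f}")
</pre>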
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjySYQj6dYuu0AsXfr40LUbnw8vFfFKYzg77UTNHaDPUcRKsx0XhIrMukngYGJ7Vc2unN8XhAeeqOz29RTKWx8_rv5a1CE4hg9WDsic6BBVwBJE0wHK5QjAzZdkKcUWyt7Mjep5gtQdcZKz/s1600/CDF+Financial+Impact.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="544" data-original-width="889" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjySYQj6dYuu0AsXfr40LUbnw8vFfFKYzg77UTNHaDPUcRKsx0XhIrMukngYGJ7Vc2unN8XhAeeqOz29RTKWx8_rv5a1CE4hg9WDsic6BBVwBJE0wHK5QjAzZdkKcUWyt7Mjep5gtQdcZKz/s320/CDF+Financial+Impact.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="margin: 0px 0px 11px;">
<span style="font-family: Calibri;"><span lang="EN-GB" style="margin: 0px;">From the graph it
immediate follows that P(Financial Impact<=Monetised Risk) = 33%. It implies
that the P(Financial Impact></span><span lang="EN-GB" style="margin: 0px;">Monetised Risk</span><span lang="EN-GB" style="margin: 0px;">) = 1-33%=66%. So, a 66% chance that
the financial impact of pipe failures will be higher than the calculated monetised
risk. Therefore we’re taking a serious risk by using the averaged asset risks. Given
the objective of better comparison of grid operator performance and enabling risk
trading between asset groups, the monetised risk method is to simple I would
say. By averaging the risks, the distribution of financial impact is rolled up
into one number leaving you no clue on what the actual distribution looks like (See
also Sam Savage’s : The Flaw of Averages) A better way would be to set an
acceptable “risk threshold” (say 95%) and use the estimated CDF to determine
the corresponding financial impact. </span></span></div>
<br />
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">This
approach would also allow for better comparison of grid operators by creating a
cumulative distribution for all of them and plotting them together into one
graph (See example below). In a similar way risk mitigations can be evaluated and
comparisons made between different asset groups, allowing for better informed
risk trading. </span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFup-T4Yg-lc4TWHn2nYYweNnphErliq_H8JwHms8Bn5uk7e6u1D9RlnZ6lYJFe17AkhgDiI9ggcZ4zfLQvoehQOdvqh697uowDqot_N5tkdxC86Tdkvq-HIMu9ZFr56ed6jx2TUVnPtJM/s1600/combined+CDFs.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="554" data-original-width="825" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFup-T4Yg-lc4TWHn2nYYweNnphErliq_H8JwHms8Bn5uk7e6u1D9RlnZ6lYJFe17AkhgDiI9ggcZ4zfLQvoehQOdvqh697uowDqot_N5tkdxC86Tdkvq-HIMu9ZFr56ed6jx2TUVnPtJM/s320/combined+CDFs.JPG" width="320" /></a></div>
<div style="margin: 0px 0px 11px;">
<br /></div>
<div style="margin: 0px 0px 11px;">
<span lang="EN-GB" style="margin: 0px;"><span style="font-family: Calibri;">Standardising
the way in which asset failures and consequence of failures are estimated and monetised
definitely is a good step towards a comparable way to measure risk. But risks
should not be averaged in the way the monetised risk approach suggests. There
are better ways, which will provide insight on the whole distribution of risk. Given
the available tools and computing power, there is no reason not to do so. It will
improve our insights on the risks we face and help us find the best mitigation
strategies to reducing public risks. </span></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-57180751906479103502017-01-27T21:26:00.000+01:002017-01-27T21:26:10.244+01:00Want to get value from Data Science? Keep it simple and focussed!<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQUki82eDbBSFx8BmiYRDUNRMompYKcgggezMQxxKvr989omumcq3IaHjZrqml6DGZ_Dp50st37XDuurEwLjj52cTyQo53bvzmWSd6PSw3BgHm2dXeOumREr9eikvyXArmql8doaDs5BPd/s1600/new_products.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="313" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQUki82eDbBSFx8BmiYRDUNRMompYKcgggezMQxxKvr989omumcq3IaHjZrqml6DGZ_Dp50st37XDuurEwLjj52cTyQo53bvzmWSd6PSw3BgHm2dXeOumREr9eikvyXArmql8doaDs5BPd/s320/new_products.png" width="320" /></a></div>
What is the latest data science success story you have read? The <a href="https://www.dezyre.com/article/how-big-data-analysis-helped-increase-walmarts-sales-turnover/109">one </a>from Walmart? Maybe a fascinating result from a <a href="https://www.kaggle.com/competitions">Kaggle</a> competition? I’m always interested in these stories, wanting to understand what was achieved, why it was important and what the drivers for success were. Although the buzz around the potential of data science is very strong, the number of stories on impactful practical applications of data science is still not very large. The Harvard Business Review recently published<a href="https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science"> an article</a> explaining why organisations are not getting value from their data science initiatives. Although there are many more reasons than those mentioned in the article, one key reason many initiatives fail is a disconnect between the business goals and the data science efforts. Also, the article states that data scientists tend to keep fine-tuning their models instead of taking on new business questions, slowing down the speed at which business problems are analysed and solved.<br />
<br />
Seduced by inflated promises, organisations have started to mine their data with state-of-the-art algorithms, expecting it to be turned into gold instantly. This expectation that technology will act as a philosopher’s stone makes data science comparable to alchemy. It looks like science, but it isn’t. Most of the algorithms fail to deliver value as they can’t provide an explanation of why things are happening, nor provide actionable insights or guidance for influencing the phenomena being investigated. To illustrate, take the London riots in 2011. Since the 2009 G20 summit the UK police had been gathering and analysing a lot of social media data, but still they were not able to prevent the 2011 riots from happening, nor to track and arrest the rioters. Did the police have too little data, or a lack of computing or algorithmic power? No, millions had been spent. Despite all the available technology <a href="https://youtu.be/XxbpUalZmQY">the police were unable to make sense of it all</a>. I see other organisations struggle with the same problem when trying to make sense of their data. Although I’m a strong proponent of using data and mathematics (and as such data science) for answering business questions, I do believe that technology alone can never be sufficient to provide an answer. The same holds for the amount, diversity and speed of the data.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/XxbpUalZmQY/0.jpg" src="https://www.youtube.com/embed/XxbpUalZmQY?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
<h4>
Inference vs Prediction</h4>
Let’s investigate the disconnect between the business goals and the data science efforts mentioned in the HBR article. Many of today’s data science initiatives result in predictive models. In a B2C context these models are used to predict whether you’re going to click on an ad, buy a suggested product, churn, commit fraud or default on a loan. Although a lot of effort goes into creating highly accurate predictions, the question is whether these predictions really create business value. Most organisations require a way to influence the phenomenon being predicted, rather than the prediction itself, as this allows them to decide on the appropriate actions to take. Therefore, understanding what makes you click, buy, churn, default or commit fraud is the real objective. Understanding what influences human behaviour requires another approach than creating predictions: it requires inference. Inference is a statistical, hypothesis-driven approach to modelling that focusses on understanding the causality of a relationship. Computer science, the core of most data science methods, focusses on finding the best model to fit the data and doesn’t focus on understanding why. Inferential models provide the decision maker with guidance on how to influence customer behaviour, and thus value can be created. This might better explain the disconnect between business goals and analytics efforts as reported in the HBR article. For example, knowing that a call positively influences customer experience and prevents churn for a specific type of customer gives the decision maker the opportunity to plan such a call. Prediction models can’t provide these insights; they will provide the expected number of churners or tell you who is most likely to churn, but how to react to these predictions is left to the decision maker. The sketch below illustrates the inferential approach.<br />
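As a minimal illustration of what an inferential model looks like in practice: the sketch below fits a logistic regression on synthetic "churn" data with statsmodels. The drivers, coefficients and data are all invented for illustration; statsmodels is one common choice, not a method the HBR article prescribes.<br />
<pre>
# Inference sketch: fit a logistic regression on synthetic "churn" data and
# read off the drivers, rather than only producing churn scores.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2_000
calls = rng.integers(0, 5, n)       # hypothetical driver: service calls received
tenure = rng.uniform(0, 10, n)      # hypothetical driver: years as a customer

# Invented ground truth: both calls and tenure reduce the odds of churning.
logit = 0.5 - 0.8 * calls - 0.3 * tenure
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([calls, tenure]))
result = sm.Logit(churn, X).fit(disp=False)
print(result.summary())  # coefficients + p-values: direction and strength of each driver
</pre>
In this synthetic setup, the significant negative coefficient on the calls variable is the actionable insight: it suggests that a retention call lowers the odds of churn, which a pure prediction score would not tell you.<br />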
<br />
<h4>
Keep it simple!</h4>
A second reason for failure mentioned in the HBR article is that data scientists put a lot of effort into improving the predictive accuracy of their models instead of taking on new business questions. The reason mentioned for this behaviour is the huge effort of getting the data ready for analysis and modelling. A consequence of this tendency is that it increases model complexity. Is this complexity really required? From a user’s perspective, complex models are more difficult to understand and therefore also more difficult to adopt, trust and use. For easy acceptance and deployment, it is better to have understandable models. Sometimes this is even a legal requirement, for example in credit scoring. A best practice I apply in my work as a consultant is to balance the model accuracy against the accuracy required for the decision to be made, the analytics maturity of the decision maker and the accuracy of the data. This also applies to data science projects. For example, targeting the receivers of your next marketing campaign requires less accuracy than having a self-driving car find its way to its destination. Also, you can’t make predictions that are more accurate than your data. Most data are uncertain, biased, incomplete and contain errors; when you have a lot of data this becomes even worse. This will negatively influence the quality and applicability of the model based on these data. In addition, <a href="https://arxiv.org/pdf/math/0606441.pdf">research</a> shows that the added value of more complex methods is marginal compared to what can be achieved with simple methods. Simple models already catch most of the signal in the data, enough in most practical situations to base a decision on. So, instead of creating a very complex and highly accurate model, it is better to test various simple ones. They will capture the essence of what is in the data and speed up the analysis. From a business perspective, this is exactly what you should ask your data scientists to do: come up with simple models fast and, if required for the decision, use the insights from these simple models to direct the construction of more advanced ones, as the sketch below illustrates.<br />
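A hedged sketch of that advice, using scikit-learn on synthetic data. The exact gap between the simple and the complex model depends entirely on your data; the point is only how cheaply the comparison can be run before any tuning effort is spent.<br />
<pre>
# Compare a simple model against a more complex one before investing in tuning.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5_000, n_features=20, n_informative=5,
                           random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1_000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
</pre>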
<br />
The question “How to get value from your data science initiative?” has no simple answer. There are many reasons why data science projects succeed or fail; the HBR article only mentions a few. I’m confident that the above considerations and recommendations will increase the chances that your next data science initiative will be successful. I can’t promise you gold, however; I’m no alchemist.<br />
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-86807396120740030502016-10-13T17:54:00.004+02:002016-10-13T18:14:48.211+02:00The Error in Predictive Analytics<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFQhXC-Z78lDctHJE3uAqbONmwFj6l62T8Ofmu5Qqhb6MWR-8Ssba4eGD1QM7pG9osNKkYThyphenhyphenClxF30ZQlSEZ1esEOmMHdgz_bEcwF0Fc7UHkDJvMob46Ucmozst8x1xHN8jcJP2EY2MOj/s1600/The+Future+according+to+google.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFQhXC-Z78lDctHJE3uAqbONmwFj6l62T8Ofmu5Qqhb6MWR-8Ssba4eGD1QM7pG9osNKkYThyphenhyphenClxF30ZQlSEZ1esEOmMHdgz_bEcwF0Fc7UHkDJvMob46Ucmozst8x1xHN8jcJP2EY2MOj/s320/The+Future+according+to+google.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">For more predictions see : http://xkcd.com/887/</td></tr>
</tbody></table>
<div class="MsoNormal">
We are all
well aware of the predictive analytical capabilities of companies like Netflix,
Amazon and Google. Netflix predicts the next film you are going to watch. Amazon
shortens<a href="https://techcrunch.com/2014/01/18/amazon-pre-ships/"> delivery times</a> by predicting what you are going to buy next, and Google
even lets you use its algorithms to build your own prediction models.
Following the predictive successes of Netflix, Google and Amazon, companies in
telecom, finance, insurance and retail have started to use predictive
analytical models and have developed the analytical capabilities to improve their
business. Predictive analytics can be applied to a wide range of business
questions and has been a key technique in search, advertising and
recommendations. Many of today's
applications of predictive analytics are in the commercial arena, focusing on
predicting customer behaviour. First steps in other businesses are being taken.
Organisations in healthcare, industry and utilities are investigating what
value predictive analytics can bring. In these first steps much can be learned
from the experience the front-running industries have in building and using
predictive analytical models. However, care must be taken, as the context in
which predictive analytics has been used is quite different from the new
application areas, especially when it comes to the impact of prediction errors.</div>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">Leveraging
the data</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">It goes
without saying that the success of Amazon comes from, besides the infinite
shelf space, its <a href="http://fortune.com/2012/07/30/amazons-recommendation-secret/">recommendation </a>engine. Similar for Netflix. <a href="http://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers">According to McKinsey</a>, 35 percent of what consumers purchase on Amazon and 75 percent of
what they watch on Netflix comes from algorithmic product recommendations. Recommendation
engines work well because there is a lot of data available on customers,
products and transactions, especially online. This abundance of data is why
there are so many predictive analytics initiatives in sales & marketing. Main objective of these initiatives is to
predict customer behaviour, like which customer is likely to churn or buy a
specific product/service, which ads will be clicked on or what marketing
channel to use to reach a certain type of customer. In these types of
applications predictive models are created either using statistical (like
regression, probit or logit) or machine learning techniques (like random
forests or deep learning) With the insights gained from using these predictive
models many organisations have been able to increase their revenues.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">Predictions
always contain errors!</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">Predictive
analytics has many applications, the above mentioned examples are just the tip
of the iceberg. Many of them will add value, but it remains important to stress
that the outcome of a prediction model will always contain an error. Decision
makers need to know how big that error is. To illustrate, in using historic
data to predict the future you assume that the future will have the same
dynamics as the past, an assumption which history has proven to be dangerous.
The 2008 financial crisis is prove of that. Even though there is no shortage of
data nowadays, there will be factors that influence the phenomenon you’re
predicting (like churn) that are not included in your data. Also, the data
itself will contain errors as measurements always include some kind of error.
Last but not last, models are always an abstraction of reality and can't
contain every detail, so something is always left out. All of this will impact
the accuracy and precision of your predictive model. Decision makers should be
aware of these errors and the impact it may have on their decisions.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">When
statistical techniques are used to build a predictive model the model error can
be estimated, it is usually provided in the form of confidence intervals. Any
statistical package will provide them, helping you asses the model quality and
its prediction errors. In the past few years other techniques have become
popular for building predictive models, for example algorithms like deep
learning and random forests. Although these techniques are powerful and able to
provide accurate predictive models, they are unable to provide a confidence
intervals (or error bars) for their predictions. So there is no way of telling
how accurate or precise the predictions are. In marketing and sales, this may
be less of an issue. The consequence might be that you call the wrong people or
show an ad to the wrong audience. The consequences can however be more severe.
You might remember the <a href="https://www.theguardian.com/technology/2015/may/20/flickr-complaints-offensive-auto-tagging-photos">offensive auto tagging by Flickr</a>, labelling images of people with tags like “ape” or “animal” or the racial bias in <a href="https://www.theguardian.com/us-news/2016/aug/31/predictive-policing-civil-rights-coalition-aclu">predictive policing algorithms</a>.<o:p></o:p></span></div>
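<div class="MsoNormal">
<span lang="EN-GB">One pragmatic workaround is sketched below for a random forest: use the spread of the individual trees' predictions as a rough error bar. This is an illustrative proxy on synthetic data, not a calibrated confidence interval and not the formal machinery that Jordan's research develops.</span></div>
<pre>
# Rough error bars for a random forest via the spread of per-tree predictions.
# This is a crude proxy, not a statistically rigorous confidence interval.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1_000, n_features=5, noise=15.0, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

x_new = X[:1]                                           # one observation to score
per_tree = np.array([tree.predict(x_new)[0] for tree in rf.estimators_])

print(f"point prediction : {per_tree.mean():.1f}")
print(f"per-tree spread  : {np.percentile(per_tree, 2.5):.1f} "
      f"to {np.percentile(per_tree, 97.5):.1f}")
</pre>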
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a data-flickr-embed="true" href="https://www.flickr.com/photos/132452869@N04/17801937502/" nbsp="" title="Untitled"><img alt="Untitled" height="334" src="https://c7.staticflickr.com/6/5331/17801937502_01455e3efe.jpg" width="500" /></a><script async="" charset="utf-8" src="//embedr.flickr.com/assets/client-code.js"></script></div>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">Where is
the error bar?</span></h4>
<div>
The point
that I would like to make is that when adopting predictive modelling, be sure to
have a way of estimating the error in your predictions, in terms of both accuracy and
precision. In statistics this is common practice and helps improve models and
decision making. Models constructed with machine learning techniques usually
only provide point estimates (for example, the probability of churn for a
customer is some percentage), which provides little insight into the accuracy or
precision of the prediction. When using machine learning it is possible to
construct error estimates (see for example <a href="http://magazine.amstat.org/blog/2016/03/01/jordan16/">the research of Michael I. Jordan</a>) but
it is not common practice yet. Many analytical practitioners are not even aware
of the possibility. Especially now that predictive modelling is being used in
environments where errors can have a large impact, this should be top of mind
for both the analytics professional and the decision maker. Just imagine your
doctor concluding that your liver needs to be taken out because a predictive
model estimates a high probability of a very nasty disease. Wouldn’t your first
question be how certain he or she is about that prediction? So, my advice to decision
makers: only use outcomes of predictive models if accuracy and precision
measures are provided. If they are not there, ask for them. Without them, a
decision based on these predictions comes close to a blind leap of faith.</div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-40386053561106857862016-08-03T12:15:00.000+02:002016-08-03T12:15:19.182+02:00Airport Security, can more be done with less?<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2iL4W4UM7mNnjaXmcJed9esB8jrctVStMac_aNIiQ5LIfUl4xir-ZBmsPzm64PGVjAYWvAwxxe00xpr8-nuy4pwQ1eGIHVREVHGY0L2D7-4iiUnOiCe7FekBJGd7LkT8E7p9CrnZmg7SU/s1600/fileschiphol-1200x800-720x480.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2iL4W4UM7mNnjaXmcJed9esB8jrctVStMac_aNIiQ5LIfUl4xir-ZBmsPzm64PGVjAYWvAwxxe00xpr8-nuy4pwQ1eGIHVREVHGY0L2D7-4iiUnOiCe7FekBJGd7LkT8E7p9CrnZmg7SU/s400/fileschiphol-1200x800-720x480.jpg" width="400" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">One of the main
news items of the past few days is the increased level of security at Amsterdam
Schiphol Airport and the additional delays it has caused travellers both
incoming and outgoing. Extra security checks on the roads around the airport
are being conducted, also in the airport additional checks are being performed.
Security checks have increased after the authorities received reports of a
possible threat. We are in the peak of the holiday season where around 170.000
passengers per day arrive, depart or transfer at Schiphol Airport. With these
numbers of people for sure authorities want to do their utmost to keep us save,
as always. This intensified security puts the military police (MP) and security
officers under stress however as more needs to be done with the same number of
people. It will be difficult for them to keep up the increased number of checks
for long. Additional resources will be required, for example from the military.
Question is, does security really improve by these additional checks or could a
more differentiated approach offer more security (lower risk) with less effort?<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">How has
airport security evolved?</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">If I take a
plane to my holiday destination …I need to take of my coat, my shoes, and my
belt, get my laptop and other electronic equipment out of my back, separate the
chargers and batteries, hand in my excess liquids, empty my pockets, and step through
a security scanner. This takes time, and
with an increasing numbers of passengers waiting times will increase. We all
know these measures are necessary to keep us save but taking a trip abroad
doesn’t start very enjoyable. These measures have been adopted to prevent the same
attack from happening again and has resulted in the current rule based system of
security checks. Over the years the number of security measures has increased enormously,
see for example the <a href="https://www.tsa.gov/timeline">timeline </a>on the TSA website, making it a resource heavy activity
which can’t be continued in the same way in the near future. A smarter way is
needed. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Risk Based
Screening</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">At present most
airports apply the same security measures to all passengers, a one size fits
all approach. This means that low risk passengers are subject to the same
checks as high risk passengers. This implies that changes to the security checks
can have an enormous impact on the resources requirements. Introducing a one minute
additional check by a security officer to all passengers at Schiphol requires
354 additional security officers to check 170.000 passengers. A smarter way would be to apply different
measures to different passenger types, high risk measures to high risk passengers
and low risk measures to low risk passengers. This risk based approach is at
the foundation of SURE! (Smart Unpredictable Risk Based Entry) a concept introduced
by the NCTV (The National Coordinator for Security and Counterterrorism) Consider
this, what is more threatening, a non-threat passenger with banned items (pocket
knife, water bottle) or a threat passenger with bad intentions (and no banned
items). I guess you will agree that the latter is the more threatening one and this
is exactly where risk based screening focusses on. Key component in risk based security is to
decide what security measures to apply to which passenger, taking into account
that attackers will adapt their plans when additional security measures are installed.
<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Operations
Research helps safeguard us</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">The concept
of risk based screening makes sense as scarce resources like security officers,
MP’s and scanners are utilized better. In the one size fits all approach a lot
of these resources are used to screen low risk passengers and as a consequence
less resources are available for detecting high risk passengers. Still, even
with risk based screening trade-offs must be made as resources will remain scarce.
Also decisions need to be made in an uncertain and continuously changing environment,
with little, false or no information. Sound familiar? This is the exactly the
area where Operations Research shines. Decision making under uncertainty can
for example be supported by simulation, Bayesian belief networks, Markov
decision and control theory models. Using game theoretic concepts the behaviour
of attackers can be modelled and incorporated, leading to the identification of
new and robust counter measures. Queuing theory and waiting line models can be
used to analyse various security check configurations (for example centralised
versus decentralised, and yes centralised is better!) including the required staffing.
This will help airports to develop efficient and effective security checks
limiting the impact on passengers while achieving the highest possible risk
reduction. These are but a small number of examples where OR can help, there
are many more.</span></div>
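<div class="MsoNormal">
<span lang="EN-GB">A minimal queueing sketch of that centralised-versus-decentralised claim, using the standard M/M/c (Erlang C) formulas. The arrival and service rates are invented for illustration.</span></div>
<pre>
# Pooled (centralised) vs separate (decentralised) security lanes, M/M/c model.
from math import factorial

def erlang_c_wait(arrival_rate, service_rate, servers):
    """Mean queueing wait Wq for an M/M/c queue (Erlang C formula)."""
    a = arrival_rate / service_rate               # offered load
    assert a < servers, "queue must be stable"
    p0_terms = sum(a**k / factorial(k) for k in range(servers))
    tail = a**servers / factorial(servers) * servers / (servers - a)
    p_wait = tail / (p0_terms + tail)             # probability a passenger waits
    return p_wait / (servers * service_rate - arrival_rate)

lam, mu = 1.6, 1.0   # invented: 1.6 passengers/min arriving, 1 passenger/min per lane

# Decentralised: two lanes, each with its own queue and half the arrivals.
print(f"two separate M/M/1 lanes: Wq = {erlang_c_wait(lam / 2, mu, 1):.2f} min")
# Centralised: one shared queue feeding both lanes.
print(f"one pooled M/M/2 queue:   Wq = {erlang_c_wait(lam, mu, 2):.2f} min")
</pre>
<div class="MsoNormal">
<span lang="EN-GB">Same total capacity, same demand, but in this example the pooled queue cuts the expected wait from 4.0 to about 1.8 minutes, which is why centralised checkpoints come out ahead.</span></div>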
<div class="MsoNormal">
<br /><span lang="EN-GB"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHNdVlUJ5_V1kFzr6xZdGZewHjlC9PPn6J5K0WYbn9yvD2O9sv2fRBBIhCPv3PoEk13T6nVKojht-qVdAEwm__sPODbS3BPu56wQ4LsAhMDQfdw3dfYoV-K2IfcWA7zKwLZOYpQA-Yrm0m/s1600/security+check+schiphol.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="197" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHNdVlUJ5_V1kFzr6xZdGZewHjlC9PPn6J5K0WYbn9yvD2O9sv2fRBBIhCPv3PoEk13T6nVKojht-qVdAEwm__sPODbS3BPu56wQ4LsAhMDQfdw3dfYoV-K2IfcWA7zKwLZOYpQA-Yrm0m/s400/security+check+schiphol.JPG" width="400" /></a></td></tr>
</tbody></table>
</span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="text-align: center;"></td></tr>
</tbody></table>
<br />
<div class="MsoNormal">
<span lang="EN-GB">Some of the
concepts of risk based security checks, resulting from the SURE! Programme are
already put into practice. Schiphol is working towards centralised security and
recently opened <a href="https://www.schiphol.nl/B2B/RouteDevelopment/NewsPublications1/RouteDevelopmentNews/AmsterdamAirportSchipholNewSecurityControlEnhancesComfort.htm">the security check point of the future</a> for passengers traveling
within Europe. It’s good to know that the decision making rigour comes from
Operations Research, resulting in effective, efficient and passenger friendly security
checks. <o:p></o:p></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-61325771080605112402016-07-21T17:31:00.000+02:002016-07-21T17:31:19.098+02:00Towards Prescriptive Asset Maintenance<div class="MsoNormal">
<span lang="EN-GB">Every
utility deploys capital assets to serve its customers. During the asset life cycle an asset manager repetitively
must make complex decisions with the objective to minimise asset life cycle
cost while maintaining high availability and reliability of the assets and
networks. Avoiding unexpected outages, managing risk and maintaining assets
before failure are critical goals to improve customer satisfaction. To better manage
asset and network performance utilities are starting to adopt a data driven approach.
With analytics they expect to lower asset life cycle cost while maintaining
high availability and reliability of their networks. Using actual performance data,
asset condition models are created which provide insight on the asset deterioration
over time and what the driving factors of deterioration are. With this insights
forecasts can be made on the future asset and network performance. These models
are useful, but lack the ability to effectively support the asset manager in
designing a robust and cost effective maintenance strategy.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">Asset
condition models allow for the ranking of assets based on their expected time
to failure. Within utilities it is common practice to use this ranking in
deciding which assets to maintain. By starting at the assets with the shortest
time to failure, assets are selected for maintenance until the budget available
for maintenance is exhausted. This prioritisation
approach will ensure that the assets most prone to failure are selected for
maintenance, however it will not deliver the maintenance strategy with the
highest overall reduction of risk. Also the approach can’t effectively handle constraints
in addition to the budget constraint. For example constraints on manpower availability,
precedence constraints on maintenance projects, or required materials or
equipment. Therefore a better way to determine a maintenance strategy is required
taking into account all these decision dimensions. More advanced analytical
methods, like mathematical optimization (=prescriptive analytics), will provide
the asset manager with the required decision support.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">In finding the
best maintenance strategy the asset manager could instead of making a ranking, list
all possible subsets of maintenance projects that are within budget and
calculate the total risk reduction of each subset. The best subset of projects to
select would be the subset with the highest overall risk reduction (or any other
measure). This way of selecting projects also allows for additional constraints,
like required manpower, required equipment or spare parts, time depended budget
limits, to be taken into account. Subsets that do not fulfil these requirements
are simply left out. Also, subsets could be constructed in such a manner that mandatory
maintenance projects are included. With
a small number of projects this way of selecting projects would be possible, 10
projects would lead to 1024 (=2^10) possible subsets. But with large numbers
this is not possible, a set of 100 potential projects would lead 1.26*10^30 possible
subsets which would take too much time, if possible at all, to construct and
evaluate them all. This is exactly where
mathematical optimisation proofs its value because it allows you to implicitly
construct and evaluate all feasible subsets of projects, fulfilling not only
the budget constraint but any other constraint that needs to be included. Selecting
the best subset is achieved by using an objective function which expresses how
you value each subset. Using mathematical optimisation assures the best
possible solution will be found. Mathematical optimisation has proven its value
many times in many industries, also in Utilities, and disciplines, like maintenance.
MidWest ISO for example uses optimisation techniques to continuously balance
energy production with energy consumption, including the distribution of
electricity in their networks. Other asset heavy industries like petrochemicals
use optimisation modelling to identify cost effective, reliable and safe
maintenance strategies.<o:p></o:p></span></div>
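<div class="MsoNormal">
<span lang="EN-GB">A minimal sketch of such a selection model in Python with the open-source PuLP library. The project costs, risk-reduction scores, crew requirements and budget are all invented for illustration; a real model would take these from the asset condition models.</span></div>
<pre>
# Maintenance project selection as a small binary optimisation model (knapsack).
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpStatus

costs = [120, 80, 200, 150, 60]        # invented project costs (kEUR)
risk_reduction = [9, 5, 16, 10, 3]     # invented risk-reduction scores
crews = [2, 1, 3, 2, 1]                # invented crew requirements
budget, max_crews = 350, 5
n = len(costs)

prob = LpProblem("maintenance_selection", LpMaximize)
x = LpVariable.dicts("select", range(n), cat="Binary")

prob += lpSum(risk_reduction[i] * x[i] for i in range(n))          # objective
prob += lpSum(costs[i] * x[i] for i in range(n)) <= budget         # budget limit
prob += lpSum(crews[i] * x[i] for i in range(n)) <= max_crews      # manpower limit
prob += x[0] == 1                                                  # mandatory project

prob.solve()
print(LpStatus[prob.status],
      [i for i in range(n) if x[i].value() == 1])
</pre>
<div class="MsoNormal">
<span lang="EN-GB">The solver implicitly searches all 2^5 subsets here; the same formulation scales to hundreds of projects, where explicit enumeration is hopeless.</span></div>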
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/sz7C-aeO7Wc/0.jpg" src="https://www.youtube.com/embed/sz7C-aeO7Wc?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<br />
<div class="MsoNormal">
In improving
their asset maintenance strategies, utilities' best next step is to adopt
mathematical optimisation. It allows them to leverage the insights from their
asset condition models and turn these insights into value-adding maintenance decisions.
Compared to their current rule-based selection of maintenance projects, in which
they can only evaluate a limited number of alternatives, they can significantly
improve, as mathematical optimisation lets them evaluate trillions (possibly
all) of the alternative maintenance strategies within seconds. Although “rules of
thumb”, “politics” and “intuition” will always provide a solution that is “good”,
mathematical optimisation assures that the best solution will be found. </div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-47618693788975180352016-07-19T10:54:00.000+02:002016-07-19T10:54:16.595+02:00Big Data Headaches<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh99KvediH2Gv9QSc2UNm21idCHpLy3qyj9IvuA10AsXE69NzFvW0G3bHqL1WwDvl4cRSgL3tboLFNv3FYAp-GdPreCwQ1uTdFn6zKhRb8tpJAWCK2VWA4DmStGOBusOf8OqJmq1tsNcPkE/s1600/big+data+relief.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh99KvediH2Gv9QSc2UNm21idCHpLy3qyj9IvuA10AsXE69NzFvW0G3bHqL1WwDvl4cRSgL3tboLFNv3FYAp-GdPreCwQ1uTdFn6zKhRb8tpJAWCK2VWA4DmStGOBusOf8OqJmq1tsNcPkE/s320/big+data+relief.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://tinyurl.com/jeyjtna</td></tr>
</tbody></table>
<div class="MsoNormal">
<span lang="EN-GB">Data driven
decision making has <a href="http://ebusiness.mit.edu/research/papers/2011.12_Brynjolfsson_Hitt_Kim_Strength%20in%20Numbers_302.pdf">proven</a> to be key for organisational performance
improvements. This stimulates organisations to gather data, analyse it and use
decision support models to improve their decision making speed and quality. With
the rapid decline in cost of both storage and computing power, there are nearly
no limitations to what you can store or analyse. As a result organisations have
started building data lakes and invested in big data analytics platforms to
store and analyse as much data as possible. This is especially true in the consumer
goods and services sector where big data technology can been transformative as it
enables a very granular analysis of human activity (up to the personal level). With
these granular insights companies can personalise their offerings, potentially increasing
revenue by selling additional products or services. This allows for new
business models to emerge and is changing the way of doing business completely.
As the potential of all this data is huge, many organisations are investing in big
data technology expecting plug and play inference to support their decision making.
The big data practice however is something different and is full of rude
awakenings and headaches.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">That big data
technology can create value is proven by the fact that companies like Google, Facebook
and Amazon exist and do well. Surveys from <a href="http://www.gartner.com/newsroom/id/2848718">Gartner </a>and <a href="http://www.idc.com/getdoc.jsp?containerId=prUS40560115">IDC </a>show that the number
of companies adopting big data technology is increasing fast. Many of them want
to use this technology to improve their business and start using it in an exploratory
manner. When asked about the results they get from their analysis many of them
respond that they experience difficulty in getting results due to data issues,
others report difficulty getting insights that go beyond preaching to the choir.
Some of them even report disappointment as their outcomes turn out to be wrong
when put into practice. Many times the lack of experienced analytical talent is
mentioned as a reason for this, but there is more to it. Although big data has
the potential to be transformative, it also comes with fundamental challenges
which when not acknowledged can cause unrealistic expectations and disappointing
results. Some of these challenges are even unsolvable at this time</span>.</div>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">Even if
there is a lot of data, it can’t be used properly</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">To
illustrate some of these fundamental challenges, let’s take an example of an online
retailer. The retailer has data on its customers and uses it to identify generic
customer preferences. Based on the identified preferences offers are generated
and customers targeted. The retailer wants to increase revenue and starts to
collect more data on the individual customer level. The retailer wants to use
the additional data to create personalised offerings (the right product, at the
right time, for the right customer, at the right price) and to make predictions
about future preferences (so the retailer can restructure its product portfolio
continuously). In order to do so the retailer needs to find out what the
preferences of its customers are and the drivers of their buying behaviour. This
requires constructing and testing hypotheses based on the customer attributes
gathered. In the old situation the number of available attributes (like address,
gender, past transactions) was small. Therefore only a small number of
hypothesis (for example “women living in a certain part of the city are inclined
to buy a specific brand of white wine”) can be tested to cover all possible
combinations. However with the increase in the number of attributes, the number
of combinations of attributes that are to be investigated increases
exponentially. If in the old situation the retailer had 10 attributes per
customer, a total of 1024 (=2<sup>10</sup>) possible combinations needed to be evaluated.
However when the number of attributes increases to say 500 (which in practice
is still quite small), the number of possible combinations of attributes increases
to 3.27 10<sup>150 </sup> (=2<sup>500</sup>)
This exponential growth causes computational issues as it becomes impossible to
test all possible hypotheses even with the fastest available computers. The
practical way around this is to significantly reduce the number attributes
taken into account. This will leave much of the data unused and many possible
combinations of attributes untested, therefore reducing the potential to improve.
This might also cause much of the big data analysis results to be too obvious.</span></div>
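<div class="MsoNormal">
<span lang="EN-GB">The explosion is easy to verify; two lines of Python reproduce the figures above.</span></div>
<pre>
# The number of possible attribute combinations doubles with every attribute.
for n_attributes in (10, 500):
    print(f"{n_attributes} attributes -> {2**n_attributes:.2e} combinations")
# 10 attributes  -> 1.02e+03 combinations
# 500 attributes -> 3.27e+150 combinations
</pre>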
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVz4zGl2MT2-jjg2vhMBDsE4zc_wB5y_ZiU5RCb5RvfZNTqMUBPClnTYBZntJUxN6lhz4iML3uyms8JRNK1-M0VzuPdDcXabfyIvnC7QtFGp-dA39DlgbWDIDShvQb25qmzsg03cqh0ray/s1600/Dilbert+Big+Data.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVz4zGl2MT2-jjg2vhMBDsE4zc_wB5y_ZiU5RCb5RvfZNTqMUBPClnTYBZntJUxN6lhz4iML3uyms8JRNK1-M0VzuPdDcXabfyIvnC7QtFGp-dA39DlgbWDIDShvQb25qmzsg03cqh0ray/s400/Dilbert+Big+Data.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div class="MsoNormal">
<a href="http://dilbert.com/strip/2012-07-29"><span lang="EN-GB">http://dilbert.com/strip/2012-07-29</span></a><span class="MsoHyperlink"><span lang="EN-GB"><o:p></o:p></span></span></div>
</td></tr>
</tbody></table>
<h4>
<span lang="EN-GB">The larger
the data set, the stronger the noise</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">There is
another problem with analysing large amounts of data. With the increase in the
size of the data set, all kinds of patterns will be found but most of them are
going to be just noise. Recent <a href="http://www.di.ens.fr/users/longo/files/BigData-Calude-LongoAug21.pdf">research </a>has provided proof that as data sets
grow larger they have to contain arbitrary correlations. These correlations
appear due to the size, not the nature, of the data, which indicates that most of
the correlations will be spurious. Without proper practical testing of the
findings, this could cause you to act upon a phantom correlation. Testing all
the detected patterns in practice is impossible as the number of detected
correlations will increase exponentially with the data set size. So even though
you have more data available you’re worse of as too much information behaves
like very little information. Besides the increase of arbitrary correlations in
big data sets, testing the huge number of possible hypotheses is also going to
be a problem. To illustrate, using a significance level of 0.05, testing 50
hypothesis on the same data will give at least one significant result with a
92% chance. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal" style="text-align: center;">
P(at least one significant result) = 1 − P(no significant
results) = 1 − (1 − 0.05)<sup>50</sup> ≈ 92%<span lang="EN-GB"><o:p></o:p></span></div>
<div class="MsoNormal" style="text-align: center;">
<br /></div>
<div class="MsoNormal">
<span lang="EN-GB">This
implies that we will find an increasing number of statistical significant
results due to chance alone. As a result the number of False Positives will
rise, potentially causing you to act upon phantom findings. Note that this is
not only a big data issue, but a small data issue as well. In the above example
we already need to test 1024 hypotheses with 10 attributes.</span></div>
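<div class="MsoNormal">
<span lang="EN-GB">The same calculation, applied both to the 50-hypothesis illustration and to the 1,024 hypotheses implied by 10 attributes (a direct application of the formula above):</span></div>
<pre>
# P(at least one falsely "significant" result) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (50, 1024):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:>5} hypotheses -> {p_any:.1%} chance of a spurious hit")
# 50 hypotheses -> 92.3%; 1024 hypotheses -> effectively 100%
</pre>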
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Data driven
decision making has nothing to do with the size of your data </span></h4>
<br />
<div class="MsoNormal">
<span lang="EN-GB">So, should the
above challenges stop you from adopting data driven decision making? No, but be
aware that it requires more than just some hardware and a lot of data. Sure,</span><span lang="EN-GB"> </span><span lang="EN-GB">with a lot
of data and enough computing power significant patterns will be detected even
if you can’t identify all the patterns that are in the data. However, not many
of these patterns will be of any interest as spurious patterns will vastly
outnumber the meaningful ones. Therefore,
with the increase in size of the available data also the skill level for analysing
the data needs to grow. In my opinion data and technology (even a lot of it) is
no substitute for brains. The smart way to deal with big data is to extract and
analyze key information embedded in “mountains of data” and to ignore most of
it. You could say that you first need to trim down the haystack to better locate
where the needle is. What remains are collections of small amounts of data that
can be analysed much better. This approach will prevent you from getting a big headache
from your big data initiatives and will improve both speed and quality of drive
data driven decision within your organisation.<o:p></o:p></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-68731741077073201842016-04-29T16:54:00.000+02:002016-04-29T16:54:38.322+02:00Is Analytics losing its competitive edge?<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUWyY1voNWkav_o6nbEqXFsVn8zUOUWN0_Q68RJ1jgzW7ejSF2gZyL_G4vOvPOWhgauE8zMuH5CsXN2hW3VRA1SCHDBZHw3WnkmhXYxIfY2H8JtADIdJkXNnks1UOED-A3VrjXctaBZO8G/s1600/Competing+on+Analytics+v2.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUWyY1voNWkav_o6nbEqXFsVn8zUOUWN0_Q68RJ1jgzW7ejSF2gZyL_G4vOvPOWhgauE8zMuH5CsXN2hW3VRA1SCHDBZHw3WnkmhXYxIfY2H8JtADIdJkXNnks1UOED-A3VrjXctaBZO8G/s320/Competing+on+Analytics+v2.jpg" width="320" /></a></div>
<div class="MsoNormal">
Since Tom Davenport wrote his ground-breaking HBR article on<a href="https://hbr.org/2006/01/competing-on-analytics"> Competing on Analytics</a> in 2006, a lot has changed in how we think about data and analytics and their impact on decision making. In the past 10 years the amount of data has gone sky-high due to new technological developments like the Internet of Things. Also, <a href="http://www.mkomo.com/cost-per-gigabyte-update">data storage costs</a> have plummeted, so we no longer need to choose whether we would like to store the data or not. Analytics technology has become readily available. Open-source platforms like <a href="https://www.knime.org/">KNIME</a> and <a href="https://cran.r-project.org/">R</a> have lowered the adoption thresholds, providing access to state-of-the-art analytical methods for everyone. To monitor the impact of these developments on the way organisations use data and analytics, MIT Sloan Management Review sends out a survey on a regular basis. They recently published their findings in <i><a href="http://sloanreview.mit.edu/projects/the-hard-work-behind-data-analytics-strategy/">Beyond the Hype: The hard work behind analytics success</a></i>. One of the key findings is that analytics seems to be losing its competitive edge.</div>
<div class="MsoNormal">
<br /></div>
<h4>
Analytics has become table stakes</h4>
<div class="MsoNormal">
Comparing their survey results over several years, MIT Sloan reports a decrease in the past two years in the number of organisations that gained a competitive advantage by using analytics. An interesting finding, especially now that organisations seem set to leverage the investments they have made in (big) data platforms, visualisation and analytics software. An obvious explanation for this decline is that more organisations are using analytics in their decision making, which lowers the competitive advantage it brings. In other words, analytics has become table stakes. The use of analytics in decision making has become a required capability for some organisations to stay competitive, for example in the hospitality and airline industries. All companies in those industries use analytics extensively to come up with the best offer for their customers; without the extensive use of analytics they would not be able to compete. There are, however, more reasons for the reported decline in competitive advantage.</div>
<div class="MsoNormal">
<br /></div>
<h4>
Step by step </h4>
<div class="MsoNormal">
From the MIT Sloan report, several of the reported reasons for having difficulty gaining a competitive edge with analytics are related to organisational focus and culture. The survey results show a lack of senior sponsorship, and that senior management doesn’t use analytics in its strategic decision making. As a consequence there are only localised initiatives with little impact. I see this happen in a lot of organisations. Many managers see value in using analytics in decision making but have difficulty convincing senior management to support them. There can be many reasons for that. It could be that senior management simply doesn’t know what to expect from analytics and therefore avoids investing time and money in an activity with an uncertain outcome. It could also be that the outcomes of analytics models are so counterintuitive that senior management simply can’t believe them. There are several ways to change this and benefit from analytics beyond local initiatives. Key is to take a step by step approach, starting with the current way of decision making and gradually introducing analytics to improve it. Simple steps with measurable impact. That way senior management can familiarise itself with what analytics can do and gain confidence in its outcomes. It can take some time, but each step will be an improvement and will grow the analytical competitiveness of the organisation.</div>
<div class="MsoNormal">
<br /></div>
<h4>
Investing in People</h4>
<div class="MsoNormal">
One other main reason from the survey for having difficulty gaining an edge with analytics is that organisations don’t know how to use the analytics insights. An important cause is that analytics projects are not well embedded in a business context. Driven by the ambition to use data and analytics in decision making, organisations rush into analytics projects without taking enough time to ensure the project addresses a sufficiently important business issue and has clear objectives, scope and an implementation plan. As a result, the insights from the analytics project knock on an open door, are too far from what the business needs, or it is unclear what to do with the outcomes.</div>
<div class="MsoNormal">
Another reason I come across often is that analytics projects are started from the technology perspective: “We have bought analytics software, now what can we do with it?”. It should be the other way around. The choice of analytics software comes after understanding the business issue and the conditions under which it needs to be solved. Analytics is therefore more than buying software or hardware; people need to be trained to recognise business issues that can be solved from an analytics perspective and to choose the appropriate analytical methods and tools. The training will also result in a common understanding of the value of analytics for the organisation, which in turn will help change the current way of decision making into one that incorporates the analytics insights.</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHKkVoTl9rUNIVyQHRgTAcuUqOHKtTjLx1MtqV8EJLF-P0P3cIauSlfTTBUue8UO8k4uvnzjt3489A49f7Zl8ZklUPzOdJDbnb122tPrUG8TjQo7ouGlAOUugMzAOOJ6YdikcgNcw_FkAl/s1600/satir+change+curve.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="301" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHKkVoTl9rUNIVyQHRgTAcuUqOHKtTjLx1MtqV8EJLF-P0P3cIauSlfTTBUue8UO8k4uvnzjt3489A49f7Zl8ZklUPzOdJDbnb122tPrUG8TjQo7ouGlAOUugMzAOOJ6YdikcgNcw_FkAl/s320/satir+change+curve.JPG" width="320" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
So, has analytics become less competitive? The picture I get from the above is that most organisations have difficulty changing to a new and more analytical way of working. Many organisations are just starting to use analytics; the MIT Sloan survey confirms this, given the significant increase in first-time users (the Analytically Challenged Organisations). These organisations have high expectations of what they will get from analytics, but will need to go through changes in their organisation and in the way decisions are made before the benefits of using analytics become visible. Following <a href="http://stevenmsmith.com/ar-satir-change-model/">a Satir-like change</a> curve, this will at first cause a decrease in productivity, which in my opinion explains the lower expectations these organisations have of the competitive gains from analytics. But this will change over time, and end in a new and improved productivity level. As with any new capability or technology, you first need to learn how to walk, then run and then jump.</div>
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-20422423373416979842016-04-03T17:59:00.000+02:002016-04-03T17:59:09.468+02:00The most dangerous equation in the world<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzyTc99Z1E3OVeZlnW1UXEGL6xkmRe40EXy4n_c3eZSZ-iPzzbZf4yWKSINh3R9FsbhhsFprwSuD8p4IBoF8mbadLQDk2olBka5__0LxYYoIIlHdyin5nPYLrc222hukm_1eL98AbzZtGl/s1600/Lego+Car+accident.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="213" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzyTc99Z1E3OVeZlnW1UXEGL6xkmRe40EXy4n_c3eZSZ-iPzzbZf4yWKSINh3R9FsbhhsFprwSuD8p4IBoF8mbadLQDk2olBka5__0LxYYoIIlHdyin5nPYLrc222hukm_1eL98AbzZtGl/s320/Lego+Car+accident.jpg" width="320" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB">Each year
Generali, one of the biggest insurers in the world, analyses the claims of its
car insurance customers in the Netherlands. Results of that analysis can be
found on their <a href="http://www.bestechauffeur.nl/">website</a>. In their analysis, Generali relates the number of claims
to where people live & drive, the age of the driver, the age of the car and
the car brand. Some of these statistics provide insights that are to be
expected. For example, you expect young drivers to have the highest claim rates,
as their analysis confirms. Cars in less populated areas have the lowest claim
rates, which seems plausible as well. There is however one finding that raised my
eyebrows and that is that drivers of specific car brands have significant
higher claim rates than others, suggesting that driving a car of a certain
brand makes you either a better or a worse driver. This year Mazda drivers had the
highest claim rates according to the Generali analysis, this was for the second
year in a row. Drivers of a Citroen had the lowest claim rate, making them the
safest drivers of 2015 according to Generali. So, is their truth in their
finding and should you therefore avoid a Mazda driver or at least not buy a
Mazda yourself? Generali’s statistics suggest you should, don’t they?<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Putting it in
perspective</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">Let’s take
a closer look at the numbers. The claim rates themselves don’t tell much, but combining
them with other data will. What will be interesting is to see how the distribution
of car brands compares to the number of claims per brand. Unfortunately, but
understandable, Generali only reports the relative difference in claims of a
brand compared to the average claim rate. However, Generali claims that its
findings apply to all drivers in the Netherlands, so it’s fair to assume that
the distribution of car brands in their car insurance portfolio is similar to
the overall distribution of car brands in the Netherlands. With <a href="https://opendata.rdw.nl/">data</a> from the
Netherlands Vehicle Authority (RDW) gathers, selecting only the brands reported
by Generali, we find that 7,545,266 vehicles were registered in the Netherlands
in 2015, with the following relative distribution over brands. Clearly
Volkswagen, Opel and Peugeot are the biggest brands, while Skoda, Mitsubishi
and Mazda are the smallest brands. <o:p></o:p></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhW744M45pRVYNsJc26SHqHudZtaAnZcB8ZT1WIgXuCYhAjIYXhdB8cgV2b9mVLZhGhd99dwKV9kJDmSK2RxPMlz_t9oZhOhfTNSjqP-eMAtR3dOn4tc4CJA6jfRtlULQUblymbmjJsonvh/s1600/Rel+distribution+car+brands.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="335" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhW744M45pRVYNsJc26SHqHudZtaAnZcB8ZT1WIgXuCYhAjIYXhdB8cgV2b9mVLZhGhd99dwKV9kJDmSK2RxPMlz_t9oZhOhfTNSjqP-eMAtR3dOn4tc4CJA6jfRtlULQUblymbmjJsonvh/s400/Rel+distribution+car+brands.jpeg" width="400" /></a></div>
<h4>
Does
driving a Mazda make you the worst driver?</h4>
<div class="MsoNormal">
<span lang="EN-GB">By plotting
the population size per car brand against the relative claim performance per
car brands an interesting pattern appears, the spread in claim performance is
bigger when the population size decreases. So smaller car brands have a bigger
spread in claim performance than bigger brands. This is the result of a not so
well known statistical law, De Moivre’s Equation, which provides us with that standard
deviation of the sampling distribution of the mean, </span><span class="mi"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Math-italic",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">σ</span></span><span class="mi"><sub><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Math-italic",serif; font-size: 9.0pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">x</span></sub></span><span class="mo"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Main",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">=</span></span><span class="mi"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Math-italic",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">σ</span></span><span class="mo"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Main",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">/</span></span><span class="msqrt"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Main",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;">√</span></span><span class="mi"><span lang="EN-GB" style="background: #FEFEFE; border: none windowtext 1.0pt; color: #222426; font-family: "MathJax_Math-italic",serif; font-size: 12.5pt; line-height: 107%; mso-ansi-language: EN-GB; mso-border-alt: none windowtext 0cm; padding: 0cm;"> n. </span></span><span lang="EN-GB">Howard Wainer named this equation <a href="http://faculty.cord.edu/andersod/MostDangerousEquation.pdf">the most dangerous equation in the world</a> because too little people are aware of it and as
a consequence made faulty decisions with serious impact. Look at the formula we
see that the standard deviation of the mean is inversely proportional to the
square root of the sample size. As a consequence car brands with a smaller
number of cars in the Netherlands will have a larger variation in relative
claim performance than bigger brands. To illustrate, a small brand with no
claims will have the best claim performance in one year, while a small number
of claims will make it the worst performing brand the next year. For the bigger
brands this is not an issue. Note that the brand with the best claim performance <a href="https://www.generali.nl/over-ons/in-de-media/skoda-rijders-beste-chauffeurs-van-nederland/">last year was Skoda</a>, this year Skoda was among the worst performers, De
Moivre’s equation in action. <o:p></o:p></span></div>
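<div class="MsoNormal">
<span lang="EN-GB">A small simulation makes De Moivre’s equation tangible. The sketch below, in R, gives every brand the same underlying claim probability and only varies the fleet size; the fleet sizes and the 5% claim rate are invented for illustration. The spread of the observed yearly claim rates shrinks with the square root of the fleet size, exactly as the equation predicts.</span></div>
<pre>
# De Moivre's equation in action: identical claim risk, different fleet sizes
set.seed(42)
true_rate  <- 0.05                          # same underlying claim risk for every brand
fleet_size <- c(1000, 5000, 25000, 100000, 500000)

spread <- sapply(fleet_size, function(n) {
  # observed claim rate in 1000 simulated years for a brand with n cars
  rates <- rbinom(1000, size = n, prob = true_rate) / n
  sd(rates)                                 # empirical spread of the yearly claim rate
})

# De Moivre: the standard deviation of the observed rate is sqrt(p*(1-p)/n)
cbind(fleet_size,
      empirical   = round(spread, 5),
      theoretical = round(sqrt(true_rate * (1 - true_rate) / fleet_size), 5))
</pre>
<div class="MsoNormal">
<span lang="EN-GB">Running this, the observed spread for the smallest (hypothetical) fleet is roughly twenty times that of the largest, enough to swing a small brand from best to worst performer on chance alone.</span></div>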
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZRwNnhJFpklI7pOKEVN86gJkxUotJNVkh2nsFDed7r5DysInvBKo5ZKR1ap9fqRVygYCLtfwuQ3e4Lw_TcEZVu1aRO1D3To8GHaLGieWDO4k-hVp4DPMHArF9SSzfwOlFxMF4i4WySnmK/s1600/Claim+perf+vs+Population+size.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="335" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZRwNnhJFpklI7pOKEVN86gJkxUotJNVkh2nsFDed7r5DysInvBKo5ZKR1ap9fqRVygYCLtfwuQ3e4Lw_TcEZVu1aRO1D3To8GHaLGieWDO4k-hVp4DPMHArF9SSzfwOlFxMF4i4WySnmK/s400/Claim+perf+vs+Population+size.jpeg" width="400" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">The most
dangerous equation in the world</span></h4>
<br />
<div class="MsoNormal">
<span lang="EN-GB">So, Generali’s
claim that driving a Mazda makes you the worst driver in the road is much to
strong. Whether you are a good or a bad driver depends on many things, but I
doubt that it will be the brand of your car, and would require a much more detailed
analysis. Hopefully Generali doesn’t take the brand of your car into account
when calculating their premium levels. Chances are they either over or under
price it when you choose to drive one of the rarer car brands. When you want to
avoid that, better choose one of the larger car brands as it will be unlikely
for them to end up being the worst performing category. Insurance claims are
not the only subject affected by De Moivre’s equation. Wainer <a href="http://faculty.cord.edu/andersod/MostDangerousEquation.pdf">shows </a>with some
compelling examples what the consequences can be of being ignorant of the most
dangerous equation of the world and why understanding variability is critical
to avoid serious errors.<o:p></o:p></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-15848815960318704352016-02-28T17:56:00.001+01:002016-02-28T17:56:27.215+01:00Will we have an Algorithm Economy?<div class="MsoNormal">
<span lang="EN-GB">We all know
it, data doesn’t have any value whether it is big or small, structured or
unstructured, available in real time or just sitting in your data warehouse.
You need to process data to create insights and then act upon them. For
example, being able to accurately forecast next year’s share price of a company
doesn’t bring you any value, unless you decide to invest (or divest). Analysing
data, creating predictions and determining the best possible action all require
algorithms. Organisations are increasingly adopting algorithms to support them in
decision making. Gartner expects that the use of algorithms will increase
heavily in the next 5 years. Gartner SVP Peter Sondergaard envisions in his 2015
keynote that by 2020 there will be marketplaces similar to app stores where
algorithms can be bought or sold. Algorithms can be bought to solve a specific
problem or create new opportunities from the exponential growth of data and the
Internet of things. Or organisations can
monetise their algorithms by selling them to other organisations. The Algorithm
Economy will bring the App Economy to analytics according to Sondergaard.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/L9F6NgGLHUk/0.jpg" src="https://www.youtube.com/embed/L9F6NgGLHUk?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Algorithms
are ill utilised</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">Algorithms
are not new nor is interest in them caused by the growing amount of data or the
Internet of Things. Algorithms have been around for a quite a while, <a href="https://en.wikipedia.org/wiki/Timeline_of_algorithms">some areover 3500 years </a>old, and exist because we as humans had an interest in
solving problems in an efficient and repeatable manner. Computers have sped up
the development and use of algorithms allowing us to solve bigger and more
complex problems faster and enables the analysis of vast amounts of data. Companies
like Google, Facebook and Amazon use algorithms at the core of their business, it
has been this capability for them to have become so big and influential. This
is however not because they were the only ones with access to algorithms. Everyone
can get access to state of art algorithms as many of them are taught in university’s
maths or computer science classes. A well trained computer scientist or operation
researcher can design and implement them for you. Some of them are even for
free, open source <a href="https://www.r-project.org/">statistical package R </a>for example contains the latest
and most advanced machine learning algorithms. What is striking is that, even
though a lot of very advanced algorithms are easily accessible, not a lot of
companies seem to be using them. In a <a href="http://meetings2.informs.org/analytics2013/Advancing%20Analytics_LKart_INFORMS%20Exec%20Forum_April%202013_final.pdf">Gartner survey</a> from 2013 as little as 3%
of the interviewed companies reported using prescriptive analytics, 16% used
prescriptive analytics. So, even
though we have the algorithms available, still a lot of companies are not using
them. Why is that? <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Algorithmic
decision making requires high level of analytics maturity</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">The
explanation is simple. Having the data and technology available simply is not
enough, it’s a necessary but not a sufficient condition for success. For
organisations to be successful with algorithms they need the technology, but also
require the people that understand algorithms and decision makers that are
willing to act upon the algorithm outcomes.
Acquiring the right analytics talent requires finding people with the
technical competency to design, build, assess and use algorithms, usually they
have a background in operations research, mathematics or computer science. Next
to that, the analysts must have the right business sense to understand the business
problem and the right domain knowledge. Analysts need to have well developed
communication skills so the right business requirements are identified,
otherwise the right answer to the wrong question will be found. Besides the right people that understand
algorithms, decision makers must be convinced that with algorithms they can make
better decisions. For analytics to be more than just a one-of initiative, senior
management needs to support the development of an analytical culture and
facilitate algorithm supported decision making throughout the organisation, fully
automated or in support of human decision making. They should show their trust in
algorithms by using it in their own decision making, show the benefits and
stimulate others to do so as well. The current low adoption of advanced
analytics methods shows that currently the majority of organisations either are
not mature enough or do not have the need for analytical methods. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><a href="http://www.gartner.com/newsroom/id/3192717">Gartner expects </a>that by 2018, more than half of the <a href="http://fortune.com/global500/">large organizations globally</a>
will compete using advanced analytics and proprietary algorithms, causing the
disruption of entire industries. This is quite a bold prediction (even though
it is not very precise). For that to happen, in my opinion, these organisations
must first grow their analytics maturity. So, instead of investing only in technology
companies should invest in getting the right talent and develop an analytical
culture. This will benefit them on the short term as doing analytics right will
bring immediate value. It will however
take more than the projected 2 years for more than half of the large organizations
globally to compete with algorithms in such a manner that it will disrupt
entire industries. <o:p></o:p></span></div>
<h4>
<span lang="EN-GB"><br /></span><span lang="EN-GB">A competitive
edge comes from specific algorithms not the Algorithm Economy</span></h4>
<div>
From an economic perspective, I don’t think there will be a huge market for algorithms. I expect demand in the algorithm marketplaces to be low, as these algorithms will be general purpose algorithms, like face recognition algorithms or SVM implementations. Nice building blocks, but not the differentiator you are looking for. For your organisation to gain a competitive edge, your algorithms need to be unique and specific to your organisation’s business. You will therefore need to design and build them yourself, or hire people who can do that for you. That is what Google, Facebook and Amazon did, and the same holds in other industries. Take for example pricing algorithms in the airline industry. All major airlines have their own algorithm to optimise their ticket prices, even though general purpose pricing algorithms are available. The reason they don’t use those is that they don’t expect to gain a competitive edge from technology that is available to everyone else. So, I expect the algorithm markets as envisioned by Sondergaard to be rather small, containing only general purpose algorithms. I do think that algorithms will take centre stage, as they will be a key enabler for companies to become and stay competitive in the future. For that, companies not only need the technology, but should invest in the right talent and proceed with building their analytical competences and culture. </div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-5974619455976077502016-01-05T10:45:00.000+01:002016-01-05T10:45:09.383+01:00What’s keeping you from getting optimised? <span style="font-family: inherit;">Attention for using data and analytics in decision making is at a level never seen before. Most organisations acknowledge that data is essential to keep track of their performance and to be able to analyse why observed and expected performance differ. Some even use it to take a prospective look into the future and prepare plans accordingly. To do this, descriptive and predictive analytics methods are used, and as a result these methods are becoming a common instrument in the toolbox of the business analyst. The results of analytics methods are incorporated in decision making processes more often, as the speed, accuracy and usability of the analyses have increased heavily due to better data management practices and the increased adoption of data visualisation and analysis software. Even though the use of descriptive and predictive analytics has brought many benefits and cost savings to organisations, it is only pocket money compared to the potential that prescriptive analytics has to offer. Still very few organisations have adopted the use of prescriptive analytics. Results of a <a href="http://meetings2.informs.org/analytics2013/Advancing%20Analytics_LKart_INFORMS%20Exec%20Forum_April%202013_final.pdf">Gartner survey </a>presented by <a href="http://www.gartner.com/analyst/40753/Lisa-Kart">Lisa Kart</a> during the 2013 INFORMS <a href="http://meetings2.informs.org/analytics2013/execforum.html">Executive Forum</a> showed that only 3% of the interviewed organisations used prescriptive analytics in their decision making. Although the number of organisations adopting prescriptive analytics will be rising, I don’t expect that the number has risen significantly in the past two to three years, which implies that a lot of companies have the opportunity to unlock their unused improvement potential. This post is on why they should. </span><br />
<span style="font-family: inherit;"><br /></span>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhsu5NIdwBMgiKbiJGnsK_JzBP1IZlDXFZORZE356CnY9JmhRqK6oZWFJquzvKXI4JP22kAzdF32v46TIKu_POOeDuHcA4ifguVH9L2t3OeBtmQ_ji4rik68Es89pRVMEcca2eFDYpQXzu/s1600/advancing+analytic+INFORMS+Lisa+Kart+2013.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="290" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhsu5NIdwBMgiKbiJGnsK_JzBP1IZlDXFZORZE356CnY9JmhRqK6oZWFJquzvKXI4JP22kAzdF32v46TIKu_POOeDuHcA4ifguVH9L2t3OeBtmQ_ji4rik68Es89pRVMEcca2eFDYpQXzu/s400/advancing+analytic+INFORMS+Lisa+Kart+2013.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">See <span style="font-family: Calibri, sans-serif; line-height: 107%;"><a href="http://meetings2.informs.org/analytics2013/Advancing%20Analytics_LKart_INFORMS%20Exec%20Forum_April%202013_final.pdf"><span lang="EN-GB">http://meetings2.informs.org/analytics2013/Advancing%20Analytics_LKart_INFORMS%20Exec%20Forum_April%202013_final.pdf</span></a></span></span></td></tr>
</tbody></table>
<br />
<h4>
An Insight or a forecast isn’t actionable</h4>
<span style="font-family: inherit;">A data driven performance overview, insight or forecast can be useful information but has little value. That’s because the outcomes of descriptive or predictive analytics are not actionable. Real value is only created when insights and forecast are used to make better decisions. This is exactly what prescriptive analytics, a.k.a. optimisation, offers. Given your objective(s), conditions and decision variables it will provide explicit recommendations to achieve the best possible outcome. The recommendation results from considering all possible solutions to your decision problem in a smart way, not just considering a few, and choosing the one that results in the best objective value while satisfying all conditions.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Many <a href="http://john-poppelaars.blogspot.nl/2015/06/prescriptive-analytics-next-big-step.html"><span style="font-family: inherit;">analytics overview charts</span></a> put optimisation at the top or as final step in a process of growing in analytical maturity, suggesting that predictive and descriptive analytics are prerequisites to start with optimisation. This is not the case. There are many organisations successful in optimisation without the ability or the need to forecast. For example, hospitals optimise the utilisation of human capital by constructing <a href="http://doc.utwente.nl/87812/1/thesis_E_van_der_Veen.pdf">optimal shift rosters</a> for nurses and maximise the utilisation of their operating theatres without the ability to forecast. Similarly, delivery firms construct routes for their delivery vans to <a href="http://repub.eur.nl/pub/41513">maximise vehicle utilisation</a> and customer service while minimising cost per km. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Prescriptive analytics translates a business decision into a mathematical model and uses optimisation algorithms or simulation to find the best answer. With a mathematical model the analysis becomes repeatable and can be re-done quickly, for example re-optimising the production schedule when certain demand conditions change. This brings agility to an organisation and allows it to quickly adapt to new conditions. Also, a mathematical model solidifies knowledge of specialists which enables decision makers to use that knowledge and take action without having to be a specialist themselves.</span><br />
<br />
<h4>
The whole is better than the sum of parts</h4>
<div>
<span style="font-family: inherit;">Some mathematical optimisation problems are easy, but many of them become unsolvable as they increase in size. This is called the combinatorial explosion, expressing that time to solve the problem grows exponentially in problem size. The size of the mathematical problem that can be solved is therefore depended on the computing power available. Luckily the speedup of computing power is tremendous, for example my <a href="https://www.macnn.com/articles/11/05/10/ipad.2.benches.as.fast.as.cray.2.from.1985/">4 year old iPad 2 </a>has the same computing power as a CRAY2 supercomputer from 1985. This speedup, together with the <a href="http://john-poppelaars.blogspot.nl/2015/04/whats-stronger-than-moores-law.html">progress in algorithmic optimisation, </a>gives us the ability to solve larger and more complex mathematical problems than 30 years ago. To illustrate, decision problems that would have taken 85 years or ~45 million minutes to solve on the hardware and algorithmic capabilities of 1988 can now be solved within 1 (!) minute. In the 80’s and 90’s large decision problem had to be broken up into smaller parts and solved separately. With the progress in technology we don’t need to break up models into smaller arts but can solve decision problems covering multiple departments in an organisation or across organisations in supply chains in one go. This holistic way of optimisation most certainly will lead to better decisions as more relevant conditions are taken into account and it will consider more possible solutions</span>.</div>
<br />
<h4>
Optimisation delivers real value</h4>
<span style="font-family: inherit;">Attention for optimisation is rising, Gartner and other analyst firms signal that attention for this technology is growing. Optimisation has been around for over 70 years and has proven its value many times. To illustrate, each year INFORMS organises a competition in which the best business applications of analytics compete for the <a href="https://www.informs.org/Recognize-Excellence/Franz-Edelman-Award">Franz Edelman Award</a>. As a former participant and winner, together with TNT Express, I can tell it’s a tough competition where only the best analytics practitioners have a chance to win. Illustrative for the value optimisation can bring is a graph that contains the benefits of the selected finalists of the Edelman competition. Measured since 1972 total benefits exceed $223 Billion! What is interesting is that the graph seems to level up a bit, indicating that the reported benefits are rising. The benefits from optimisation are not only monetary, it increases the agility of an organisation, allows for continuous improvement, stimulates knowledge sharing, and leads to changes that improve health, safety, cooperation, decision making, and job satisfaction.</span><br />
<span style="font-family: inherit;"><br /></span>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqvw5ZfNnsxHSeU-CcriSgQQIRcnKW6JP2NcmUq_i27q91U0J_M0r2bZ_uvIKqfmZdB08tHSxIy9JR0FN3Jw2wtqXWsjnS0DpZebZLvCL7JYeA68Z3msb1nvnE76qsmN2y2SCAL3lQwltw/s1600/Edelman+223+billion+benefit.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqvw5ZfNnsxHSeU-CcriSgQQIRcnKW6JP2NcmUq_i27q91U0J_M0r2bZ_uvIKqfmZdB08tHSxIy9JR0FN3Jw2wtqXWsjnS0DpZebZLvCL7JYeA68Z3msb1nvnE76qsmN2y2SCAL3lQwltw/s400/Edelman+223+billion+benefit.JPG" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: xx-small;">From https://www.informs.org/content/download/305209/2925970/file/Edelman%20Gala%202015.pdf</span><div class="MsoNormal">
<span lang="EN-GB"><span style="font-size: xx-small;"><o:p></o:p></span></span></div>
</td></tr>
</tbody></table>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Although prescriptive analytics is very powerful, it is no substitute for human brainpower, experience and or judgment. In the end a mathematical model is a simplified version of reality and can’t possible cover all aspects of a decision. In my experience the best results are achieved when decision makers are supported by prescriptive analytic models in their decision making. The results from Edelman finalist proof that. As complexity and speed of decision making is growing you need a better tool than just descriptive or predictive analytics. You need actionable insights, access to prescriptive modelling therefore is a must. So, what is keeping you?</span><br />
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-52269185146214602802015-10-28T09:05:00.002+01:002015-10-28T09:35:46.088+01:00How bad is having a bacon sandwich really? <table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj548006_yqWxiWEdB243-NEfR049pc2H7T_jJNM6-XLTaRUfPGqN9NfibVXMyxPXzT7t8quracy4Tp0g5UuybUoqFpLjPoXsVn4rezqdIn6s9R7ZSPbaC0G02Dzm6AdBwBREMm_yx0CLt/s1600/stove_ownership.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="277" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj548006_yqWxiWEdB243-NEfR049pc2H7T_jJNM6-XLTaRUfPGqN9NfibVXMyxPXzT7t8quracy4Tp0g5UuybUoqFpLjPoXsVn4rezqdIn6s9R7ZSPbaC0G02Dzm6AdBwBREMm_yx0CLt/s320/stove_ownership.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div class="MsoNormal">
<span lang="EN-GB"><a href="http://imgs.xkcd.com/comics/stove_ownership.png">http://imgs.xkcd.com/comics/stove_ownership.png</a><o:p></o:p></span></div>
</td></tr>
</tbody></table>
<div class="MsoNormal">
<span lang="EN-GB">Reading this
week’s <a href="http://www.theguardian.com/society/2015/oct/26/bacon-ham-sausages-processed-meats-cancer-risk-smoking-says-who">headlines</a> warning us that eating bacon and sausages cause bowel cancer
will probably turn you into a vegetarian instantly. But the way this message
has been put forward is very misleading. The headlines are referring to a <a href="http://www.iarc.fr/en/media-centre/pr/2015/pdfs/pr240_E.pdf">press release</a> of the International Agency for Research on Cancer (IARC) in
which processed and cured meats like bacon are classified as group 1
carcinogens. The IARC reached that conclusion after carefully studying research
which convinced them that there is a causal link between consuming these meats
and bowel cancer. Eating 50 grams of processed meat a day will increase your
<a href="http://www.thelancet.com/pdfs/journals/lanonc/PIIS1470-2045(15)00444-1.pdf">risk of bowel cancer by 18%.</a> To put things in perspective, group 1
carcinogens also include tobacco, alcohol, arsenic and asbestos, all known to
cause certain cancers. So, is having a bacon sandwich as risky as smoking a
cigarette or having a drink? Before diving into the interpretation of the
outcome let’s first have a look at the research conducted. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">It’s just
an opinion</span></h4>
<div class="MsoNormal">
<span lang="EN-GB">The above estimated
risk increase originates from <a href="http://edepot.wur.nl/214252">a meta-analysis</a> of prospective studies on
meat consumption in relation to bowel cancer published in 2011, it’s a “mashup”
of research conducted in the past. In a prospective study a group of
individuals with a meat rich diet is monitored over a period of time and compared
to a control group that has a different diet (no meat, or to a much lesser
extend). So, a relative risk is estimated. To have a like for like comparison
between the two groups, as this is not a controlled experiment, the estimated
relative risks are corrected with factors like age, BMI, alcohol consumption,
sex, hypertension, diabetes, etc. As different studies probably correct for
different factors and in different ways, it’s at least questionable whether the
results from different studies can be compared. Are apples being compared to
apples? More important however, there might be a factor not included in the
corrections that explains why an individual eats more meat (red or cured) and develops
bowel cancer. This is also known as <a href="https://en.wikipedia.org/wiki/Confounding">confounding</a>, which leads to <a href="http://www.tylervigen.com/spurious-correlations">spurious correlations</a>. So, the IARC found that there is a positive correlation
between meat consumption and bowel cancer, but can’t provide the proof that the
relation is a causal one. It’s the opinion of the IARC. <o:p></o:p></span></div>
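<div class="MsoNormal">
<span lang="EN-GB">A minimal sketch in R, with all numbers invented: an unobserved lifestyle factor drives both meat consumption and disease risk, while meat itself has no causal effect in the simulation. Meat and disease still end up positively correlated, which is exactly the trap a meta-analysis of observational studies can fall into.</span></div>
<pre>
# Confounding in miniature: lifestyle drives both meat intake and disease.
set.seed(1)
n         <- 10000
lifestyle <- rnorm(n)                                  # unobserved confounder
meat      <- 50 + 10 * lifestyle + rnorm(n, sd = 5)    # grams per day
disease   <- rbinom(n, 1, plogis(-3 + 0.5 * lifestyle))

cor(meat, disease)   # clearly positive, yet meat plays no causal role here
</pre>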
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<h4>
<span lang="EN-GB">Is eating
bacon as risky as smoking? </span></h4>
<div class="MsoNormal">
<span lang="EN-GB">What
everybody should know is that the classifications of the IARC are based on
strength of the evidence <b>not</b> on the degree
of risk. So, two risk factors could be classified similarly even if one causes
many more types of cancers than the other. Therefore bacon ends up in the same
class as cigarettes and asbestos. These classifications are not meant to convey
<b>how dangerous</b> something is, just <b>how certain</b> we are that something is
dangerous. I can imagine that this way of communicating the risk is really
confusing to anyone trying to work out how to lead a healthy life. <a href="http://scienceblog.cancerresearchuk.org/2015/10/26/processed-meat-and-cancer-what-you-need-to-know/">CancerResearch UK</a> is doing a much better job, by making explicit what the
risks are. Clearly smoking is much riskier than eating a bacon
sandwich.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfFJ03lXRVSgEtV8LRHUPgqjMXRVzE_y75BOUo-qa_r1c814GC25jxFc2Z407SfIvGlrbShyu-FSJJiXvw73J1mJNvxotlf37uu3eJVsj06Vq1kEMULQ3NNwxpoHAu-grFA4OM1_iZl72G/s1600/151026-Tobacco-vs-Meat-UPDATE.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfFJ03lXRVSgEtV8LRHUPgqjMXRVzE_y75BOUo-qa_r1c814GC25jxFc2Z407SfIvGlrbShyu-FSJJiXvw73J1mJNvxotlf37uu3eJVsj06Vq1kEMULQ3NNwxpoHAu-grFA4OM1_iZl72G/s400/151026-Tobacco-vs-Meat-UPDATE.png" width="312" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Source : Cancer Research UK</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<h4>
<span lang="EN-GB">Should you change
your diet?</span></h4>
The IARC reports an 18% increased risk of getting bowel cancer when eating 50 grams of cured meat a day. But what does this mean? Increased risk compared to what? The missing piece of information needed to know whether we should worry is how many people get bowel cancer even if they don’t eat bacon sandwiches. If that baseline risk is large, an 18% increase is a lot worse than when the risk is small. The risk for an individual in the Netherlands of getting bowel cancer is about <a href="http://www.darmkanker.info/pages/view.php?page_id=55">1 in 20</a>, or 5%. It’s this risk that the 18% applies to, so the risk becomes 5.9% instead of 5%. In absolute numbers: when 100 people eat a bacon sandwich every day, the number of people diagnosed with bowel cancer will increase from 5 to 6. The 18% increase sounds like a lot, but in absolute terms we should expect just one extra person to get bowel cancer.
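<br />
<br />
The same calculation in R, as a quick sanity check, using only the figures mentioned above:
<pre>
baseline <- 0.05              # lifetime bowel cancer risk in the Netherlands
rel_incr <- 0.18              # IARC: +18% for 50 g of processed meat per day
baseline * (1 + rel_incr)     # 0.059, i.e. from 5 to roughly 6 people in 100
</pre>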
What do you think? Will this stop you from eating bacon sandwiches? I’m not too fond of them, but with the above you should be able to
decide if you want one. And in case you do, enjoy!@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-77083376023425387262015-08-31T14:34:00.000+02:002015-08-31T14:52:41.861+02:00HR Analytics: Breaking through the wall in HR measurement<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg13Ch_xn721FYL4GnDmO9zOUh8tHmSW3dRs0pdEXShkLvr0v5ju9KmNjfQMxS41nZbJfDGrViLXaOP2NqPUspbT6dhF-IJjrE0ZdPtqvJqPTLImu2QPX1Auv_ZO-5Z4ywKJZoIO19KimZC/s1600/HR+measurement+wall.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg13Ch_xn721FYL4GnDmO9zOUh8tHmSW3dRs0pdEXShkLvr0v5ju9KmNjfQMxS41nZbJfDGrViLXaOP2NqPUspbT6dhF-IJjrE0ZdPtqvJqPTLImu2QPX1Auv_ZO-5Z4ywKJZoIO19KimZC/s400/HR+measurement+wall.JPG" width="400" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB">Data and
analytics are key to solving all kinds of business problems. Already, many
organisations are using data and analytics to gain insights on their
performance and use mathematical models to find viable directions for
improvement while keeping track of the gains of this fact based way of decision
making. Organisations apply analytics to all kinds of challenges in business
areas like Operations, Customer Services and Marketing & Sales. The business
area that seems to lag behind in using these advanced methods is Human
Resources (HR). Of course a lot of HR related data is being gathered. However,
much of the current HR related analytics create little impact. Even though more
and more data is gathered and more sophisticated analysis becomes possible, HR
rarely drives a strategic change. As <a href="https://books.google.nl/books/about/Investing_in_People.html?id=Xy9962AuIHQC&redir_esc=y">Boudreau and Cascio </a>indicate in <i>Investing
in People</i>: “There is increasing sophistication in technology, data
availability, and the capacity to report and disseminate HR information, but
investments in HR data systems, scorecards and ERP fail to create strategic
insights needed to drive organisational effectiveness. In short, many
organisations are “hitting the wall” in HR measurement.” It’s my conviction
that HR will be able to demolish this wall of measurement by turning to use advanced
analytical methods and as a result increase its impact on organisational decision
making and performance. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">In
practice, much of the HR related analytics happens in Business Intelligence (BI)
tools like SpotFire, Tableau, ClickView or even MS Excel. These are great at accomplishing
routine production of HR related reports and dashboards, but do not provide the
support required to for example find the drivers for employee satisfaction or
steer preventive measures to reduce turnover. To find those, predictive
analytics capabilities are requires which BI tools typically don’t offer nor
will the typical dashboard user have the capability to use these methods wisely.
A BI tool will allow drill downs and supports the analyses of KPI’s of subgroups,
but will not provide the explanation why this subgroup has these scores. You
need to find your own explanation (or adopt a belief) which may be incorrect
causing you to set up expensive change programs, possible addressing the wrong
issues. To find the real drivers and causes, more advanced analytics tools and
skill are required. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">To illustrate,
let’s have a look at employee turnover (data can be found <a href="https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-HR-Employee-Attrition.csv">here</a>). Being able to
understand the drivers of employee turnover and predict who is going to leave is
of crucial importance to any company. <a href="http://www.eremedia.com/tlnt/what-was-leadership-thinking-the-shockingly-high-cost-of-employee-turnover/">It is estimated</a> that for entry-level
employees the costs of replacing them is between 30% and 50% of their annual
salary. For mid-level employees, it costs upwards of 150% of their annual
salary and for high-level or highly specialized employees, you're looking at
400% of their annual salary. Clearly understanding the drivers and being able
to react on them can be a huge cost saver. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikDUXaxtVgqZATh1IV4Gj7qGOpPdubGHUzTw4Ld51_nhHTi6zN8421BMMhlCUg86WRv-LIsV9Xr-wYU-pAojYpWuah_bW-YdOndCImNYLDiJiUIcwQOCF3svLIXCAy-OsL7Rx6hfudUwJY/s1600/resignation+histogram.JPG" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikDUXaxtVgqZATh1IV4Gj7qGOpPdubGHUzTw4Ld51_nhHTi6zN8421BMMhlCUg86WRv-LIsV9Xr-wYU-pAojYpWuah_bW-YdOndCImNYLDiJiUIcwQOCF3svLIXCAy-OsL7Rx6hfudUwJY/s400/resignation+histogram.JPG" width="393" /></a></div>
<div class="MsoNormal">
A typical way of presenting employee turnover in BI tools is by means of a histogram. The histogram clearly shows that a lot of employees leave the company in the second year. This is especially true for the Human Resources and the Research and Development departments. At the Research and Development department there is another peak of employees leaving the company at 10 years. The question is why. To answer that question the histogram is not very useful; more advanced methods are required.</div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">Given the similarity
with customer churn, it might be tempting to go for a logistic regression to
predict the probability of turnover, or use a decision tree to find the relevant
factors that drive turnover. However that would imply that we can’t incorporate
an important factor that we are interested in, and that is time till
resignation. A method that explicitly takes the time to an event into account
is <a href="https://en.wikipedia.org/wiki/Survival_analysis">Survival analysis</a>, also known as reliability analysis or duration modelling.
The survival curve expresses the probability of survival (in this case staying
at the company) over time. Survival analysis allows us to account for <a href="https://en.wikipedia.org/wiki/Censoring_(statistics)">censoring</a>and time-dependent explanatory variables, so incorporating the time since last
salary raise or the time since last promotion. By estimating survival curves for
different departments, job roles or other dimensions of interest, comparisons
can be made and differences in resignation probabilities over time analysed. <o:p></o:p></span></div>
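<div class="MsoNormal">
<span lang="EN-GB">A minimal sketch in R with the <i>survival</i> package, assuming the column names of the attrition dataset linked above (YearsAtCompany, Attrition, Department). An employee who is still employed is a censored observation: we only know that their tenure is at least their current years at the company.</span></div>
<pre>
library(survival)   # standard R package for survival analysis

# The attrition dataset linked above, downloaded locally
hr <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

# Tenure is the time variable; a resignation ("Yes") is the event,
# everyone still employed is censored. Kaplan-Meier curves per department:
fit <- survfit(Surv(YearsAtCompany, Attrition == "Yes") ~ Department,
               data = hr)
plot(fit, col = 1:3, xlab = "Years at company",
     ylab = "Probability of still being employed")
</pre>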
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi57AnSFl8Wa3aRAieTUpTYDFt4SS1e55iaXl2IwgdhRS0GDn4sdibXqwn2weKWtS59THKIPU-BHnfg5vLHNIlq91KCg4p635qPjlZZzNWlUcs_-RxaU41CMB1UjgMbC__QZ1I_RGOTYbKs/s1600/estimated+survival+curves.JPG" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi57AnSFl8Wa3aRAieTUpTYDFt4SS1e55iaXl2IwgdhRS0GDn4sdibXqwn2weKWtS59THKIPU-BHnfg5vLHNIlq91KCg4p635qPjlZZzNWlUcs_-RxaU41CMB1UjgMbC__QZ1I_RGOTYbKs/s400/estimated+survival+curves.JPG" width="393" /></a></div>
<div class="MsoNormal">
<span lang="EN-GB">Using the data
from the histogram I created the following survival curves per department and
job role. The high level of turnover in year 2 and 10 as seen in the histogram show
as a strong reductions in the survival probability. Clearly visible are the
differences per department and job roles (Sales reps seems to have a short
future). The survival curves help us understand the rate of resignation better
than the histograms as it shows how the probability of resignation develops
over time, but it doesn’t provide the reason why people resign. For this we
need a method that allows for additional explanatory variables to explain the
resignations over time. A much used method for answering this type of question is
the <a href="https://en.wikipedia.org/wiki/Proportional_hazards_model">Cox Proportioned Hazard model</a>, in medicine they are commonly used to
describe the outcome of drug studies. To find out why so many people leave the Research
& Development department, I used the Cox model to find that Years Since
Last Promotion, Overtime and Job Satisfaction are the most significant factors.
Job involvement, Job Level and Frequent Business Travel also explain
resignation but are less significant. With these insights the HR department can
turn to the manager of the Research & Development department and pro-actively
come up with ways to reduce resignation levels by addressing the key factors.<o:p></o:p></span></div>
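<div class="MsoNormal">
<span lang="EN-GB">Continuing the sketch above, a Cox model for the Research & Development department could look as follows; the covariate names again follow the linked attrition dataset and are assumptions for illustration.</span></div>
<pre>
# Cox Proportional Hazards model, fitted on the R&D department only
rd <- subset(hr, Department == "Research & Development")

cox <- coxph(Surv(YearsAtCompany, Attrition == "Yes") ~
               YearsSinceLastPromotion + OverTime + JobSatisfaction +
               JobInvolvement + JobLevel + BusinessTravel,
             data = rd)
summary(cox)   # hazard ratios: exp(coef) > 1 means a higher resignation risk
</pre>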
<br />
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB">The above
example is just an illustration of how advanced analytics can be of value to the
HR department and the organisation it is part of. With access to these advanced
methods strategic impact of HR will increase, tearing down the wall of HR
measurement. However, as this type of analysis is typically not routine and hence
difficult to capture in a standard tool or way of working, HR departments also need
to acquire the right analytical skills and mind set. There is more to using advanced
analytical methods than just loading data in some analytics platform and pushing
the run button, accepting the outcome as the best possible answer. Adequate business
knowledge, being able to select and use the right analytical method and communicating outcomes to business owners are as much a requirement as having
access to analytical software. With this in mind, for sure the HR measurement
wall will cease to exist.<o:p></o:p></span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-48004720750284931482015-06-21T18:49:00.002+02:002015-06-21T18:49:20.240+02:00Prescriptive analytics, the next big step?<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
Now that you have hooked all the data of your organisation to your KPI dashboard to monitor everyday performance, and are busy estimating forecasting models for order intake and customer satisfaction, you’re wondering what your next step in analytics should be. Should it be prescriptive analytics? It’s the most advanced, most promising variant of analytics, at least according to the vendors of analytics software, but it is also the most demanding. </div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
<img class="center" height="385" src="https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAIiAAAAJDJhODg4ZWE1LWRmYjItNGUxNy1hZTc2LTk4NTY0NTM3ZDMyOQ.jpg" style="display: block; height: auto; margin-bottom: 15px; margin-left: auto; margin-right: auto; max-width: 100%; text-align: center;" width="400" />Reviewing the literature on analytics you deduct that the only way to be able to use prescriptive analytics is to gradually grow your analytics maturity from descriptive, diagnostic, and predictive to prescriptive analytics. The graph Tom Davenport uses in <a data-mce-href="http://www.amazon.com/Competing-Analytics-The-Science-Winning/dp/1422103323" href="http://www.amazon.com/Competing-Analytics-The-Science-Winning/dp/1422103323" rel="nofollow">Competing on Analytics </a>to position the different types of analytics cleary shows that. Gartner positions prescriptive analytics as an emerging technology in the hype cycle, comparable to autonomous vehicles and biochips, suggesting it is a new high tech kind of thing. Something that need to proof it's value still. You wonder if it's the right way to go.....</div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
<img class="center" height="333" src="https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAANOAAAAJDgwMWZhZTliLTkxNWYtNDVmNS1hZmRlLTdlYzljYzc0ZWRmMg.jpg" style="display: block; height: auto; margin-bottom: 15px; margin-left: auto; margin-right: auto; max-width: 100%; text-align: center;" width="588" /></div>
<div class="center" style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px; text-align: center;">
<strong>Gartner hype cycle august 2014</strong></div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
My experience is that analytics maturity matters little for the kind and complexity of analytics used to solve a business problem. Analytics maturity is about the factors that determine an organisation’s readiness to adopt analytics in decision making <em>throughout</em> the organisation. Davenport uses the DELTA (Data, Enterprise orientation, Leadership, Targets, and Analysts) framework to assess an organisation’s maturity. When you review <a data-mce-href="http://www.slideshare.net/saurabh0883/competing-on-analytics-by-thomas-h-davenport-jeanne-g-harris" href="http://www.slideshare.net/saurabh0883/competing-on-analytics-by-thomas-h-davenport-jeanne-g-harris" rel="nofollow">Davenport’s DELTA model,</a> you will see that the complexity of the analytics used is not a driving factor for maturity, and the other way around also holds.</div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
Gartner’s positioning of prescriptive analytics as a new technology is strange to me. Prescriptive analytics (or Operations Research, as we used to call it) has been around for a long time; it originated from the research done by <a data-mce-href="http://www.washingtonpost.com/opinion/blacketts-war-the-men-who-defeated-the-nazi-u-boats-and-brought-science-to-the-art-of-warfare-by-stephen-budiansky/2013/03/29/3083879a-75ee-11e2-8f84-3e4b513b1a13_story.html" href="http://www.washingtonpost.com/opinion/blacketts-war-the-men-who-defeated-the-nazi-u-boats-and-brought-science-to-the-art-of-warfare-by-stephen-budiansky/2013/03/29/3083879a-75ee-11e2-8f84-3e4b513b1a13_story.html" rel="nofollow">the British Army to beat the Nazis during the Second World War</a>. At that time, analytics was essentially the application of common sense and the careful study of data to the messiness of war. With success, as the insights from the analysts led to the defeat of the German U-boat campaign. Since then Operations Research has been applied to all kinds of decision problems within big and small organisations; some of them could be called analytical competitors (organisations like Google and Amazon), many of them analytically impaired. </div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
In my 25 year career as an analytics professional I have come across many examples in which operations research (or prescriptive analytics) proved to be of immediate value, even though the organisation didn’t have sophisticated analytical skills. I have supported Mon and Pop 3PLs with route optimisation models to create routing schedules for their trucks. With the low margins they get, making the most out of their assets is crucial to them. In healthcare, not really a sector in which analytics has gained a strong foothold, the use of shift optimisation and shift scheduling has let to better balanced schedules, reducing illness and stress, beneficial to both nurses and patients, lowering the cost of healthcare. Similar, <a data-mce-href="http://john-poppelaars.blogspot.nl/2008/05/and-winner-is.html" href="http://john-poppelaars.blogspot.nl/2008/05/and-winner-is.html" rel="nofollow">benchmarking</a> using optimisation modelling resulted in better insights in hospital performance and the identification of best practices. Governments also are not very analytical mature, than again using optimisation to construct routes for <a data-mce-href="http://john-poppelaars.blogspot.nl/2007/12/mail-delivery-for-safer-roads.html" href="http://john-poppelaars.blogspot.nl/2007/12/mail-delivery-for-safer-roads.html" rel="nofollow">the de-icing of high ways and local roads</a> reduced cost and improved road safety. I could go on with many more examples, but I guess you get the point, it is not your analytics maturity that determines whether you can use prescriptive analytics, but the problem you need to solve.</div>
<div style="color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; line-height: 24px; margin-bottom: 30px;">
In summary, prescriptive analytics is not a concept in a hype stage, nor an approach with little use in every day decision making. The above examples proof that. It doesn’t require big budgets nor is it only available to you when you have mastered predictive or descriptive analytics. It is the problem you need to solve that determines the analytics technique you require. So what’s keeping you? Start optimising and start today!</div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-57356356518037433722015-05-25T17:58:00.000+02:002015-05-25T17:58:16.961+02:00There is more to analytics than just fishing in the data lake<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
We live in an era in which we celebrate technology, we live for the l<a href="http://www.apple.com/watch/films/" rel="nofollow" style="border: 0px; box-sizing: border-box; color: #96999c; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">atest gadgets</a>. Data is now longer a scarce resource, expectations about what can be done with it are rising fast. On the other hand, lakes of data are overwhelming and frustrating people while hard- and software vendors are inviting us to go on a <a href="https://twitter.com/tableau/status/585477629861462016" rel="nofollow" style="border: 0px; box-sizing: border-box; color: #96999c; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">data fishing trip</a>. They tempt us to spend many Euros on data warehouses, hardware, and state of the art analytics software. However, no matter how many Euros you’re spending, if people who work with the data don’t know how to make sense of it or are unable to clearly present what they find, that investment is clearly wasted. The problem of our time is not the lack of data, but rather the inability to make sense of it. In a typical analytics project, data is loaded into software and the “<em style="border: 0px; box-sizing: border-box; font-family: inherit; font-stretch: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; vertical-align: baseline;">find the best model</em>” button is pushed. According to the ads of the software vendors, decision makers can act immediately on the outcomes as the software is guaranteed to find the best possible model. This however can have serious problems.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
Best practice before running any statistical analysis is to first visually inspect the data. In 1973, <a href="http://www.jstor.org/stable/2682899" rel="nofollow" style="border: 0px; box-sizing: border-box; color: #96999c; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Anscombe presented</a> four data sets that have become a classic illustration for the importance of visualizing data, not merely relying on summary statistics or model fitting procedures of analytics software. The four data sets are now known as "<a href="http://en.wikipedia.org/wiki/Anscombe%27s_quartet" rel="nofollow" style="border: 0px; box-sizing: border-box; color: #96999c; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 0px; padding: 0px; text-decoration: none; vertical-align: baseline;" target="_blank">Anscombe's quartet.</a>"</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
<img alt="" class="center" data-loading-tracked="true" height="346" src="https://media.licdn.com/mpr/mpr/shrinknp_750_750/AAEAAQAAAAAAAAKnAAAAJGM3ZDFlOTcyLTgwY2UtNGE5OC1hM2NhLTY3ODY3NzJmOTc2Yg.jpg" style="border: 0px; box-sizing: border-box; display: block; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; height: auto; line-height: inherit; margin: 0px auto 15px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" width="453" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
When summarizing the data of the four series it becomes clear that the summary statistics are the same. Assuming a simple linear relationship between each X and Y results in four identical models, Y=3.000+ 0.500 X. But are these series indeed the same?</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
<img alt="" class="center" data-loading-tracked="true" height="176" src="https://media.licdn.com/mpr/mpr/shrinknp_400_400/AAEAAQAAAAAAAALYAAAAJDhiNjA1NmVjLTgyYjUtNDk4NC1iMjZjLWQxYTZhNjdlZTY5ZA.jpg" style="border: 0px; box-sizing: border-box; display: block; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; height: auto; line-height: inherit; margin: 0px auto 15px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" width="387" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
Things turn out to be very different when we visualise the data. As can be seen directly from the graphs it’s dangerous to assume you understand the nature of the data just from its summary statistics or the model output. Each of Anscombe’s examples shows an interesting and valid relationship, but only one of them matches the story drawn out from the summary and the fitted model. Set 2 clearly isn’t linear but quadratic. Set 3 is linear, but the outlier (upper right) skews the fitted model. Set 4 is a more extreme example of the effect of an outlier. A linear relation between X and Y in this case doesn’t make any sense. </div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
<img alt="" class="center" data-loading-tracked="true" height="414" src="https://media.licdn.com/mpr/mpr/shrinknp_750_750/AAEAAQAAAAAAAANSAAAAJDAwYzlkNTUzLTg3NjYtNDlkOS04MzYyLTEwYjg3ZTFjODhmYg.jpg" style="border: 0px; box-sizing: border-box; display: block; font-family: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; height: auto; line-height: inherit; margin: 0px auto 15px; max-width: 100%; padding: 0px; text-align: center; vertical-align: baseline;" width="588" /></div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
Data visualisations help us perceive and appreciate the features of the data but also let us look behind such features and let us see what else is there. Good analysis is not a routine matter and will require switching between graphical display of the data, model estimation results and crunching the numbers.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
To be successful in analytics two skills are essential. First of all statistical thinking, the ability to find insights that live in the data and make sense of them. Second visual thinking, the ability to see meaningful patterns in data by representing and interacting with them visually. Having lots of data, the latest hard- and software and the urge to go on a fishing trip are no substitute for these skills.</div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
Code to reproduce the data, tabels and graphs can be found on my <a href="https://github.com/ORatWork/Anscombe">GitHub page</a> </div>
<div style="background-color: white; border: 0px; box-sizing: border-box; color: #4d4f51; font-family: Helvetica, Arial, sans-serif; font-size: 16px; font-stretch: inherit; line-height: 24px; margin-bottom: 30px; padding: 0px; vertical-align: baseline;">
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-53678356283646352902015-04-25T15:58:00.000+02:002015-04-25T15:58:16.426+02:00What’s stronger than Moore’s law? <a href="http://en.wikipedia.org/wiki/Moore%27s_law">Moore’s law</a> turned 50 this week. In a now famous <a href="https://www.cis.upenn.edu/~cis501/papers/mooreslaw-reprint.pdf">paper </a>from 1965 <a href="http://en.wikipedia.org/wiki/Gordon_Moore">Gordon Moore</a> predicts that every 1-2 years the number of transistors on an integrated circuit will double, lowering production cost and increasing its capabilities. Even more, in the same paper Moore predicts that “integrated circuits will lead to such wonders as home computers, automatic controls for automobiles and personal portable communication equipment”. Can you imagine today’s world without them? This technological progress has boosted computational power enormously and enabled us to solve larger and larger optimisation problems faster and faster. But, even though the progress has been phenomenal, there is even a greater power available. It’s called mathematics.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9H828va7oO5MXCsGHb0Dm8G6jPG30mXBL-h0i4kyNGWsdgKbiaXxIF78U3ZvsKqqeR2UNIPYcbALCgDLXvI6BQJbR0qyd-lB8HQ3Ma-eHu0WwqRP0xp-csXJELRh9ilnmkgDN6y13W3r-/s1600/Handy+Home+Computers.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9H828va7oO5MXCsGHb0Dm8G6jPG30mXBL-h0i4kyNGWsdgKbiaXxIF78U3ZvsKqqeR2UNIPYcbALCgDLXvI6BQJbR0qyd-lB8HQ3Ma-eHu0WwqRP0xp-csXJELRh9ilnmkgDN6y13W3r-/s1600/Handy+Home+Computers.JPG" height="166" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "Calibri",sans-serif; font-size: 11.0pt; line-height: 107%; mso-ansi-language: NL; mso-ascii-theme-font: minor-latin; mso-bidi-font-family: "Times New Roman"; mso-bidi-language: AR-SA; mso-bidi-theme-font: minor-bidi; mso-fareast-font-family: Calibri; mso-fareast-language: EN-US; mso-fareast-theme-font: minor-latin; mso-hansi-theme-font: minor-latin;"><span lang="EN-GB">from : <a href="https://www.cis.upenn.edu/~cis501/papers/mooreslaw-reprint.pdf">https://www.cis.upenn.edu/~cis501/papers/mooreslaw-reprint.pdf</a></span></span></td></tr>
</tbody></table>
The impact of Moore’s law is best illustrated by the cost per transistor. This cost decreased from about $10 per transistor in 1970 to less than $ 0.000000001 in 2010. That’s less than the cost of ink for one letter of newsprint. It allowed Google to develop self-driving cars, NASA to send satellites into space and allows us to navigate to our destination using real time traffic information. Moreover, it puts computing power at our fingertips and stimulates the application of techniques from Operations Research and artificial intelligence to real world problems.<br />
<br />
When looking at the performance improvement over the years there is a remarkable development. <a href="http://de.wikipedia.org/wiki/Martin_Gr%C3%B6tschel">Martin Grötschel</a> (actually it's work from <a href="http://www.caam.rice.edu/~bixby/">Robert Bixby</a>) reports a <a href="http://www.math.washington.edu/mac/talks/20090122SeattleCOinAction.ppt">43 million (!) fold speedup</a> over a period of 15 years for one of the key algorithms in optimisation, the linear programming problem. Algorithms to solve linear programs are the most important ingredient of the techniques for solving combinatorial and integer programming problems. They are one of the key tools for an analytics consultant in solving real world decision problems. Grötschel shows that a benchmark production planning problem would take 85 years to solve on 1988 hard- and software, but that it can be solved within 1(!) minute using the latest hard- and software. Breaking the speedup down in machine independent speedup and the speedup of computing power shows that the progress in algorithms beats Moore’s law by a factor 43. <br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbyp6rVKqnoFEFuoj24oBXvmNBMXdFgGzGH6wfQncidHVvJE22txEWCFbf009hDmNIQJSt6MSsnQhzF5uilHgm0TF4M9Nhof-xg_ZedRCIw8kUZh7W6P5kRkBnNemZT-xFnapXgpCSIVRM/s1600/LP+progress.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbyp6rVKqnoFEFuoj24oBXvmNBMXdFgGzGH6wfQncidHVvJE22txEWCFbf009hDmNIQJSt6MSsnQhzF5uilHgm0TF4M9Nhof-xg_ZedRCIw8kUZh7W6P5kRkBnNemZT-xFnapXgpCSIVRM/s1600/LP+progress.JPG" height="296" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">from <a href="http://www.math.washington.edu/mac/talks/20090122SeattleCOinAction.ppt" style="font-size: 12.8000001907349px;">http://www.math.washington.edu/mac/talks/20090122SeattleCOinAction.ppt</a></td></tr>
</tbody></table>
<br />
With trends like big data, decision models will increase in size and will become more optimisation driven. As Tom Davenport puts it “Although Analytics 3.0 includes all three types [descriptive, predictive, prescriptive], it emphasizes the last”. Davenport predicts that prescriptive models will be embedded into key processes and support us in our everyday decision making. This requires the models to be fast and robust. Technological progress is not the only power that enables this, it´s mathematics. And mathematics seems to have the upper hand on this,<br />
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-42593366388058683612015-04-05T20:31:00.000+02:002015-04-05T20:31:49.528+02:00Do numbers really speak for themselves with big data?<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_JxjD4AwATlvvpQhOpGz4Vw69EGYYUGcvw93UgKBXXqhQTzYJorhADcRtTJ89E0DAfzamtgO1YbkNcs6bUfVmUTtBYVKafbeC8zy8E8pdDwrOaap9sd1XpczyXPljjgdYjHWFZOND9GsS/s1600/simple_answers.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_JxjD4AwATlvvpQhOpGz4Vw69EGYYUGcvw93UgKBXXqhQTzYJorhADcRtTJ89E0DAfzamtgO1YbkNcs6bUfVmUTtBYVKafbeC8zy8E8pdDwrOaap9sd1XpczyXPljjgdYjHWFZOND9GsS/s1600/simple_answers.png" height="320" width="262" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">http://xkcd.com/1289/</td></tr>
</tbody></table>
Chris Anderson, former editor in chief of Wired was clear about it in his provocative essay “<a href="http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory">The End of Theory: The Data Deluge Makes the Scientific Method Obsolete</a>”. He states that with enough data, computing power and statistical algorithms we can find patterns where science cannot. No need for theory, formal methods to test validity and causation. Correlation is enough, according to Anderson and with him many others.<br />
<br />
How would this work in practice? Suppose we would like to create a prediction model for some variable Y. This could for example be the stock price of a company, the click-through rates of online ads or next week’s weather. Next we gather all the data we can lay your hands on and put it in some statistical procedure to find the best possible prediction model for Y. A common procedure is to first estimate the model using all the variables, screen out the unimportant ones (the ones not significant at some predefined significance level ) and re-estimate the model with the selected subset of variables and repeat this procedure until a significant model is found. Simple enough, isn't it?<br />
<br />
Anderson suggested way of analysis has some serious drawbacks however. Let me illustrate. Following the above example, I created a set of data points for Y by drawing 100 samples from a uniform distribution between zero and one, so it’s random noise. Next I created a set of 50 explanatory variables X(i) by drawing 100 samples from a uniform distribution between zero and one for each of them. So, all 50 explanatory variables are random noise as well. I estimate a linear regression model using all X(i) variables to predict Y. Since nothing is related (all uniform distributed and independent variables) an R squared of zero is expected, but in fact it isn't. It turns out to be 0.5. Not bad for a regression based on random noise! Luckily, the model is not significant. The variables that are not significant are eliminated step by step and the model re-estimated. This procedure is repeated until a significant model is found. After a few steps a significant model is found with an Adjusted R squared of 0.4 and 7 variables at a significance level of at least 99%. Again, we are regressing random noise, there is absolute no relationship in it, but still we find a significant model with 7 significant parameters. This is what would happen if we just feed data to statistical algorithms to go find patterns.<br />
<br />
So yes, Chris Anderson is right. With data, enough computing power and statistical algorithms patterns will be found. But are these patterns of any interest? Not many of them will be, as spurious patterns vastly outnumber the meaningful ones. Anderson’s recipe for analysis lacks the scientific rigour required to find meaningful insights that can change our decision making for the better. Data will never speak for itself, we give numbers their meaning, the Volume, Variety or Velocity of data cannot change that.<br />
<br />
<span style="font-size: x-small;">Remark : Details of <a href="https://github.com/ORatWork/blog_april5th">the regression example </a>can be found on my GitHib</span><br />
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-11761832081538095542015-03-10T12:09:00.001+01:002015-03-10T12:09:26.999+01:00A toast to Occam’s razor; Accuracy vs Interpretability<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxkhTFP883M7nWoms3Xx09VhRPWwIffFTFXJXZOxtaetGBtRuEoZBjN1cDqfsaOttGdBffiJMFuxhyphenhyphenpbeaaNKMqu4IRJ03olOMnUQbiD5olXPBITb9fMuIo8hW6y1Q4VJGDhve1LxsZLn/s1600/Along-the-Vinho-Verde-Route.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXxkhTFP883M7nWoms3Xx09VhRPWwIffFTFXJXZOxtaetGBtRuEoZBjN1cDqfsaOttGdBffiJMFuxhyphenhyphenpbeaaNKMqu4IRJ03olOMnUQbiD5olXPBITb9fMuIo8hW6y1Q4VJGDhve1LxsZLn/s1600/Along-the-Vinho-Verde-Route.jpg" height="208" width="320" /></a></div>
A question that I get asked a lot these days is when selecting a predictive model how to make the trade-off between model accuracy and model interpretability. Reason for this is that methods like neural nets and random forests are becoming more popular in predictive analytics. They tend to generate more accurate predictions than traditional statistical methods like a logistic regression but are much harder to interpret. Some practitioners, following <a href="http://en.wikipedia.org/wiki/Occam%27s_razor">Occam’s razor </a>principle, prefer simple methods over complex ones in supporting their customers. And I agree, most non mathematically trained people would be able to understand a logistic regression, but would have trouble understanding a neural net or a random forest. But sacrificing accuracy over interpretability? It’s a rather simplistic interpretation of Occam’s razor to prefer simple over complex models. Occam’s advice is to choose the simplest model in case the competing models have the same predictive ability. So he puts model accuracy first!<br />
<br />
One of the golden rules in analytics consulting is that a customer needs to trust the analytic methods you use before your customer is willing to accept and implement the outcomes of your analysis. Understanding the analytics method and the outcomes (interpretability) is one way for your customer to gain trust. For a simple model or method this is relatively easy, but what if the method becomes more complex? It would require your customer to become a mathematician to understand the model you created and verify if it is correct, but there is no need to do so. Objectively reporting the model quality is another way. For example by reporting the model calibration results (how well did the model fit the data) or its predictive accuracy. To show the predictive accuracy of a model a simple and straightforward method is to use a confusion matrix and report performance indicators deducted from it.<br />
<br />
Suppose you want to predict the quality of wine based on its chemical components. You are considering a <a href="http://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a> and a <a href="http://en.wikipedia.org/wiki/Random_forest">random forest</a> and want to select the best model. First both models are trained, in this case using the data from the <a href="https://archive.ics.uci.edu/ml/">UCI Machine learning Repository</a> which contains results of the chemical analysis of 6497 Portuguese "Vinho Verde" wines. To test both models, the quality of wines is predicted for a randomly selected subset of the wines which was excluded from the data before training. The results of the tests are summarized in the confusion matrices below. The matrix contains the results of the predicted quality of 1948 wines and compares it with the true classification for both models.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-5hTUNe6XrxkkaTAsgmJGhXi1NCJCFgHhnZUYyzTVUPH8guIkKRlKM1lCya-QSYjHqKAtCRUlpU1TL0w2515vfCvtdNylPAOb4MuEsunHLd5hYHXVoYRwA0j3OW76SzvwybhrRQN9S6DF/s1600/confusion+matrix.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-5hTUNe6XrxkkaTAsgmJGhXi1NCJCFgHhnZUYyzTVUPH8guIkKRlKM1lCya-QSYjHqKAtCRUlpU1TL0w2515vfCvtdNylPAOb4MuEsunHLd5hYHXVoYRwA0j3OW76SzvwybhrRQN9S6DF/s1600/confusion+matrix.JPG" height="171" width="400" /></a></div>
<br />
Based on the confusion matrix several criteria can be constructed to assess the prediction quality of the trained models. Criteria such as<br />
<br />
<ul>
<li><i>Accuracy</i>, the portion of correct predictions </li>
<li><i>Error rate</i>, 1 - <i>Accuracy</i></li>
<li><i>Sensitivity</i>, the portion of correctly predicted good quality wines versus the total number of good quality wines </li>
<li><i>Specificity</i>, the portion of correctly predicted bad quality wines versus the total number of bad quality wines</li>
<li><i>Lift</i>, the ratio of the portion of correct good wine classifications to the portion of actual good wines. So, it measures the strength of our model on the basis of positive classifications predicted by it correctly.</li>
<li><i>False Positive Rate</i>, portion of true negatives that are incorrectly predicted positive</li>
<li><i>False Negative Rate</i>, portion of true positives that are incorrectly predicted negative</li>
</ul>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBBeWPXiqnEvjRaBfvOw3Z78jaczUMO0hzuuroU8Ad6zCIfJni-MIL-n-tzIIntlLfc7kdNhMHeaXgzIp5cAtM15VxrVzy7SUEr6hPOC7vEMNbcUYlV3dFkj2FzB0cEzLIhrwZP1VY6bYX/s1600/Meaures.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBBeWPXiqnEvjRaBfvOw3Z78jaczUMO0hzuuroU8Ad6zCIfJni-MIL-n-tzIIntlLfc7kdNhMHeaXgzIp5cAtM15VxrVzy7SUEr6hPOC7vEMNbcUYlV3dFkj2FzB0cEzLIhrwZP1VY6bYX/s1600/Meaures.JPG" height="180" width="320" /></a></div>
<br />
Based in the computed performance measures the random forest model outperforms the logistic regression on all measures. It is the best model to predict Portuguese "Vinho Verde" wine quality. Of course we need to regularly measure the model performance as new data will become available and update it if required.<br />
<br />
The above example shows that accuracy requires more complex prediction models, it’s also a lesson I have learned in using both classical statistical (econometric) methods and machine learning to create prediction models for my customers. Simple models tend to be worse predictors, adding more variables (more information) increases the accuracy of predictions. As the inventor of the random forest algorithm <a href="http://en.wikipedia.org/wiki/Leo_Breiman">Leo Breiman</a> states in <a href="http://projecteuclid.org/euclid.ss/1009213726">Statistical Modelling: The Two Cultures</a> in predictive modelling the primary goal is to supply accurate predictions, not interpretability. Focus should therefore be on accuracy and when models are level on that score, follow Occam and choose the simplest one.<br />
<br />
<span style="font-size: x-small;"><i>Notes</i> </span><br />
<br />
<ol>
<li><span style="font-size: x-small;">The R code that I used for this blog can be found on my <a href="https://github.com/ORatWork/LRvsRF">GitHub</a>. </span></li>
<li><span style="font-size: x-small;">All estimation procedures used for this blog are part of the <a href="http://topepo.github.io/caret/index.html">CARET </a> (=Classification And REgression Training) package in R</span></li>
</ol>
<br />
<div>
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-80540026922757497622015-01-18T17:34:00.003+01:002015-01-18T17:34:26.772+01:00Is Big Data Objective, Truthful and Credible?In the past few years the attention for big data has grown enormously. Both business and science are focused on the use of large datasets to find answers to previously unsolvable questions. In the size of the data there seems to hide some kind of magic, which will answer any question that can be imagined. As former Wired editor-in-chief Chris Anderson puts it: <a href="http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory">“with enough data, the numbers speak for themselves.”</a> As if massive data sets and some predictive analytics always will reflect the objective truth. But can big data really deliver on that promise? Is big data objective, truthful and credible?<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglWrZ6JBnLN1_vmMMzNBPadSzLj5RjtW4jFkjYAPmB2B3VnzsVnjQ2p1mfK_T7ckhDS9F8FhdmpJDSMgYEwH-nHHNhvQE_5vHq88m6X_QedmR92Wk72PFCjlzAAmSPJlE1WC71C5hK_gJW/s1600/IBM+Global+Data+Growth.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglWrZ6JBnLN1_vmMMzNBPadSzLj5RjtW4jFkjYAPmB2B3VnzsVnjQ2p1mfK_T7ckhDS9F8FhdmpJDSMgYEwH-nHHNhvQE_5vHq88m6X_QedmR92Wk72PFCjlzAAmSPJlE1WC71C5hK_gJW/s1600/IBM+Global+Data+Growth.JPG" height="232" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Source : IBM Global Data Growth</td></tr>
</tbody></table>
The amount of digital data has grown tremendously in the past few years. It is predicted that this year we will reach around 8<a href="http://nl.wikipedia.org/wiki/Zettabyte"> zetta-bytes</a> worldwide. The amount of data is growing and is expected to grow exponentially because more and more devices are connected to the internet (<a href="http://en.wikipedia.org/wiki/Internet_of_Things">the internet of things</a>). Second factor stimulating the growth of data is the use of social media. It is expected that the total amount data with an IP address will reach a whopping <a href="http://www.emc.com/infographics/digital-universe-2014.htm">44 zetta-bytes by 2020</a>. Can we treat all that new data the same as the data from traditional sources, like the ERP system? Let’s start with social media content. To what level do you <a href="http://www.buzzfeed.com/laraparker/if-we-were-actually-honest-on-facebook#.ucEv0wzW4">trust </a>the content of a customer review, a tweet or Facebook post? How to detect rumours from fact and how to deal with contradicting information? Also, can we really expect that we have all data? There are many examples in which the observation bias negatively impacts the outcomes of an analysis based on social media content. See Kate Crawford’s <a href="https://hbr.org/2013/04/the-hidden-biases-in-big-data">HBR post</a> on this. Even companies like Google struggle with it as became clear in their <a href="http://www.nature.com/news/when-google-got-flu-wrong-1.12413">overestimation</a> of the number of flu infections. I guess it’s fair to say that social media content is highly uncertain in both expression and content. With sensor data it’s not better. If you use a satnav system you will probably know what I mean. Try navigating the inner-city streets of Amsterdam and you’ll see measurement error in action. Due to measurement errors, but also senor malfunctions, approximation errors, sampling errors, etc sensor data is highly uncertain as well. So although the amount of data grows (exponentially), the uncertainty in the data grows as well (exponentially). <br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglnTmKu7LaeL9EvoDf1xQVlY4jDuvQugMruVW3YkoWUThcojTgqpU_udsRRsWY1RMfpOYo8XE8dEpnZtcuc6FPoL1NRy-J5IY-u7SG46ZSNdy-0W3OmNQslHybyfM9_pDZgas3JA8kaNHU/s1600/Google+got+flu+wrong.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglnTmKu7LaeL9EvoDf1xQVlY4jDuvQugMruVW3YkoWUThcojTgqpU_udsRRsWY1RMfpOYo8XE8dEpnZtcuc6FPoL1NRy-J5IY-u7SG46ZSNdy-0W3OmNQslHybyfM9_pDZgas3JA8kaNHU/s1600/Google+got+flu+wrong.png" height="396" width="400" /></a></div>
<br />
Decision makers must understand the impact of data uncertainty on their decisions and should think of ways to making this impact explicit. This is not new and is not depended on whether the data comes from a big data source or not. Data uncertainty has been around ever since the first optimisation model was created. In practice this uncertainty is simplified by using a single measure, for example the minimum, maximum or average. The impact of that simplification is manifold as Sam Savage explains in <a href="http://flawofaverages.com/">The Flaw of Averages</a>. Without explicitly taking into account the uncertainty in (big) data, the outcomes of optimisation models using that data are no better than a wild guess. With the high level of uncertainty of big data, explicitly taking into account the data uncertainty is even more important. Luckily Operations Research offers various ways to incorporate this uncertainty into the modelling and changes a wild guess into an informed decision. Some well-known approaches are what-if analysis, fuzzy logic, robust optimisation and simulation.<br />
<br />
Big data is not objective nor truthful nor credible; It’s a creation of human design and therefore biased. Numbers get their meaning because we draw inferences from them. Biases in the data collection, data analysis and modelling stages present considerable risks to decision quality, and are as important to the big-data equation as the numbers themselves. Decision makers must know about this uncertainty, know how it will impact decision making.<br />
<br />
<h4 style="text-align: center;">
”Not everything that counts can be counted, </h4>
<h4 style="text-align: center;">
and not everything that can be counted counts.”</h4>
<h4 style="text-align: center;">
<br />Albert Einstein</h4>
<div style="text-align: center;">
<br /></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-73741949555494811542014-12-08T09:45:00.002+01:002014-12-08T09:52:07.477+01:00The Age of Algorithms started on a diet<div class="MsoNormal">
<span lang="EN-GB">With the exponential
growth of interest in data analytics, either big or small, the attention for the
use of algorithms has risen strongly as well. An algorithm is nothing more than
a step-by-step procedure for a calculation.
That is also what makes it so powerful. Algorithms make our lives
easier. <a href="http://www.indierecon.org/2014/02/understanding-amazons-recommendation.html">Recommendations engines</a> single out the product we have been looking for
based on our previous purchases and those of buyers similar to us. The <a href="https://nest.com/">NEST</a>thermostat programs itself and continually adapts to our changing life. <a href="https://blog.bufferapp.com/facebook-news-feed-algorithm">Facebook</a>
selects the news items that are of interest to us, based on what we have been
reading, liking and sharing. All of this
would not have been possible without the use of algorithms. Using computers,
algorithms can do things close to magic as <a href="http://en.wikipedia.org/wiki/Clarke's_three_laws">Arthur C. Clarks third law</a> predicts.
Algorithms even let a computer think up new <a href="http://www.ibm.com/smarterplanet/us/en/cognitivecooking/tech.html">recipes </a>when it gets bored playing
Jeopardy! We are experiencing the <a href="http://www.economist.com/news/special-report/21621156-first-two-industrial-revolutions-inflicted-plenty-pain-ultimately-benefited">third great wave</a> of invention and economic
disruption; we live in the Age of Algorithms. <o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB"><br /></span></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHAgxMu-9F7o6ySBuWAm-UKUT06R522SDZLRH-tHTdPgS9smbyN0pFzL1YqKg0bSJ1buvApW50Vf2PBnpjt05q__6HwiWjHCqJvCpt7utghPn_cO7NYFygW7J346djlmpThX4vDFwNXmF8/s1600/gass-scoop-4-scientific-planning-01.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHAgxMu-9F7o6ySBuWAm-UKUT06R522SDZLRH-tHTdPgS9smbyN0pFzL1YqKg0bSJ1buvApW50Vf2PBnpjt05q__6HwiWjHCqJvCpt7utghPn_cO7NYFygW7J346djlmpThX4vDFwNXmF8/s1600/gass-scoop-4-scientific-planning-01.jpg" height="320" width="236" /></a><span lang="EN-GB">Many people
see Google as the initiator of the Age of Algorithms; 16 years ago Larry Page
and Sergey Brin created their page rank algorithm that enables us to efficiently
find what we are looking for on the web. It accelerated the growth of data and the
use of algorithms to analyse it. Most of the analyses focus on detecting patterns
in the digital bread crumbs we leave behind when wandering on the web. With algorithms
companies and organisations try to better understand our needs and wants so
they can improve their marketing and sales strategies. Google’s page rank however
wasn’t the start of the use of algorithms to solve practical decision problems,
the age of algorithms kicked off much earlier than most people think.<o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span lang="EN-GB">Shortly
after the Second World War the Pentagon based research group <a href="http://www.orms-today.org/orms-12-07/history.html">SCOOP</a> was formed.
SCOOP stands for Scientific Computation of Optimal Programs. The group set out
to find methods for the programming problems of the Air Force. Programming
problems are concerned with the efficient allocation of scarce resources to
meet some desired objective, for example the determination of time phased
requirements of materials in support of a war plan. Mathematically these
problems look like this:<o:p></o:p></span><br />
<br /></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCje5XpBQYm5e9UHyprQL7G_BwencnHfeSzWeZVoolhxdUtRNoLL7I2kBFprct8TT9-nbidEGv9CoTGXC3dwhtHSlrfMwLWW55AoCOWVGsvxTH-HzBmOsMfykMgqBeOhHGLFPyD438rUhX/s1600/Simple+LP.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCje5XpBQYm5e9UHyprQL7G_BwencnHfeSzWeZVoolhxdUtRNoLL7I2kBFprct8TT9-nbidEGv9CoTGXC3dwhtHSlrfMwLWW55AoCOWVGsvxTH-HzBmOsMfykMgqBeOhHGLFPyD438rUhX/s1600/Simple+LP.JPG" /></a></div>
<br /></div>
<div class="MsoNormal">
<span lang="EN-GB">Fact is
that thousands of real life problems in business, government and the military
can be formulated (or approximated) this way. A way (algorithm) to solve these linear
programming problems would therefore be very useful and that is exactly what
George Dantzig, Chief mathematician of SCOOP, did. In 1947 he invented the simplex
method. The impact and power of Danzig’s invention is hard to overstate, I dear
to say it’s the most used algorithm today. The journal Computing in Science and
Engineering listed it as one of the <a href="http://www.computer.org/csdl/mags/cs/2000/01/c1022.html">top 10 algorithms of the twentieth century</a>.
It’s probably on your computer as well as it comes standard with Excel. <o:p></o:p></span></div>
<br />
<div class="MsoNormal">
<span lang="EN-GB">One of the first
problems solved using the simplex method was a diet problem named after Nobel
Laureate George Stigler. <a href="https://dl.dropboxusercontent.com/u/5317066/1990-dantzig-dietproblem.pdf">Dantzig</a> wanted to test if his new method would work
well on a rather “large scale” problem. You can try solving the <a href="http://en.wikipedia.org/wiki/Stigler_diet">Stigler Diet problem</a>
yourself either by hand as Stigler did or use the power of the simplex
algorithm to find the optimum. The age of algorithms starts <a href="https://docs.google.com/spreadsheets/d/1La-wsp1MBt77TJda0RuGnIUjIe6A_PM1_wBBT_WzMsw/edit?usp=sharing">here </a>(or in 1947 actually). <o:p></o:p></span></div>
@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0tag:blogger.com,1999:blog-1537119231740022182.post-84237744506961241422014-11-22T18:05:00.000+01:002014-11-22T18:05:16.870+01:00Let the data do the talking<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj3hwp43dPDiq1dJIa-OfiQPYjYBpCGAhkLeBeYVHusmeB7wJ_T8Bs6MwZ93nLsre06vI4WOSMMPtrI-qu5C6Ez7OjHXViUd4FV1kQtklvFkDB7_Mw4N7sTk0_mVI-Dk_YBiNUjz9Mls3D/s1600/Scientific+Management.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj3hwp43dPDiq1dJIa-OfiQPYjYBpCGAhkLeBeYVHusmeB7wJ_T8Bs6MwZ93nLsre06vI4WOSMMPtrI-qu5C6Ez7OjHXViUd4FV1kQtklvFkDB7_Mw4N7sTk0_mVI-Dk_YBiNUjz9Mls3D/s1600/Scientific+Management.jpg" height="400" width="217" /></a></div>
<span style="font-family: inherit;">Although
the principles of <a href="http://en.wikipedia.org/wiki/Scientific_management">scientific management</a> from <a href="http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor">Frederick Taylor</a> have long become
obsolete, many parts of the theory are still important for organisations today.
When was the last time you were involved in a project concerned with efficiency
improvement, the elimination of waste or the identification of best practices? These are just a few topics from scientific
management that are still part of industrial engineering and everyday management
decision making. Key for the success of these kinds of projects is to have (or obtain)
an in depth understanding of the work processes that require improvement. You can imagine that without this, changing the
process might cause you to end up with a worse performance than you started
with. The default way to gather information for analysing a process is by
studying business process maps, interviewing people and fact finding on the shop
floor. This can however be very time consuming, where the quality and accuracy
of the gathered data could be questionable. Business process maps are known for
their outdatedness (written to pass some ISO certification step years ago),
people have different views on how processes are performed, while fact finding
many times can only cover part of the processes under investigation. Not a very
good start for a successful improvement project wouldn’t you say. There is
however a solution and it’s called data!</span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;">Many
organisations today have implemented workflow management, CRM and/or ERP
systems. These systems are what you could call “process aware”. Key is their
capability to log events, like a new order coming in, a request to process an invoice,
the rejection of an insurance claim, the admittance of a patient, etc. These
systems register very detailed information on the activities that are being
performed, information that could be used to mine the data to uncover the
actual work processes. Using the logged events related to the same case (e.g a
new customer order) in process mining, the sequence in which they were
performed is used to identify all the activities required to process the
complete case. If the event log also contains information on the performer
(person/resource, etc) of the activity and timestamps on when the activity took
place; resource usage, duration and productivity can be measured as well. So, the data does all the talking instead of
the interviewees.<o:p></o:p></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;">Traditionally
process mining is focussed on deriving information about the actual work process,
the organizational context (who performs what), and execution properties (resource
usage, duration, performance, etc) from event logs. With the resource
information from the event logs social networks can be extracted; this allows
organizations to monitor how people, groups, or software/system components are
working together. Next to the discovery of actual work processes, process
mining can be used to test conformance with the to-be (or designed) work
processes, enabling the work processes to be audited in a fast and objective
manner. This can especially be of value in highly regulated businesses like
banking or insurance, checking conformance with regulations like <a href="http://en.wikipedia.org/wiki/Basel_III">Basel III</a>. A
third area which process mining can be of value is by extending an existing process
model with new information, for example using the information from the event
log to detect the data dependencies or decision rules for a specific activity.<o:p></o:p></span><br />
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJL-DIDEI5oBSsmXFTGzB6mbVFxRf0N6jPKw2394UsWYzNxfhMe0wZba05OHH9k6sAJaJvOc8FCgxF9_7UVmWMGYBvPwVpNTgmBY64KKeR-OL72TMjXhcpQ1nSjpV1e4fuGu2VPTjnrOsa/s1600/View.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJL-DIDEI5oBSsmXFTGzB6mbVFxRf0N6jPKw2394UsWYzNxfhMe0wZba05OHH9k6sAJaJvOc8FCgxF9_7UVmWMGYBvPwVpNTgmBY64KKeR-OL72TMjXhcpQ1nSjpV1e4fuGu2VPTjnrOsa/s1600/View.gif" height="90" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Process Flow gynaecological oncology patients</td></tr>
</tbody></table>
<span style="font-family: inherit;">To illustrate
how process mining works I used an example data set containing the event logs
of gynaecological oncology patients of a Dutch hospital. The data set contains the event logs of 627 individual
patients. Using the open source data mining platform RapidMiner and the Process
Mining package ProM from the Process Mining expertise centre at Eindhoven University
I created the following process flow from the data using the ILP miner (an integer
linear programming based model to extract a process flow). Note that I did not use any a-priori knowledge
about the care process of this group of patients. This all comes from the event
log data. Using the same tools and data the social network can be constructed
providing insight on who works with who, who delegates work to who and the intensity
of these work relations. Also using visualisations like the Dotted Chart specific
patterns can be detected in the way the patients are treated.</span><br />
<span style="font-family: inherit;"><br /></span>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0yKNhZ8_-dbkJtpyTCVkjQ1goZBecS_TaIym4p-15heWxPMrCBBw56rWtq1ayUmq1tUqNCemGdd_yGai4Aj6C9ax2uXFZimTqQvos5ocBYOBQIWenYLU2baQsqZVB69joEmlkeRLbEheZ/s1600/dottedchart.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0yKNhZ8_-dbkJtpyTCVkjQ1goZBecS_TaIym4p-15heWxPMrCBBw56rWtq1ayUmq1tUqNCemGdd_yGai4Aj6C9ax2uXFZimTqQvos5ocBYOBQIWenYLU2baQsqZVB69joEmlkeRLbEheZ/s1600/dottedchart.JPG" height="238" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Dotted Chart </td></tr>
</tbody></table>
<span style="font-family: inherit; line-height: 115%;">Using process mining to discover work processes
from event logs can be very powerful and less time consuming than the “old” way
of interviewing, studying outdated process flow descriptions and fact finding expeditions.
When you let the data do the talking, first results can be delivered quickly. Crucial
is of course to have access to data. It
requires skill to extract the right data from an ERP-like system, probably a lot
of data cleansing needs to be done, including the check on completeness and validity
of the data. Also it’s quite easy to get swamped in data, especially when the
number of log events is big and a lot of process steps are involved. In environments like hospitals a lot of
unstructured processes exist which will make it more difficult to use techniques
like process mining as is, however using techniques from data mining like
clustering first, satisfactory results can be achieved. Compared to traditional
business Intelligence tools that only provide an aggregate view of the process inputs
and outcomes, process mining dives inside the processes and provides the
insights to give your next improvement project a head start.</span><br />
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span lang="EN-GB" style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit; font-size: 11pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<span style="font-family: inherit;"><br /></span></div>
<span lang="EN-GB" style="font-size: 11pt; line-height: 115%;"><span style="font-family: inherit;"><br /></span></span>@ORatWorkhttp://www.blogger.com/profile/09446587181442453824noreply@blogger.com0