Sunday, 26 October 2014

“That which is measured, improves”

Some people attribute the above law to Karl Pearson, a famous statistician and founder of mathematical statistics. Others attribute it to Peter Drucker, a well-known management consultant. The source of the law is, however, not very relevant. The fact that most people believe it to be true is one of the fundamentals of modern decision making and a driving force behind the increasing use of analytical and optimisation methods to improve organisational performance. Without measurement it is difficult to assess an organisation’s performance or to set measurable objectives, and decision making to achieve those goals becomes a leap in the dark. As in any other business, measurement is important in healthcare, not only to keep track of efficiency and effectiveness, but also to measure and improve quality.

Each year a large Dutch newspaper, het Algemeen Dagblad (AD), publishes a ranking of all Dutch hospitals, the AD TOP100, to identify the best Dutch hospital. It has done so for 11 years. Each year the ranking is much debated, as it is said to measure the quality of care in an arbitrary way and to do nothing to support patients in deciding which hospital to go to. Hospitals move erratically up and down the ranks; sometimes a hospital drops from a top position to the lower end of the list in the following year, or vice versa. Critics therefore suggest that the yardstick used by the AD is ill suited to assess quality of care, and they dismiss the list as irrelevant. I think that it is too easy to dismiss the efforts of the AD journalists that way. There are useful insights to be gained from the data they gathered. Although not perfect, the list can help hospitals and policy makers decide how to improve healthcare quality, and help patients decide which hospital to visit. Much has to do with how the results of the quality assessment are presented.

Measuring and comparing the quality of hospitals is a tough job. Hospitals keep track of many indicators on the performance of their care processes and procedures, sometimes over 1,000. Which ones best provide insight into quality? The AD TOP100 uses only 0.3% of the available number of indicators. Not much, you would say. It is important to mention, however, that the indicators hospitals keep track of are not the same for each hospital, do not have a shared definition and, most importantly, are not publicly available, for obvious reasons. The indicators the AD uses are, by law, publicly available and well defined. That makes the AD list verifiable. Moreover, the indicators are set by the Dutch Health Care Inspectorate, a government institute that promotes public health through effective enforcement of the quality of health services. To me this makes the list of criteria even less arbitrary. There are still many factors that make it difficult to compare the performance of hospitals objectively. For example, each hospital serves a different patient population and has its own focus areas (cardiology, skin cancer, etc.), which creates an unequal playing field for scoring the criteria. The AD tries to compensate for these factors, but it is not very clear how they do that. Maybe a more sound (and analytical!) approach would be more objective, transparent and trustworthy and would reduce the debate.

"Starry Skies " of consecutive rankings
Besides the criticism of the criteria used and the scoring, much of the criticism is focussed on the rather erratic outcomes of the assessment. Over consecutive years, the rank of a hospital can change quite a bit. Using scatter plots (they look like “starry skies”) and correlations of the rankings in consecutive years, the critics try to show that the outcomes are irrelevant.

And I think they are right: the absolute position of a hospital in the list is not very informative for assessing the quality of care it provides. When quality of care is at about the same level for all hospitals (as you might expect in the Netherlands), small changes in the overall score can lead to a very different ordering. Focussing on ranks alone is therefore not very informative, as the changes in score are most likely caused by random variation, data errors, etc.
Things change when the actual scores of the hospitals are analysed instead of the rankings; then a clearer picture arises. From the scatter plots of scores in consecutive years it is directly clear that there is a positive relationship (although still far from perfect).
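
To illustrate why ranks can be so volatile while scores hardly move, here is a minimal Python sketch. The five hospitals and their scores are invented for illustration (they are not AD TOP100 data), and `rank_by_score` is a hypothetical helper, not the AD's method.

```python
def rank_by_score(scores):
    """Return {hospital: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical scores: all hospitals sit close together in both years.
year_1 = {"A": 7.10, "B": 7.05, "C": 7.00, "D": 6.95, "E": 6.90}
year_2 = {"A": 6.98, "B": 7.04, "C": 7.09, "D": 6.93, "E": 7.06}

ranks_1 = rank_by_score(year_1)
ranks_2 = rank_by_score(year_2)

for name in year_1:
    score_change = year_2[name] - year_1[name]
    rank_change = ranks_2[name] - ranks_1[name]
    print(
        f"{name}: score {score_change:+.2f}, "
        f"rank {ranks_1[name]} -> {ranks_2[name]} ({rank_change:+d})"
    )
```

In this made-up example no score moves by more than 0.2 points, yet hospital A drops from first to fourth place and hospital E climbs from last to second.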

Quality score in consecutive years
The plots also show that quality for the majority of hospitals was worse in 2011 than in 2010 (the majority of points are below the grey line); a similar conclusion holds for 2012. Things are getting better in 2013, and in 2014 quality has improved for most hospitals compared to 2013 (the majority of points are above the grey line). Although the data is the same, changing the way it is presented also changes the insight it brings. Displaying the outcomes over time gives better insight into the development of quality over the years. An even clearer picture arises when we create buckets for scores and count the number of hospitals in each bucket. The downward trend in 2011 and 2012 is very clear; what is interesting to know is what caused it (a change in the criteria measured, maybe?). The same applies to the increase in overall quality in 2014. Does what is measured indeed improve? These insights should be an incentive for both journalists and hospitals to take a deeper look at the cause of this increase.
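
The two views described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the scores are invented integers on a hypothetical 0-100 scale, the grey line is assumed to be the diagonal where both years score equal, and `share_improved` and `bucket_counts` are hypothetical helper names.

```python
from collections import Counter


def share_improved(previous, current):
    """Fraction of hospitals whose score rose (points above the diagonal)."""
    common = previous.keys() & current.keys()
    improved = sum(1 for name in common if current[name] > previous[name])
    return improved / len(common)


def bucket_counts(scores, width=10):
    """Count hospitals per score bucket; a bucket is labelled by its lower bound."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Hypothetical integer scores per hospital and year (not AD TOP100 data).
scores = {
    2010: {"A": 62, "B": 68, "C": 71, "D": 74, "E": 77, "F": 81},
    2011: {"A": 55, "B": 70, "C": 63, "D": 66, "E": 72, "F": 75},
}

print("share improved 2010 -> 2011:", share_improved(scores[2010], scores[2011]))
for year, by_hospital in scores.items():
    print(year, bucket_counts(by_hospital.values()))
```

With these invented numbers only one of six hospitals improves, and the bucket counts shift towards the lower buckets, which is the kind of downward trend the bucketed view makes visible.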

Hospital quality score development 2010-2014
How to decide which hospital is best? Given that small changes in the overall score can change the absolute position quite drastically, a better way to measure “best” could be to use an average score. From the perspective of a patient this would make sense. Would you prefer to undergo surgery in a hospital that ranked first this year, or in a hospital that on average scored high over a period of several years? Using the average as a proxy, a different “winner” arises, although the 2014 number one is also in the list. The big difference is that the alternative best hospital ranks only 22nd in the 2014 list. Some difference! The same applies to number 4 in the list.
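
A minimal sketch of ranking by multi-year average, assuming each hospital has one score per year. The hospitals and scores are invented, and `best_by_average` is a hypothetical helper; a plain arithmetic mean is assumed here, and other choices (for example weighting recent years more heavily) are equally possible.

```python
from statistics import mean


def best_by_average(history):
    """Return (hospital, average score) pairs, best average first."""
    averages = {name: mean(scores) for name, scores in history.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for 2010-2014, oldest year first (not AD TOP100 data).
history = {
    "A": [60, 62, 61, 63, 90],  # one outstanding final year
    "B": [80, 82, 81, 83, 84],  # consistently high
    "C": [70, 71, 72, 73, 74],
}

latest_winner = max(history, key=lambda name: history[name][-1])
average_winner = best_by_average(history)[0][0]

print("best in latest year:", latest_winner)
print("best on average:", average_winner)
```

In this example hospital A tops the latest year, but hospital B has the best average, which mirrors the difference between the yearly number one and the alternative "winner".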

Best Average Hospital 2010-2014
Besides evaluating the individual quality of the Dutch hospitals, the AD TOP100 data has much more to offer. The data gathered could be combined with data on the financial performance of hospitals, the insurers that contract them, the kind of care they offer, the demographics of the area they serve, etc. Many more insights can be gained that can be beneficial for quality and performance improvement. What about the effects of hospital mergers on quality? As an example, I looked at how the quality of care (according to the AD TOP100 data) has evolved geographically across the Netherlands between 2010 and 2014. In the maps, the hospitals with a below-average quality score are coloured red and those with an above-average score blue.

The results show that in 2010 the majority of hospitals in the North West, South East and South West had above-average scores on quality. But in 2014 this has changed significantly: only in the South West does the majority of hospitals have an above-average score. Is this an effect of the aging population in both North East and South East of the Netherlands? Are more people visiting hospitals in Germany and Belgium? Nice directions for further journalistic investigations, I would say.
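
The per-region comparison can be sketched as follows. The region labels and scores are invented, `above_below_by_region` is a hypothetical helper, and "average" is assumed to mean the national mean score for the year in question.

```python
from collections import defaultdict
from statistics import mean


def above_below_by_region(hospitals):
    """hospitals: list of (region, score). Returns {region: {"above": n, "below": m}}."""
    overall_average = mean([score for _, score in hospitals])
    counts = defaultdict(lambda: {"above": 0, "below": 0})
    for region, score in hospitals:
        # A score exactly equal to the average is counted as "below" here.
        side = "above" if score > overall_average else "below"
        counts[region][side] += 1
    return dict(counts)


# Hypothetical (region, score) pairs for a single year (not AD TOP100 data).
hospitals = [
    ("North West", 75), ("North West", 65),
    ("South West", 80), ("South West", 78),
    ("South East", 60), ("South East", 62),
]

result = above_below_by_region(hospitals)
for region, counts in result.items():
    print(f"{region}: +{counts['above']} / -{counts['below']}")
```

Running the same function on the scores of two different years gives the kind of before-and-after comparison per region shown in the maps.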

Hospitals with below (-) or above (+) average score per region
Measurement is crucial for informed decision making. Besides having a clear objective and being careful about what and how to measure, it is also important to think about how to analyse the data and present the outcomes in an insightful way. I think that the research desk of the AD newspaper can do a better job of presenting the results of their analysis. Also, by broadening the scope and including multiple years, the impact of their work will increase, as it will become more relevant and informative to hospitals, politicians and patients. A missed opportunity is that the AD doesn’t go into detail on WHY some hospitals score better than others. This would be very insightful and has news value. It would direct the actions of hospitals to improve, and patients, insurers or politicians could take action. A way to achieve this would be to extend the TOP100 scoring and work towards a benchmark that not only ranks but also supports the identification of best practices. It would push the overall quality of healthcare to a higher level.

For those interested in performing or extending the analysis themselves, the data and script I used can be found on my GitHub.