Great article. My only comment would be on the mobile phone brand risk. I would divide the number of robberies by the market share before claiming that one brand is riskier than another.
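
A minimal sketch of the normalization suggested above. It assumes we only have robbery counts and market shares (in percent) per brand. The brand names and numbers are hypothetical, not data from the article.

```python
def robberies_per_share_point(robberies, share_pct):
    """Robberies per percentage point of market share, per brand."""
    return {brand: robberies[brand] / share_pct[brand] for brand in robberies}


# Hypothetical figures: BrandA is robbed more often in absolute terms,
# but it also has a much larger market share.
robberies = {"BrandA": 500, "BrandB": 300}
share_pct = {"BrandA": 50, "BrandB": 20}

rates = robberies_per_share_point(robberies, share_pct)
riskiest = max(rates, key=rates.get)  # brand with the highest rate per share point
```

With these made-up numbers BrandB comes out riskier (15 vs 10 robberies per share point), even though BrandA has more robberies in total.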

I know, I'm picky, but your post is so good that it was hard to find something that might be improved :)

Very interesting. I had heard of Operations Research before but never really thought of applying it to business management. Do you have a link where I can read up on the whole concept of Operations Research? Thank you.

"Big Data" has become a buzzword. It conveys the notion that our interconnected world is generating a vast array of data.

Interesting topic, thanks for the post.

"The random quest for correlations in large data sets" gave us the Bible Code. As John suggests, that's not much of a recommendation.

It is better when the analyst discovers that the data she wants has already been collected and is available. In other words, we should treat big data as research, without too many preconceptions about how it will ultimately be used.

Well, I think Big Data enhances the potential applications of OR. The main problem in implementing OR applications (at least in my case) is getting good input data for the models, and Big Data is a very good solution for that.

Dear Michelle, thanks for your comments. You're right, not all exams are multiple choice. For the open questions, similar techniques (text mining) can be used to detect fraud. These techniques are also used to detect plagiarism, for example to test whether someone just copied a summary or thesis from the internet. Indeed, the answers to the multiple-choice questions were not stolen. The logical thing to do would be to ask an expert to answer the questions in advance and copy those answers, which would produce similar answer sequences that will be detected. I'm no expert on the legal implications; in my view these statistical tests indicate that something could be wrong and that further investigation is required.

That's interesting, but not all (Dutch) exams are multiple choice; the exams contain a lot of open questions. Secondly, the exams were stolen, not the answers. So the student must still find the answers him/herself and will still run into difficult questions. And will this statistical indication also be valid proof of fraud?
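
The idea that copied answers show up as "similar answer sequences" can be sketched as a toy test. Count the questions on which two students give the same answer, then ask how likely at least that many matches would be if each question matched independently with some probability p. The independence assumption, the value of p and the answer sheets are all hypothetical simplifications. A real test would also have to account for students who simply know the right answers.

```python
from math import comb


def matching_answers(a, b):
    """Number of questions on which two answer sheets agree."""
    return sum(x == y for x, y in zip(a, b))


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more matches on n questions."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical answer sheets for a 10-question multiple-choice exam.
sheet_1 = "ABCDABCDAB"
sheet_2 = "ABCDABCDAC"

k = matching_answers(sheet_1, sheet_2)  # 9 matching answers
# Assumed chance that two honest students match on a single question.
p_value = prob_at_least(k, len(sheet_1), p=0.25)
```

A small value only says the agreement is unusual under the assumed model; as the reply says, it is a reason for further investigation, not proof.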

Hmm, I tend to disagree with you on this one, John, although it also depends on the exact formulation of the problem/question.


I doubt that, if you know the family has at least one daughter, the remaining options {Girl, Girl}, {Boy, Girl} and {Girl, Boy} are equally likely. In probability terms, that would mean the initial 1/4 probability of {Boy, Boy} is distributed evenly over the remaining options. The daughter in question can be either of the two girls in {Girl, Girl}, the girl in {Boy, Girl}, or the girl in {Girl, Boy}. Of these 4 options, 2 end up in {Girl, Girl}. In my view, this makes the probability of {Girl, Girl} equal to 1/2, with 1/4 for {Boy, Girl} and 1/4 for {Girl, Boy}. This again results in a 50% chance that the other child is a girl, and 50% that it is a boy.

Consider the following extreme example:
There are 4 houses. House 1 contains N girls. Houses 2 and 3 each contain 1 girl and N-1 boys. House 4 contains N boys.
Now let N be a large number, like a million.
I go to one of the 4 houses at random (each has a 25% probability to start with), and the chance that a boy or a girl opens the door depends only on how many of each live there, not on whether it is a boy or a girl. Suppose a girl opens the door. What does that tell me about the probability of being at each of the 4 houses? Clearly I am not at house 4, so that one gets probability 0. But it would not be fair to give the three remaining houses the same probability of 1/3. It is far more likely that I have arrived at house 1, given the number of girls there. In fact, the probability of being at house 1 is N/(N+2), while houses 2 and 3 each have a probability of 1/(N+2).

In my opinion, the original example is the same situation, but with N=2. This gives probabilities of 50% for {Girl, Girl}, 25% for {Boy, Girl} and 25% for {Girl, Boy}.
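
The house example is a direct application of Bayes' rule. A minimal sketch with exact fractions, assuming (as in the comment) that each house starts at probability 1/4 and that the chance a girl opens the door equals the share of girls in that house. The function name is hypothetical.

```python
from fractions import Fraction


def house_posterior(n):
    """P(house | a girl opens the door) for the four houses with n children each."""
    prior = Fraction(1, 4)  # each house equally likely before the door opens
    girls = [n, 1, 1, 0]  # girls living in houses 1..4
    # Chance that a girl opens the door: the share of girls in each house.
    likelihood = [Fraction(g, n) for g in girls]
    joint = [prior * lik for lik in likelihood]
    evidence = sum(joint)  # overall chance that a girl opens the door
    return [j / evidence for j in joint]


posterior_two = house_posterior(2)  # the original two-child case
posterior_million = house_posterior(10**6)
```

For n = 2 this gives 1/2, 1/4, 1/4 and 0, the numbers above; for a million children per house, house 1 gets N/(N+2). The result rests on the sampling assumption that the child you meet is picked at random.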

On another note, adding irrelevant information does not change the probability mass over the Boy/Girl combinations; it only spreads the mass of each original combination over the added irrelevant sub-combinations. In other words, {Girl Mon, Girl Fri} does not have the same probability as {Boy Mon, Girl Fri} once I know that at least one of the children is a girl. All the {Girl anyday, Girl anyday} combinations together add up to a 50% chance. If you add the information that one was born on a Friday, you get the combinations {Girl Friday, Girl anyday} and {Girl anyday, Girl Friday} (you can leave the Friday/Friday option out if you want; it doesn't change the reasoning), which again add up to 50% of the total probability. You can play the same trick with other useless information, such as: her name is Sara, she was born on the 1st of January, she has blue eyes. It does not change the probability of the {Girl, Girl} option as a whole.
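
The claim that the Friday information leaves {Girl, Girl} at 50% can be checked by enumeration under the same reading as the house example: you meet one of the two children at random, and it is a girl born on Friday. The sketch below weights each of the 196 equally likely (sex, weekday) families by the chance that the child you meet is a Friday girl. The day indexing and the names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

FRIDAY = 4  # arbitrary day index (0 = Monday)
CHILDREN = [(sex, day) for sex in "BG" for day in range(7)]  # 14 equally likely kinds


def p_girl_girl_given_met_friday_girl():
    """P({Girl, Girl} | the child we happen to meet is a girl born on Friday)."""
    seen = Fraction(0)  # total chance of meeting a Friday girl
    girl_girl = Fraction(0)  # part of that chance from {Girl, Girl} families
    # 196 ordered families; the common 1/196 factor cancels in the ratio.
    for family in product(CHILDREN, repeat=2):
        # Chance that the child we meet (one of two, at random) is a Friday girl.
        weight = Fraction(sum(child == ("G", FRIDAY) for child in family), 2)
        seen += weight
        if family[0][0] == "G" and family[1][0] == "G":
            girl_girl += weight
    return girl_girl / seen


result = p_girl_girl_given_met_friday_girl()
```

Under this "random child" reading the result is exactly 1/2, with or without the weekday detail.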

IMHO of course. 
Gregor Brandt

Hi John,

Again, an exciting story from you.
Here is another viewpoint.
All the scenarios you consider assume a uniform distribution across all possible outcomes.
What if we had a database with, say, millions of people and information about their gender and birthdate?
It would then just be a matter of computing the frequency distribution of all possible outcomes.
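
This suggestion amounts to replacing the uniform assumption with observed frequencies. A sketch, assuming hypothetical records of (gender, birthdate); the record layout and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def outcome_frequencies(records):
    """Empirical probability of each (gender, weekday of birth) outcome."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter((gender, WEEKDAYS[born.weekday()]) for gender, born in records)
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


# Hypothetical records; a real table would hold millions of rows.
records = [
    ("F", date(2012, 6, 8)),
    ("M", date(2012, 6, 8)),
    ("F", date(2012, 6, 8)),
    ("F", date(2012, 6, 9)),
]
freqs = outcome_frequencies(records)
```

With enough records, frequencies like these could stand in for the equal probabilities used in the boy/girl and weekday calculations.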

Regards,
Fred (still single)

I don't think the statement of the problem removed the (G,G) = (Fri, Fri) possibility. You are told there is a daughter born on Friday. You are not told there is exactly one daughter born on Friday (just like you are told there is a daughter, but not necessarily exactly one daughter). It changes the numbers, but not the overall story.
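
The point about the (Fri, Fri) family can be checked by counting. A sketch, assuming all 14 × 14 = 196 ordered (sex, weekday) combinations for two children are equally likely and reading the statement as "at least one daughter was born on Friday". The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FRIDAY = 4  # arbitrary day index (0 = Monday)
CHILDREN = [(sex, day) for sex in "BG" for day in range(7)]
FRIDAY_GIRL = ("G", FRIDAY)


def p_girl_girl_given_friday_daughter(drop_friday_pair=False):
    """P({Girl, Girl} | at least one daughter born on Friday), by counting families."""
    families = [f for f in product(CHILDREN, repeat=2) if FRIDAY_GIRL in f]
    if drop_friday_pair:
        # The variant the comment objects to: leave out the (Fri, Fri) girls family.
        families = [f for f in families if f != (FRIDAY_GIRL, FRIDAY_GIRL)]
    girl_girl = sum(1 for a, b in families if a[0] == "G" and b[0] == "G")
    return Fraction(girl_girl, len(families))


with_pair = p_girl_girl_given_friday_daughter()
without_pair = p_girl_girl_given_friday_daughter(drop_friday_pair=True)
```

Keeping the Friday/Friday family gives 13/27; dropping it gives 12/26 = 6/13. The numbers change, as the comment says.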

IS IT REALLY A LOTTERY?
The early exit of the Netherlands surprised not only Dutch people but everyone following EURO2012. They were in the group of death, and everyone knew that qualifying for the next round would be hard; however, going home without a single point was not acceptable. Nor was it predictable.* In your analysis of the Dutch chances of reaching the next round, you assumed that they needed 6 points to reach the quarter-finals. That is not right: in a four-team group, a team can progress with only 2 points (a 9-2-2-2 standing, decided on goal difference). So the other three matches in the group matter too. That's why I thought the probability should be much higher.

In my analysis I used your probabilities.
I estimated the other probabilities as shown in the table:
Table 1: Probabilities of winning matches (row = winner, column = loser)

Winner \ Loser   POR    DEN    GER    NED
POR               -     0.25   0.30   0.30
DEN              0.25    -     0.30   0.30
GER              0.40   0.40    -     0.25
NED              0.40   0.40   0.25    -

Each match is simulated independently, and each team gets 3 points for a win, 1 for a draw and 0 for a loss. At the end, the final standing of the group is obtained. Here is an example:
Matches and points taken:
POR-NED: POR 0, NED 3
DEN-NED: DEN 3, NED 0
POR-DEN: POR 3, DEN 0
POR-GER: POR 0, GER 3
DEN-GER: DEN 1, GER 1
NED-GER: NED 1, GER 1
Final standing: POR 3, NED 4, DEN 4, GER 5

In this example Germany finishes first, and Denmark and the Netherlands are level on points. Here I make an assumption: no matter how the match between the Netherlands and their tied opponent ended, and whatever the goal difference in the final standing, the Dutch team is counted as half through to the quarter-finals and half out of the tournament. In other words, the answer to the question of whether they qualify counts as 0.5.

I generated 1000 scenarios and collected their results. The probability of qualification comes out at about 0.55, twice what you suggested. It should have been clear that, holding all probabilities equal, a team has a 50% probability of reaching the quarter-finals (2 teams out of 4). I guess you were looking for a reasonable explanation of the Dutch exit, but this is not it :) As for the probabilities themselves, one could argue that they underestimate the winning chances of NED and GER against POR and DEN. If they are adjusted accordingly, the Dutch chance rises to at least 60%.
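
A minimal sketch of the simulation described above, using the probabilities from Table 1. Two details are assumptions not spelled out in the comment: the probability left over in each pairing (for example 1 - 0.25 - 0.25 = 0.5 for POR-DEN) is treated as a draw, and teams level on points at the cut-off share the remaining qualifying places equally (0.5 each in the two-team case). The function names, seed and run counts are hypothetical choices.

```python
import random
from itertools import combinations

# Table 1: WIN[a][b] is the probability that team a beats team b.
WIN = {
    "POR": {"DEN": 0.25, "GER": 0.30, "NED": 0.30},
    "DEN": {"POR": 0.25, "GER": 0.30, "NED": 0.30},
    "GER": {"POR": 0.40, "DEN": 0.40, "NED": 0.25},
    "NED": {"POR": 0.40, "DEN": 0.40, "GER": 0.25},
}


def play_group(win, rng):
    """Play each pairing once; 3 points for a win, 1 for a draw, 0 for a loss."""
    points = {team: 0 for team in win}
    for a, b in combinations(win, 2):
        r = rng.random()
        if r < win[a][b]:
            points[a] += 3
        elif r < win[a][b] + win[b][a]:
            points[b] += 3
        else:  # assumption: the leftover probability is a draw
            points[a] += 1
            points[b] += 1
    return points


def qualification_credit(points, team, places=2):
    """1 if team is clearly in the top `places`, 0 if clearly out.

    Assumption: teams level on points at the cut-off share the remaining
    places equally (0.5 each when two teams tie for second place).
    """
    mine = points[team]
    above = sum(p > mine for p in points.values())
    level = sum(p == mine for p in points.values())  # includes `team` itself
    if above >= places:
        return 0.0
    if above + level <= places:
        return 1.0
    return (places - above) / level


def qualification_probability(win, team="NED", runs=1000, seed=1):
    """Average qualification credit over `runs` simulated groups."""
    rng = random.Random(seed)
    total = sum(qualification_credit(play_group(win, rng), team) for _ in range(runs))
    return total / runs


def p_at_least_one_point(win, team="NED"):
    """1 minus the chance of losing every group match."""
    p_lose_all = 1.0
    for other in win:
        if other != team:
            p_lose_all *= win[other][team]
    return 1 - p_lose_all


estimate = qualification_probability(WIN)
at_least_one_point = p_at_least_one_point(WIN)
```

Setting every win probability to 1/3 makes the four teams interchangeable, so the estimate then lands near 0.5, in line with the 2-out-of-4 argument. The last function reproduces the footnote below: NED loses all three matches with probability 0.3 × 0.3 × 0.25 = 0.0225, so it takes at least one point with probability 0.9775, or 97.8%.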


*The probability of the Netherlands getting at least one point is 97.8%.

Despite the mountains of evidence, it's difficult for many sports fans to accept that the probability of the better team winning a game/match is often very close to 50%.

You should remove this entry if we ever win the final of a European or World Championship.

Or, even better, do a new analysis that makes us feel good when we win and when we lose. ;-)

I recently went to a conference where a company called Mu Sigma talked about the same thing. They have added another step (inquisitive) to the process of defining analytics:

http://www.mu-sigma.com/analytics/delivery-framework/analytics-delivery-framework.html

I'm not a big fan of OR because I think a lot of its concepts are not practical. But you've changed my mind today. You show how to apply complex theory to an actual business problem, and that's nice!

Low BMI can also come from low bone mass (a thin frame or low bone density) or from dehydration (on average, our bodies are 60% water). All of us lose height as we age, particularly after 60; over a woman's lifetime it amounts to about 6 cm. This pushes the BMI up. So I am all for a measuring tape and a look in the mirror rather than the scales and the BMI.
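
The height effect follows from the BMI formula: weight in kilograms divided by the square of height in metres. A small sketch with a hypothetical woman of 65 kg who shrinks from 1.65 m to 1.59 m (the 6 cm above) without any change in weight:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m**2


# Hypothetical woman: same weight, 6 cm shorter later in life.
before = bmi(65, 1.65)
after = bmi(65, 1.59)
```

Her BMI rises from about 23.9 to about 25.7, past the commonly used overweight threshold of 25, purely because she got shorter.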

1. I don't mean to defend the BMI, but your reasoning is not entirely correct:

So when you start to exercise to lose weight and build up muscle, you will be worse off according to the BMI

This would hold if the volume of fat lost (more or less) matched the volume of muscle tissue gained. In general, that is not the case for overweight people who start exercising. They tend to lose weight, so their BMI goes down.

2. Based on the BMI, the dangerous question "do you think I'm fat?" can safely be answered with "no, honey, you're short." :-)

Happy new year!
René

I'm told that too little body fat is a problem, at least for young women (it makes it difficult to carry pregnancies to term?). Also, the jeans test may be deceptive for men. I think I read that an "apple" shape (where the flab concentrates above the belt and below the shoulders) is the most dangerous for men in terms of heart disease, whereas a big butt is somehow healthier. (Being a rippling mass of muscle, with one of those deceptive BMIs, I speak purely hypothetically here.)

It also seems that traffic light timings and speed limits should be adjusted to be more "green".

I use the "decision quality wheel" at work:
1. frame the decision
2. seek meaningful & reliable data
3. create alternative options
4. logically (and fairly) compare the alternatives
5. pick the best option and commit to action

I'm told this thinking originally came from the RAND Corporation in the 1950s, but I've also read something similar in Peter Drucker's classic book "The Practice of Management", first published in 1954. So I'm not sure who worked it out first, but I wish I had learned this at school; it's simple and effective!

Great point. Of course the reverse is true too - good results do not automatically imply good decision-making. One should track both and use both to improve results.

Hi John, this post reminds me of the book 'Fooled by Randomness' by Nassim Nicholas Taleb. He gives some similar examples of erroneous statistical reasoning.
He also states that even 'professionals' who understand the concepts of statistics are often fooled by them in practice.

By the way, I enjoyed reading your blog.

Arno

This year I have given a few seminars on how to calculate social media ROI. The attendees are usually people from communication and design. When I explain that statistics lets you build a model that helps you separate the impact of different marketing-mix strategies applied all at once, they stare at me as if I were crazy. I hope they aren't thinking of calling Torquemada (http://en.wikipedia.org/wiki/Tom%C3%A1s_de_Torquemada). ;)
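
A minimal sketch of the kind of model meant here: an ordinary least-squares regression of sales on the spend in each channel, so that each channel's effect is estimated while all channels vary at the same time. The channel names and weekly figures are hypothetical, and the sales are generated without noise so the fitted coefficients can be checked; a real marketing-mix model would need noise, lagged effects and much more care.

```python
def ols(X, y):
    """Least-squares coefficients: solve (X^T X) b = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical weekly spend per channel, both campaigns running at the same time.
social = [10, 20, 15, 30, 25, 5]
tv = [40, 10, 30, 20, 50, 35]
# Hypothetical "true" effects used to generate noise-free sales figures.
sales = [100 + 2 * s + 0.5 * t for s, t in zip(social, tv)]

X = [[1, s, t] for s, t in zip(social, tv)]  # intercept, social media, TV
baseline, social_effect, tv_effect = ols(X, sales)
```

The fitted coefficients come back as roughly 100, 2 and 0.5: baseline sales plus the separate effect of one unit of social-media spend and one unit of TV spend.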

Finally, a text that explains the importance of OR to my 14-year-old son. And my 9-year-old daughter!
Thanks John.
Regards,
Hugo