Kahneman, D. (2011). Intuitions vs. formulas. In Thinking, Fast and Slow (pp. 222-233). New York, NY: Farrar, Straus and Giroux.
In chapter 21, "Intuitions vs. Formulas," Kahneman explores the validity of expert intuition versus statistical formulas for prediction. He opens the chapter with Paul Meehl, whose work he credits as the influence behind one of his own earliest accomplishments: a new interview procedure for recruits to the Israeli Defense Forces. Meehl's book, Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence (1954), found, in a nutshell, that "simple, statistical rules are superior to intuitive 'clinical judgments'" (p. 230). This had a great influence on Kahneman early in his career, when the Israeli Defense Forces assigned him to create a more reliable interview for incoming soldiers (p. 229). By "focusing on standardized, factual questions," he created an interview based on "six traits in a fixed sequence, rating each trait on a five-point scale before going on to the next" (p. 231). The interviewers were extremely displeased at the prospect of throwing out all their intuitive expertise for a dry rating scale, so he added a final instruction: "close your eyes, try to imagine the recruit as a soldier, and assign him a score on a scale of 1 to 5" (p. 231). The results were significantly better than those of the old interview process, and the procedure was still in use 45 years later when he returned to visit. Moreover, the "close your eyes" intuitive, subjective, expert evaluation at the end proved to be about as accurate as the sum of the six ratings, which meant that expert intuition was also valid and reliable, but only after the six objective questions had been answered; the objective ratings primed the experts' final judgment. Research methods should include both subjective and objective measures that support each other to demonstrate validity.
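The interview procedure described above can be sketched in code. Note that the six trait names below are illustrative placeholders, not necessarily the traits the IDF actually used; what the chapter specifies is the fixed order, the 1-to-5 scale per trait, the equal-weight sum, and the intuitive rating collected only at the end.

```python
# Sketch of Kahneman's structured interview, under the assumptions
# stated above. Trait names are placeholders for illustration.
TRAITS = ["responsibility", "sociability", "punctuality",
          "pride", "independence", "energy"]

def interview_score(ratings, intuitive):
    """ratings: dict mapping each trait to a 1-5 score, collected in
    a fixed sequence; intuitive: the final 'close your eyes' rating
    (1-5), given only after all six factual ratings are done."""
    if set(ratings) != set(TRAITS):
        raise ValueError("rate all six traits before the intuitive score")
    if not all(1 <= r <= 5 for r in list(ratings.values()) + [intuitive]):
        raise ValueError("all ratings are on a 1-5 scale")
    objective = sum(ratings[t] for t in TRAITS)  # equal weights, per the chapter
    return objective, intuitive
```

The key design point is that the function cannot produce a score until every objective rating exists, mirroring how the procedure forced interviewers to ground their final intuition in the factual ratings.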
Meehl's research found that statistical formulas matched or beat expert intuition in predicting just about everything. Despite the upset and controversy his research started (he called his book "my disturbing little book" because it upset so many experts in various fields), his findings have only been further supported by subsequent studies. Kahneman cites several famous examples of simple statistical analyses that outperformed experts' predictions for years and even decades afterward. Furthermore, it was not just statistical analyses that proved superior but simple statistical analyses, as illustrated by Robyn Dawes's famous article "The Robust Beauty of Improper Linear Models in Decision Making" (p. 226). That is, the optimized weighting of variables produced by complicated multiple regression analyses turned out to be unnecessary in many cases: simply picking six or so valid predictive variables and weighting them all equally was enough, and often just as accurate.
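Dawes's point can be illustrated with a small simulation on synthetic data: an "improper" model that simply sums standardized predictors with equal (unit) weights tracks the outcome nearly as well as a model using the true weights. The weights and noise level below are made up for the demonstration, not taken from Dawes.

```python
import random

random.seed(0)

def standardize(xs):
    """Rescale a list to mean 0, standard deviation 1."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)

n = 1000
predictors = [[random.gauss(0, 1) for _ in range(n)] for _ in range(6)]
true_w = [0.9, 0.7, 0.5, 0.4, 0.3, 0.2]  # "proper" weights, unknown in practice
outcome = [sum(w * p[i] for w, p in zip(true_w, predictors)) + random.gauss(0, 1)
           for i in range(n)]

# Proper model: uses the true weights. Improper model: weights all six
# standardized predictors equally, as Dawes recommended.
proper = [sum(w * p[i] for w, p in zip(true_w, predictors)) for i in range(n)]
z = [standardize(p) for p in predictors]
improper = [sum(col[i] for col in z) for i in range(n)]

print("proper:  ", round(corr(proper, outcome), 2))
print("improper:", round(corr(improper, outcome), 2))
```

Running this, the equal-weight model's correlation with the outcome lands within a few hundredths of the properly weighted model's, which is the "robust beauty" Dawes described.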
One example presented is "Princeton economist and wine lover Orley Ashenfelter," who developed a simple statistical analysis to predict the future price and quality of wines from various regions. His formula's predictions correlated above .90 with actual prices, outperforming the experts who traditionally provided such forecasts (p. 224) and sending the wine community into upset and denial.
Virginia Apgar, an anesthesiologist, applied the same principle in 1953 by developing a simple score of "five variables (heart rate, respiration, reflex, muscle tone, and color)," each rated on a three-point scale (0, 1, 2), as a predictor of a newborn baby's distress one minute after birth; it is still used today. This standardized score flags respiratory problems in time to prevent brain damage or death and alerts staff to any need to intervene right after delivery. Previously there was no consensus among experts about what to monitor or when, resulting in higher infant mortality.
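The Apgar score as described is simple enough to write down directly: five signs, each rated 0, 1, or 2, summed to a 0-10 total. A minimal sketch (interpretation thresholds for the total are not given in the chapter and are omitted here):

```python
# Sketch of the Apgar score: five signs rated 0, 1, or 2, summed
# one minute after birth. Higher totals indicate a healthier baby.
def apgar(heart_rate, respiration, reflex, muscle_tone, color):
    signs = (heart_rate, respiration, reflex, muscle_tone, color)
    if any(s not in (0, 1, 2) for s in signs):
        raise ValueError("each sign is rated 0, 1, or 2")
    return sum(signs)  # 0 (worst) to 10 (best)

# Example: strong heart rate, breathing, and reflexes; slightly
# weak muscle tone and color.
print(apgar(2, 2, 2, 1, 1))  # → 8
```

Like Kahneman's interview, the formula's power comes from fixing which variables to observe and weighting them equally, rather than leaving each delivery-room judgment to unaided intuition.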
Experts have been found not only to be less accurate than a simple statistical formula but to actually contradict themselves approximately 20% of the time, even moments after being presented with the same data. This can be very disconcerting in the medical field. As Kahneman states, "Unreliable judgments cannot be valid predictors of anything" (p. 225). For more examples of the "virtues of checklists and simple rules," Kahneman refers us to Atul Gawande's The Checklist Manifesto [http://atulgawande.com/book/the-checklist-manifesto/] (p. 227).
Kahneman reminds us of another influence on experts' judgments mentioned previously: "unnoticed stimuli in our environment [can] have substantial influence on our thoughts and actions" (p. 225). Recall that a "cool breeze on a hot day," or a judgment made right after a food break, can make you more optimistic, as illustrated by parole boards granting more lenient decisions after lunch than before it. Again, this information tends to be met with "hostility": experts may know they are skilled, but they tend not to know the "boundaries of their skill," or when and why a simple statistic of only a few equally weighted variables "could outperform the subtle complexity of human judgment," resulting in confusion and defensiveness (p. 228). Additionally, Kahneman notes that when a human competes with a machine, "our sympathies lie with our fellow human" (p. 228). He goes on to state that "prejudice against algorithms is magnified when the decisions are consequential" and that Meehl and others have "argued strongly that it is unethical to rely on intuitive judgments for important decisions if an algorithm is available that will make fewer mistakes" (p. 229).
This argument relates to today's transition toward more autonomous systems in cars, airplanes, and anything else whose performance can be improved by reducing the impact of human error. Human error has been blamed for up to 80% of airline accidents, 60% of nuclear accidents, and 90% of car accidents. Automation is not 100% accurate either, but it usually produces less error than the human user. Our attention is limited and our response time slower than automation's, yet people still feel uncomfortable when the pilot announces that the automated system will land the plane today. Naturally, we are biased toward humans despite the facts about human limitations. Kahneman notes that statistical prediction may feel "artificial" or "synthetic" compared with human intuition and judgment and be difficult to accept, but it will likely gain acceptance over time, as reliance increases with knowledge (p. 228).
All that being said, there is a large body of research on the ability of human experts to make instant judgments that cannot be matched by technology, at least for now. Malcolm Gladwell's book Blink summarizes some of this research nicely, if a bit romantically. Humans also have the still incompletely understood ability to distill years of experience, in an instant, into judgments that are both complex and accurate. Expertise in action was on display in 2009 when Captain Chesley "Sully" Sullenberger saved hundreds of people, drawing on decades of experience to make complex judgments under extreme time pressure and stress and complete an emergency water landing on the Hudson River after birds took out both of his aircraft's engines. I must say, that outcome seems like it would have been statistically impossible to predict.