The Seven Pillars of Statistical Wisdom
I’d like to spend this post gushing about statistics.
I didn’t get Statistics when I was introduced to it during junior college. As far as I knew, it was a messy, behemoth undertaking that involved ancient Greek symbols. It didn’t help that it was being taught like trigonometry and other Euclidean concepts: as a set of formulae.
Don’t get me wrong, there is nothing wrong with learning things through formulae. It is, after all, a clean, quick way to summarize important, dense, mathematical concepts. It reminds me of the hermeneutical circle that my Existentialism professor talked about during class. With dense, self-referring knowledge, it is impossible to understand everything on a first pass, because what you need to understand the front-end is explained in the back-end, and what you need to understand the back-end is explained in the front-end. The only way to ‘get’ the knowledge is thus to engage with the information again and again, piecing together the concepts slowly as you experience revelation after revelation. This technique of studying, employed heavily in the humanities, is strangely absent in the Sciences, to the detriment of people learning in those fields, I think.
Science has been quite misleading in its description of the phenomena that it studies. The simple formulae, the reductive theories, and the core concepts are often anything but. They are steeped in a rich history and tradition of problem-solving, and it seems remiss of the modern education system that we never mention this while teaching the sciences. To emphasize, the problem is not that we teach the reduced, cutting-edge parts of the science; it is that we do not teach the story of how those parts came to be through iterations of solving problems.
One of the most grievous examples of this is statistics, a mathematical science that relies on consensus rather than undeniable truths. Statistics is unique in that the mathematics supporting it is rigorous and replicable, but the interpretation of results is dependent solely on consensus. There is no mathematical reason for using a 95% confidence level in our analysis; we just do because everyone else does. There is no scientific basis for the usage of means over medians, only a collective agreement that the mean provides some information about a population that is more valuable than the median for most purposes.
But the consensus was not haphazard; it was formed through decades of testing, debate, and showing of proofs. Likewise, the formulae and systems of testing that are ubiquitous in statistics today arose from centuries of problem-solving.
I believe strongly that education is merely the activity of passing down the stories of ideas [a potential blogpost in the future, maybe?]. As human beings, our brains are wired with a tendency to find narrative patterns that fit a cause and effect model. We also become very good at making sense of a narrative by checking for errors and compensating for gaps in information. All this is often underutilized in the teaching of statistics. Again, if your learning was anything like mine, then you most likely learned solely through understanding the logic and formulae involved. Name-drops, such as Pearson’s r coefficient or Fisher’s F-test, are often ignored as superfluous.
Over time, my interest in statistics grew. Along with it came an interest in understanding the principles behind the formulae and logic that are often employed. Imagine my surprise when I discovered that the science of Statistics has a controversial, dramatic, and almost whimsical history, involving Charles Darwin, the orbits of planets, the shape of the earth, and testing for the quality of beer.
Which brings me to this book, The Seven Pillars of Statistical Wisdom, by Stephen M. Stigler. Stigler is a professor of Statistics at the University of Chicago. In this text, he plunges into the history of Statistics and shows how the pillars underlying our use of statistics were built over centuries of problem-solving.
Remember the hermeneutic circle that I mentioned at the start of this post? The circle applies to understanding this book. Seven Pillars is accessible but tough; if you dive into this book without any knowledge of statistics, you will not be able to understand it. I found myself flipping through my old notes on Statistics to make sense of the concepts that were mentioned.
But, slowly and painfully, things began to make sense. Concepts like the basic mode, median, and mean, Student’s t-distribution, Pearson’s correlation, and Galton’s work on regression came to life as their discovery was explained and elaborated with care and precision. These formulae no longer merely existed in some mathematical ether; they were artefacts, tools that were formed for solving important, fascinating, or amusing problems. It was mind-boggling to learn that the average was a very recent invention, and actually very unpopular in its infancy. It was a laugh to learn about Gosset, the Guinness employee behind the pseudonym Student, who devised the t-distribution. It was fascinating to learn that regression was a product of Galton’s work on explaining a paradox in Darwin’s evolutionary theory. Real, hard-core science was being conducted, and it was inspiring for this nerd.
Stigler wrote about seven basic principles that formed from this history of statistical problem-solving, and I think it’s interesting to consider them independently of other statistical concepts. In order, the seven pillars are:
1. Aggregation: The idea that new information can be gained by discarding information, particularly personalized, individual points of data.
2. Information Measurement: The idea that each successive batch of data points adds less information than the one before. The first set of data points is thus worth more in analysis than the next.
3. Likelihood: The idea that we can (and should) calibrate our inferences with probability.
4. Inter-comparison: The idea that we can make comparisons without needing an external standard.
5. Regression: The idea that extreme phenomena tend to produce less extreme phenomena in the next iteration.
6. Experimental Design: The idea that we can gain information by testing randomly and combining variables.
7. Residual: The idea that we can learn by using a theory, subtracting the effects that are accounted for, and seeing what’s left.
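
For the code-minded, here is a small sketch of the second pillar. The usual way to state it is the root-n rule: the precision of an average improves with the square root of the number of observations, not with the number itself. The function names below are my own, and the simulated data (normally distributed, with a standard deviation of 1) is purely an assumption for illustration, not something taken from the book.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical spread of the mean of n independent observations,
    each with standard deviation sigma (the root-n rule)."""
    return sigma / math.sqrt(n)


def simulated_spread_of_means(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical check: draw `trials` samples of size n from a standard
    normal distribution and measure how much their means vary."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, standard_error(1.0, n), round(simulated_spread_of_means(n), 3))
```

Going from 25 to 100 observations halves the theoretical spread (0.2 to 0.1), and it takes 400 observations to halve it again. That is the sense in which the first data points are worth more than the later ones.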
Personally, I noticed that these principles of statistics are not only applicable to hypothesis testing, but also to good problem-solving in general. While mathematics will give us the rigour and precision needed to be more certain (probably) of our knowledge, these principles hold for simple, everyday problem-solving as well. They were, after all, adopted to solve the problems faced in reality.
But my biggest take-away from Seven Pillars is the fact that we seem to miss a great opportunity to learn about important concepts through stories and the hermeneutic circle. Education is the act of passing down the stories of ideas, and it is the missing link that allows us to understand parts of these important concepts that we always took for granted, without really knowing why.