An ode to the error term
Sara Kok
For about three months a year, I teach statistics to very uninterested and unwilling first- and second-year students. In that group, maybe about ten percent is actually interested and dedicated enough to figure out how this ‘math stuff’ actually works. In my first year of university, I was not one of those students either. But in my second year, something clicked. Everything just started to sound logical to me and I figured out the ‘math stuff’. I had always liked algebra, and now I loved statistics. Because of this, I now regularly stand in a room, look into the glazed-over eyes of hungover 18-year-olds, and hope to detect that spark that will make them love statistics, too.
Despite my dedication to the science of statistics, our relationship is not without problems. I love the calculations, the feeling of being able to make raw data into something. However, at the same time, statistics, for me, ultimately exemplifies what many scholars call the logic of mastery. The logic of mastery entails the idea that we, as humans, can become masters over nature. It is this logic that ultimately informs Western thinking.
The idea of mastery gained prominence during the industrial revolution, but had been around for much longer. It is informed by dualistic thinking – thinking in distinctions between human and nature, between man and woman, between Black and white. Science, then, has become an extension of this mastery – to dominate nature, we need to know nature, never once doubting that we can. This logic of mastery has informed many projects of our time and times past, including that of colonization and the genocide accompanying it. It has become the basis for rapid industrialization, with the climate catastrophe that follows.
However dominant the logic of mastery is in our world, it is not the only way of thinking that exists. Many indigenous groups that are surviving the process of colonization have very different ways of viewing knowledge and the world. They do not subscribe to this logic of mastery and the dualistic thinking that accompanies it. For these groups, there is no distinction between man and nature, between human and animal – there are only relationships (Gagnon-Bouchard & Ranger, 2020). There is no need to dominate nature because we live in reciprocity with it: it is a gift given to us and we have a responsibility to care for it.
Still, in Western thinking, science is an extension and mimicry of mastery. We need to understand nature in order to dominate it and we replicate this logic in the way we practice science – we need to be able to neatly categorize things. Something either is or is not a certain way, in order for us to be able to understand and use it. However, this can lead to problems. Because of this binary thinking, we leave little room for uncertainty or inconsistency.
Back to statistics – why does it ultimately exemplify the logic of mastery? When we perform statistical analysis, we create models, portrayals of mathematical relationships between variables, which we hope represent the real world in some way. Based on these models, we predict alternative scenarios, we publish articles and we make policy decisions. When we collect data – what we see as facts about the world around us – we immediately catalogue it. We give it a score, a yes or a no. We become masters over nature by modeling it, by being able to practice our domination and manipulations in the little systems that we create on the computer, falling fully in step with the logic of mastery that has led to so much devastation.
I believe, however, that there is hope – there is one reason why I still have some faith in this hopelessly masterful field of Western science: the error term.
One of my most-used statistical tools is regression analysis. Regression analysis models how one variable depends on another and, under the right assumptions, can support claims about causation. If you do it right, you will have a model that you can use to predict alternative scenarios, all kinds of different situations. Your basis is a neat mathematical formula that looks similar to the ones we all learned in high school: y = a + bx + ε. The coefficients a and b can be estimated from the data you have. You can then fill in any x in order to get to y. Even if you have not observed that particular x, you can predict what would happen to y, leaving a way for you to predict alternative scenarios. If the model has enough explanatory power, it can then be used to say something about the ‘real world’.
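To make the formula concrete, here is a minimal sketch in Python of how a and b might be estimated with ordinary least squares and then used to predict y for an x that was never observed. The data (hours studied and exam scores) and the function names `fit_line` and `predict` are invented purely for illustration.

```python
# Hypothetical data: hours studied (x) and exam score (y) for five students.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 55.0, 61.0, 64.0, 68.0]


def fit_line(xs, ys):
    """Estimate a (intercept) and b (slope) by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(xs, ys)


def predict(x):
    """Fill in any x to get the model's y (the error term is left out)."""
    return a + b * x


# Nobody in the data studied for 7 hours, yet the model still gives an answer.
print(f"a = {a:.2f}, b = {b:.2f}, predicted y at x = 7: {predict(7.0):.2f}")
```

The prediction for x = 7 lies outside the observed range, which is exactly the kind of alternative scenario the formula allows, and also where the model deserves the least trust.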
The explanatory power of a model is limited, however – too much of it will make any researcher a bit suspicious that something has gone wrong in creating the model. This is because it is not possible to explain everything. A model is a model, not reality. This acknowledgement is also implicit in the last term of the formula, ε: the error term. The term ε stands for everything in the data that cannot be explained by the formula itself. It is an acknowledgement of the imperfections of the model, and of the statistical method itself – a confession that the real world is simply too complicated to be explained in one model – or multiple models. The standard inclusion of this term in the formula implies a certain imperfection that I hope opens up the practice to different ways of thinking. Maybe, we might think, if a system has to acknowledge its inherent problems, it might leave us some room to consider why. It gives me hope that we might be able to realize at some point that our project of mastery is not going to work out – that we will not be able to fully know nature, and that, hopefully, we will not need to.
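The error term can be made visible in the same way. The sketch below, again in Python and on invented data, fits a line and then computes the residuals, which are the sample estimates of ε: the part of each observed y that the line does not account for. It also computes R², one common measure of explanatory power. The numbers and variable names are assumptions made for the example only.

```python
# Hypothetical data: hours studied (x) and exam score (y) for five students.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 55.0, 61.0, 64.0, 68.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares estimates of the slope b and intercept a.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Residuals: observed y minus the y the line predicts.
# This is what the model leaves unexplained for each observation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# R squared: the share of the variation in y that the line accounts for.
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print("residuals:", [round(e, 2) for e in residuals])
print("R squared:", round(r_squared, 4))
```

Even on this tidy, made-up data set, none of the residuals is exactly zero and R² stays below 1: something is always left over that the line does not capture.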
*Context from the following article, about the Nasa people in South America:
Now, just as trees and humans are technically interchangeable, so land and the human body—conceived as a territory unto itself “[…] composed of water, stones, peaks, hills, hollows, roots, stems, buds, leaves, etc” —become commutable. (Portela 2005, 206) The continuity established between people and trees is reproduced at a wider scale between the human body and Earth itself with humanness operating as a kind of universal threading motif stringing together everything and everyone. Such continuum is then conceptualized through patterned sequences of relational affinities based for the most part on morphology or functionality. In Europe this kind of reasoning was best approximated and illustrated by Gian Battista della Porta (1535–1615) in his work Phytognomonica (1588) in which he established a series of similarities between human as well as non-human organisms and plants through morphological criteria.