stop simplifying – Anna A. Meier

(or, how I learned that the kitchen sink model is the least of my concerns)

Early on in grad school, I was taught the dangers of the “kitchen sink” model. Throwing too many variables into a statistical model, I was told, would produce incorrect results. This felt intuitively wrong at first. The world is complicated, and don’t we want to account for every factor that could influence the outcome we care about, so that we have the full picture? But I came around to understanding concerns about collinearity, post-treatment bias, and the other jargon we throw around to mean “don’t do this”. Better to include fixed effects for temporal and spatial variation, I learned. Better still to do experiments, if possible. If not, better to search the world until I found a natural one.

There were two competing aims, it seemed. On the one hand, we wanted to model the real world as accurately as we could (briefly setting aside all of the assumptions we make in frequentist modeling and how we immediately violate most of them when modeling the social world), and that meant thinking through where certain effects were likely to occur in a causal process, how variables might affect each other, and so on. This felt right—more thought, not less. More care in our specifications, not less.

But this line of reasoning led to a second, and somewhat contradictory, aim: simplification. The natural conclusion of the first aim (more thought, not less) was the realization that the social world is complicated. Endlessly complicated. Fascinatingly complicated. Except not for some, because that complexity was too much, I was taught. It couldn’t be fully accounted for, or if it could, it would take too much work, too much time. We know that thinking deeply and thoroughly is challenging in the best circumstances, let alone under the pressure to produce that we often face in academia. And so the solution presented was to remove complexity and create an artificial scenario, in the lab or in the field or in a survey, that was somewhat like the real world (but also necessarily not), in which all factors were accounted for (but if the world is actually so complex, how is this ever possible?) and those of interest could be manipulated independently of the others (but nothing is independent, and all of our mediation analyses and priming tasks show us this). The real world, the thing we were actually supposed to study, was too hard. Too suboptimal.

Coming from IR, a subfield that has been chastised (rightly, I think) for oversimplifying the world into enormous paradigms riddled with unrealistic assumptions, this turn toward simplicity felt wrong. Was simplicity (or replace with a euphemism—elegance, parsimony) good or bad? Was it fine as long as “methodologists” were doing it? Was the implication that theorists who simplified were not doing rigorous enough work? When was simplicity okay, and when wasn’t it? And who got to decide?

(Side note: I anticipate some pushback here. Certainly much has been written about establishing external validity in experimental research agendas, testing the exclusion restriction in instrumental variable designs, and so on. My aim right now is not to dig into the particular mechanics of how to make these approaches better, but rather to probe whether they should be constructed as normatively desirable for doing science in the first place.)

I do primarily, though not exclusively, qualitative work. I do entirely critical work. I am lucky, and also unusual, to have received formal training in those approaches as a political science Ph.D. student at a top U.S. program. In the corners of these spaces not touched by the KKV poison (ratio me; I’m right), we do not talk about simplicity. We do not talk about removing richness. Instead, we celebrate it. We go to the field because we know we must—because without understanding the context, our explanations will always be hopelessly confounded. We spend years in the archives because that is how we know what we’re even studying in the first place. We do this work to probe our own assumptions: a common refrain in my corner of the academy is, “I didn’t know what I was actually studying until I got to the field.” This can feel daunting, and there is a reason hardcore ethnographers start fieldwork trips very early on in their careers. (Tangent: this is one reason some qualitative graduate students are so worried about the impact of the pandemic on their work.)

We also know that by going to the field at all, by necessarily selecting documents because we are not machines (rant about machine learning postponed until a later date), and by working in archives whose contents are themselves selected, we are producing the very contexts we study. A common critique directed at fieldwork by experimentalists is that our research sites are as artificial as theirs (which, interestingly, is a critique I never see leveled at experimental work by experimentalists absent the comparison to ethnographic research; I would love to see evidence that my impression here is incorrect). I do not wish to elevate one research methodology above others. There is sloppy, harmful research in all areas of the discipline. By “sloppy” I mean proceeding with assumptions the researcher knows are wrong, and whose wrongness affects the research design, because it is easier to make those assumptions than to relax them. By “harmful” I mean studies that rank causal identification above participants’ well-being, or that flit into and out of people’s lives without making an effort to center and understand those lives. This is not knowledge production. This is not “objectivity”. This is selfish, active harm.

But the other side of the coin (the complexities of the social world, and those complexities as intrinsic and valuable parts of that world) does not have to be viewed as a negative. Those complexities are inescapable, so why recoil from them? Why write complexity out of our work rather than trying to engage with it? Engaging is harder, it’s true. It slows the process of research down. In an environment of publish or perish, the incentives are not in complexity’s favor.

Still, I think we have to engage. If we minimize complexity, box it off, or assume it away, we are, in fact, performing the act of producing the social world, but one that appears only within the confines of academic research. We study a world that does not actually exist. Constructivists tell us that humans in general (and that includes researchers!) produce the world “out there”. But researchers also produce a world “in here”, in our labs and R consoles and focus groups and endless rambling blog posts. (Hi.) We then try to use the world “in here” to explain the world “out there”, often without acknowledging that that’s what we’re doing. To put this in quantitative or formal language, this is like writing a model. We know models are simplifications, but we also generally think they approximate something like “right”. I’d like to take this a step further, though. We think models are representations of the world “out there”, but they’re not, because we’ve intentionally written out all of the context, meanings, and irrationalities that actually converge to produce an outcome we care about. We view these as noise at best and nuisance parameters at worst when, in fact, they are everything. The world we create in a model is sanitized, antisocial, inhuman. And as a social scientist who studies humans in a messy world, this feels fundamentally wrong.

This isn’t a call to return to the kitchen sink model. The kitchen sink model is complex, but it is not careful. The more variables we add, the less attention we pay to how each new index we throw in is put together. Who funded that data collection? Who defined key concepts? Was a coder uncaffeinated one morning and so failed to properly cross-check different spellings of Ukrainian village names? Did a year’s worth of data fall off the back of a truck? (Urban legend regarding the missing 1993 data from the Global Terrorism Database, but a fun one.) Did y’all even read the codebook?

If we tried to write a more careful kitchen sink model, I think we’d quickly come to the realization that most social processes are actually really hard to model mathematically. That’s not to say it can’t be done—folx much smarter than me have written about how. And that’s not to say qualitative work is a foolproof alternative (though my bias here should be clear by this point). Doing careful work is hard within any research tradition. The incentives of the discipline move against it. Figuring out how to do it takes a career, and may in fact take multiple careers in the form of more collaboration.

But to enable any of that to happen, and to create a foundation from which to figure out the details, we have to first stop viewing complexity as something to be managed at best and as the enemy at worst. Complexity is a gift. It’s what makes the social world so fascinating, and it’s what drew me to the social sciences in the first place. What a privilege, to study something so intricate. What a loss, to view that privilege as a liability.