Should We Care About Replication?
Reflections on a recent debate
Over the last week, there has been a lot of chatter on social media about the replication (or lack thereof) of empirical studies in economics. This debate springs up from time to time, sometimes because people note that this or that study is hard to replicate, and other times because people point out how few replication studies are published each year. The discourse suggests that the failure to have a branch of the literature dedicated to replication is a failure of the profession. Some even go so far as to say that this demonstrates that the entire field of economics is nonsense.
I think that it is worth having a conversation about these issues, but that the discourse largely misses the point. Sometimes missing the point is deliberate because people are acting in bad faith.
Thus, in today’s newsletter, I would like to think through some of these issues in a much more serious way. First, I would like to talk about why replication matters. Then, I would like to address some problems with the criticism. For example, by appealing to certain statistics, it is easy to weave a narrative. However, it is important to recognize that these statistics reflect equilibrium outcomes. In addition, it is important to understand what replication failures do and do not mean for the usefulness of economics. Finally, I would like to conclude by discussing the importance of price theory (of course).
Setting the Stage
Let me begin by saying that I think replication is important. In part, it is important because it helps us to root out fraud in the profession. If people are making up data or manipulating results, it is important to know that this is happening and who is doing it. But even when people are doing work to the best of their ability, they will still occasionally make mistakes in collecting data or merging data from different sources. They might have errors in their code. Although these are just honest mistakes, it is still important to know that they exist. In fact, they give us an excuse to revisit the questions asked in that research not only using the same data, but also data from other places and times.
It is also important to keep in mind that we don’t always need literal replication. We typically think of replication as simply collecting the same data and estimating the same models to see if we get the same results. However, the literature often features similar analyses using different data or a longer sample. The accumulation of that research can be just as important as making sure a particular paper or result can be replicated.
Replication can also be done in different ways, including ways that aren’t easily identifiable to people searching for replications. For example, one of my first publications was designed to see how particular empirical results in monetary economics would hold up if researchers used properly measured monetary aggregates rather than the simple sum aggregates used in prior research and provided by the Federal Reserve. In that paper, I estimated the same models with an updated sample of data. I did so using both the simple sum aggregates used in previous research and the theoretically correct Divisia monetary aggregates. What I wanted to know was whether the results of those previous studies would hold up in a new, larger sample, and if so, whether replacing the simple sum aggregates with the Divisia aggregates would overturn the previously puzzling results. I found that the earlier work did, for the most part, hold up in the larger sample. I also found that the Divisia aggregates overturned the puzzling results.
I bring this up because this is a form of replication. As part of this project, I used the same methodology as the previous papers in the literature and tested whether the results held up in a longer sample. But my starting point was to assume that the results would hold up. My hypothesis was that their puzzling results were caused by the use of an imperfect measure of money, not that these results wouldn’t survive an attempt at replication. Nonetheless, in the process, I did perform a type of replication. Similar types of work are often neglected by those searching for replication studies.
On the Importance of Solving for Equilibrium
One statistic that I have seen bouncing around is a time series of comments, replies, and rejoinders. What it shows is an upward trajectory of published papers in economics that are comments, replies, or rejoinders to previously published papers beginning in the early 20th century. These types of articles peak in the late 1960s and early 1970s and subsequently decline. This dramatic reversal has been brought into the debate as evidence that economists are no longer interested in debate, or in the validity of empirical results. I am not sure that this is the correct interpretation. I think it reflects a change in the publication process.
Economists in academia are by now well aware of the inanity of the peer review process. But you don’t have to be a member of the profession to notice this. Peer review of empirical research often goes through multiple rounds of revisions. The final product often includes a very long appendix, filled with robustness checks. The paper presents the main results. The appendix presents all of the other results that either the author or the reviewers (but mostly the reviewers) wanted to see to make sure that the result held up to other assumptions and model specifications. In some cases, the appendix can be longer than the paper itself.
In talking to older generations of economists, I find that they lament that this has become the norm. When they were younger, this wasn’t the case. Although I haven’t studied it closely, this certainly suggests a mechanism that would produce an observationally equivalent time series. In other words, whereas in the past people would submit comments in response to a published paper, many of those comments are now incorporated into the review process. Thus, it is not necessarily the case that economists have become less interested in debate or engagement with existing research, but rather that the comments and replies are happening behind the scenes among authors, reviewers, and editors.
In fairness to critics, I would prefer that the profession go back to the model of comments and replies. The review process is becoming more arduous and unnecessarily slow. Why not publish papers that reviewers deem worthy of publication, while giving those reviewers the option to submit a comment that examines the robustness of the results to alternative model specifications? This is more difficult than it sounds, as it creates a different standard for publication and people will adjust on various margins. Nonetheless, it might be worth pursuing.
In addition, critics miss the mark on a valid criticism. One issue with the economics profession is that it is very hierarchical. One notable economist, let’s call him OB, once wrote a paper using a framework that he had previously rejected as too insignificant for publication. When the author of the original work inquired into what had changed, OB is said to have replied, “an idea isn’t discovered until it is discovered in Cambridge.” If you think that the profession stifles debate, you would be better served by focusing on research networks and the snobbish tendencies of a subset of the profession. (But such criticism is also difficult to make from within the profession, lest there be retaliation. Fortunately, I’m a tenured, full professor.)
Do Not Fall for a Bait and Switch
Discussions of replication also seem to confuse frontier research with the core fundamentals of economics as a discipline. People who are publishing economic research in journals are engaged in conversations at the frontier of the field. They are trying to move the field forward and gain more understanding. Independent of issues related to replication, the quality of research at the frontier should be expected to be quite noisy. Some people are going to publish novel theories and empirical results that challenge conventional wisdom. Some of those publications will have a significant impact on the field. Most will not. Theories will be found not to match the data, or only to match the data under particular circumstances. Provocative empirical results will be found to apply only in a particular sample of data and not to be robust to data from other places or time periods.
The moral of the story is that one individual paper or one particular empirical result is unlikely to change the way that people think about a particular topic. At the frontier, it tends to be the case that people are working on similar ideas and topics. Over time, some of those research agendas turn out to be disappointing or dead ends. Other research agendas move the field in a particular direction.
The importance of one particular paper is therefore not as significant as some people would lead you to believe. For replication failures to truly affect one’s views of the state of the literature, such failures would have to be widespread within that literature. If there is a large body of research that says policy X causes outcome Y, the failure of one study to replicate is unlikely to change many people’s minds.
Nonetheless, some critics seem intent on saying that failures to replicate indict the profession as a whole. This is a charge that economics is useless and the research untrustworthy. But one does not follow from the other. The fact that one particular study at the frontier doesn’t replicate — or even that a lot of those studies don’t replicate — does not tell us anything about the usefulness of supply and demand, for example.
The Importance of Price Theory
This is one reason why I am so passionate about sharing my appreciation for price theory. I will let Brian speak for himself, but one reason that I really wanted to write this newsletter is that I think the profession occasionally loses touch with the importance of basic price theory and the importance of simple models. I also think that the profession has swung too far in the direction of pure empirics. There is very much an attitude among younger scholars that we can’t say much of anything without using particular research methods and that we must let the data speak. However, as I often write in these newsletters, data do not speak.
We need price theory to discipline our thinking. Price theory provides us with a guide for thinking about topics. Someone who understands price theory well will ask better research questions and be in a better position to evaluate empirical results, regardless of whether they have been replicated.
At the frontier, we love surprising results. We love clever arguments. But there is often a reason why surprising results are surprising. They often don’t hold up well to scrutiny.
But some do hold up to scrutiny. In fact, I would argue that when they do, it is typically because they rest on a solid foundation in price theory. On the other hand, surprising results from theory-free empirics and ex post storytelling about an empirical result do not.
This shift towards letting the data speak likely means that there is a greater need for more widespread replication. At the same time, a good price theorist can often point you in the direction of what is most in need of attention.
There is a (possibly apocryphal) story that might be of some value. Supposedly, in the heyday of UCLA price theory, economists would show up and present their work and the empirical results would be subjected to the Alchian test. This test essentially consisted of the faculty looking at Armen Alchian and asking, “Armen, does that sound right?” This is an amusing story to tell, but there is an actual lesson here. Armen Alchian was such a good price theorist that his colleagues knew that he would immediately be able to say whether a particular marginal effect estimated in someone else’s paper was reasonable. This wasn’t blind faith. This indicated such mastery of price theory that they assumed Alchian would be able to put a ballpark figure on what the estimated elasticity should be. Perhaps that is an alternative form of replication.

