About 1 in 3 Economics Studies Don't Replicate
Editor's Note: This article was provided by our partner, ScienceNordic. The original is here.
How durable is the research that comes out of peer-reviewed scientific journals?
This is a question that many economists are now asking themselves.
A criterion for good research is that it can be reproduced, and that's exactly what a large team of researchers has tried to do.
In a meta-study published in the journal Science, they examined 18 laboratory experiments published in two of the most prestigious economics journals, American Economic Review and Quarterly Journal of Economics, between 2011 and 2014.
"We managed to get the same result in 11 out of the 18 studies, which corresponds to 61 per cent. It’s not a poor result, but it should be higher, and there is need for improvement and more transparency," says co-author Magnus Johannesson, professor at the Stockholm School of Economics, Sweden.
"I’d have preferred that the success rate was at least 80 per cent. Although we are dealing with a non-exact science, 61 per cent is still too low," says Associate Professor Alexander Christopher Sebald, at the University of Copenhagen Economic Institute, who was not involved in the study.
Original Results May Give False Positives
So why were they unable to replicate the results in 39 per cent of cases?
"The original results could be false positives. That’s to say that they found a connection that isn’t really there. The studies that couldn’t be verified were often based on small samples or have a p-value close to 0.05, which increases the risk of a false positive," says Johannesson.
The p-value is a statistical measure of how hard a result is to explain by chance alone. A p-value of 0.05 means that, if there were no real effect, a result at least as strong as the one observed would turn up in only five per cent of cases. The lower the p-value, the stronger the evidence against pure chance.
Many fields of research, including economics, use a p-value of 0.05, or five per cent, as their threshold. Results at or below this level are taken to be statistically significant.
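To make the 0.05 threshold concrete, here is a minimal Python sketch. It is not taken from the study; it simply simulates many experiments in which there is truly no effect and counts how often a simple test still comes out "significant". The sample size, the number of simulated experiments and the choice of a z-test with known variance are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p_value(sample):
    """Two-sided z-test p-value for 'the true mean is 0', assuming unit variance."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=10_000, sample_size=30, alpha=0.05, seed=0):
    """Share of simulated no-effect experiments that still reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Every observation is pure noise: there is no real effect to find.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if two_sided_p_value(sample) < alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"Share of no-effect experiments with p < 0.05: {rate:.3f}")
```

In this setup the share should land close to five per cent: the threshold fixes how often pure noise passes the test, not how likely a published finding is to be true.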
Publication Bias and P-Hacking: Big Problem in Research
Results with a p-value of 0.05 or less are more likely to be published than less significant results. This is known as ‘publication bias’. It gives researchers an incentive to chase this coveted level of significance in order to get their research published.
"The problem of publication bias is that significance becomes an end in itself. It makes some researchers, consciously or unconsciously, choose methods that can provide a significant result," says Johannesson.
"We call it p-hacking, when scientists change their methods to achieve significance. It’s generally a big problem in the world of research, including in economics," he says.
But Johannesson does not know whether p-hacking played a part in the 39 per cent of experiments that could not be replicated.
"It’s also important to say that a failed check doesn’t necessarily mean that there is something wrong with the original study," says Johannesson. Uncertainty affects all statistical analyses, including the team's own verification study, he says.
"We need more checks in order to increase the statistical certainty," says Johannesson.
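One way to see why p-hacking matters is a small simulation. The Python sketch below is an illustration under stated assumptions, not a description of what any of the original studies did: it models just one form of p-hacking, in which a researcher measures several unrelated outcomes and reports whichever gives the smallest p-value. The number of outcomes, the sample size and the z-test with known variance are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def p_value(sample):
    """Two-sided z-test p-value for 'the true mean is 0', assuming unit variance."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significant_share(n_outcomes, n_experiments=5_000, sample_size=30,
                      alpha=0.05, seed=1):
    """Share of no-effect experiments where the best of n_outcomes tests has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Each outcome is independent noise; only the smallest p-value is "reported".
        p_values = [
            p_value([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_experiments


honest = significant_share(n_outcomes=1)  # one analysis fixed in advance
hacked = significant_share(n_outcomes=5)  # best of five analyses
print(f"One planned analysis: {honest:.3f}")
print(f"Best of five:         {hacked:.3f}")
```

With five independent outcomes and no real effect, the chance that at least one falls below 0.05 is 1 - 0.95^5, or roughly 23 per cent, so the simulated "best of five" share should sit well above the nominal five per cent.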
Verification Is Becoming Widespread
The study follows a series of similar studies that have aimed to verify results in other research fields.
Economists were particularly inspired by a similar study of psychology research published in 2015 in the journal Science.
In that paper, a number of different research teams attempted to verify 100 selected experiments published in high-profile journals. Only 36 per cent could be replicated.
The result of the psychology-verification project is challenged in the latest issue of Science by another team of psychologists who believe that the verification process was flawed, and that the success rate is in fact higher. But this claim is disputed.
Scientists Should Pre-Register Their Research Methods
Johannesson recommends a new practice: researchers should register their analytical methods before embarking on a study. Some journals in different fields of research are already experimenting with this concept.
"If everyone can see how you intend to carry out an experiment, it’s then difficult to deviate from it when the experiment is in progress. It also makes it easier for others to verify and check the results. Generally, there is need for more data sharing and transparency," he says.
Sebald approves of this new approach.
"The results of the study here show that economists should be more open about their methods. Registration is a good solution," he says.
"I hope it’ll be possible to make more of these verification studies. Unfortunately, I think it will be difficult because there is little willingness to spend the necessary resources. As in all other disciplines, there is a big focus on new knowledge and results. But a study such as this underlines the importance [of checks]," says Sebald.