The gold standard of science is a different experiment whose findings agree with the findings of the first.
Replication is much less valuable, scientifically and professionally, than non-scientists think it is. Compared with a different experiment, simply repeating a published investigation runs a much higher risk of repeating any experimental errors or faulty assumptions that might have harmed the first one.
The whole point of science is to find knowledge that persists beyond one particular perspective; knowledge that is independently verifiable. Rote replication is not the best way to find this type of knowledge.
As for peer review, its purpose is simply to sharpen the communication of a completed study. Even if every study were replicated, the papers would still benefit from peer review.
It would be interesting to see if replication is really the glue that holds our greatest scientific achievements together. I suspect not. In my view a higher standard is progress towards powerful unifying theories, even if getting there is a messy business.
I'm thinking of something like relativity or quantum mechanics. Suppose a study in those fields fails replication. The whole thing still holds together, to the point where controversies at the parts-per-billion level make it to the front page of the newspaper. Perhaps even most studies, taken in isolation, would be found to have problems when subjected to the strictest criteria for replication. Choosing replication as a silver bullet would be an unnecessary distraction.
Now, what about fields where there is no unifying theory on the horizon? If replication is all we've got, then sure. I can certainly see the point, especially if the results affect personal decisions (diet, medications, etc.) or public policy.
I suspect that "gold standards" can hurt science as much as help. Telling people that science is bunk because of the "replication crisis" contradicts the fact that messy science has produced results of astounding accuracy and predictive power. Learning from success should be at least as important as installing safeguards against failure.
> Replication is slow and can be extremely expensive
Gold standards are identified as such because they’re the best. Not everything needs to meet the gold standard to be debatable. Peer review is adequate for further research, but perhaps not for policy initiatives. An unreviewed paper is enough to start simple inquiries. Et cetera.
Replication is a more powerful statement about the validity of research findings than peer review. Lots of valid research findings will not be replicated before they influence other researchers and the public. The point of this piece is that lay audiences shouldn't expect a simple "gold standard" by which they can distinguish the good research from the bad; understanding research requires critical thinking and access to domain expertise.
And there are also ways in which replication is orthogonal to peer review. Replication can't by itself tell you whether a piece of research makes a significant contribution to the field, or whether it is itself derivative, or poorly presented.
It does not have to be slow in all domains. In my field there are many modeling papers. Journals could require that all data and code be open access, or at least define a submission type where this is the case. You could even automate the process of running submitted containers / packages.
All that would prove is that the code that implements a buggy version of a model gives consistently wrong results. Actual replication requires that somebody else implements the code independently, and that both implementations are checked over a reasonably wide range of parameters over which the model should be valid.
Actual replication is much more effort and consequently slower than just a "docker run".
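To make the distinction concrete, here is a minimal sketch of what checking two independent implementations over a parameter range might look like. Everything in it is an assumption for illustration: the logistic growth model, the function names, the parameter grid, and the tolerance are all made up, and a real comparison would use the model and validity range from the paper in question.

```python
import math
from itertools import product


def logistic_closed_form(r, k, x0, t):
    """Implementation A (hypothetical): analytic solution of dx/dt = r*x*(1 - x/k)."""
    return k / (1.0 + (k - x0) / x0 * math.exp(-r * t))


def logistic_rk4(r, k, x0, t, steps=2000):
    """Implementation B (hypothetical): the same model, integrated numerically with RK4."""
    def f(x):
        return r * x * (1.0 - x / k)

    h = t / steps
    x = x0
    for _ in range(steps):
        s1 = f(x)
        s2 = f(x + 0.5 * h * s1)
        s3 = f(x + 0.5 * h * s2)
        s4 = f(x + h * s3)
        x += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6.0
    return x


def compare(rates, capacities, x0=1.0, t=5.0, rel_tol=1e-6):
    """Return every (r, k, a, b) where the two implementations disagree."""
    disagreements = []
    for r, k in product(rates, capacities):
        a = logistic_closed_form(r, k, x0, t)
        b = logistic_rk4(r, k, x0, t)
        if not math.isclose(a, b, rel_tol=rel_tol):
            disagreements.append((r, k, a, b))
    return disagreements


if __name__ == "__main__":
    # Illustrative grid only; a real check would cover the model's stated validity range.
    mismatches = compare(rates=[0.1, 0.5, 1.0, 2.0],
                         capacities=[10.0, 100.0, 1000.0])
    print(f"{len(mismatches)} disagreeing parameter sets")
```

Even this toy version shows where the effort goes: someone has to write the second implementation from the model description rather than from the first code base, and then decide which parameter range and tolerance count as agreement.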
While I agree that this should be mandatory in many cases, it doesn't prove much. I've experienced many cases where the open code worked fine with the test data provided, but failed completely when I tried it with my own real-world data.
Replication. If science has a gold standard, it is this.