Yves here. I trust readers will enjoy this important piece on the replication crisis in science (we have a link today in Links about how the same problem afflicts economics). From KLG’s cover note:

My take follows from last month’s post, which was based on the work of Nancy Cartwright; in it I extend her arguments in a direction that she may not have intended:
https://www.nakedcapitalism.com/2024/02/our-loss-of-science-in-the-21st-century-and-how-to-get-it-back.html

Basically, replication is possible for “small world” questions but impossible for “large world” questions. A small world can be a test tube with enzyme and substrate or a mission to Saturn (the example used in the post). A large world can be a single cancer cell. This is the key distinction for replication, which nobody does anyway: whether the “research finding” (an Ioannidis term) is a large-world or a small-world problem.

By KLG, who has held research and academic positions in three US medical schools since 1995 and is currently Professor of Biochemistry and Associate Dean. He has performed and directed research on protein structure, function, and evolution; cell adhesion and motility; the mechanism of viral fusion proteins; and assembly of the vertebrate heart. He has served on national review panels of both public and private funding agencies, and his research and that of his students has been funded by the American Heart Association, American Cancer Society, and National Institutes of Health.

The Replication Crisis™ in science will be twenty years old next year, when Why Most Published Research Findings Are False by JPA Ioannidis (2005) nears 2400 citations (2219 and counting in late March 2024) as a bona fide sextuple-gold “citation classic.”  This article has been an evergreen source on what is wrong with modern science since shortly after publication.  The scientific literature, as well as the journalistic, political, and social commentary on the Replication Crisis, is large (and quite often unhinged).  What follows is a short essay in the strict sense of the word, attempting to understand and explain the Replication Crisis after a shallow dive into this very large pool.  And perhaps to put the door back on its hinges.  This is definitely a work in progress, intended to continue the conversation.

This founding article of the Replication Crisis makes several good points even after beginning the Summary with “There is increasing concern that most current published research findings are false.” (emphasis added)  By 2005 I had long been a working biomedical scientist, but I did not get the sense that what my colleagues and I were doing led to conclusions that were mostly untrue.  Not that we thought we were on the path of “truth,” but we were reasonably certain that our work led to a better understanding of the natural world, from evolutionary biology to the latest advances in the biology of cancer and heart disease.

Much of the Replication Crisis lies in the use and misuse of statistics, as noted by Ioannidis: “the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value of less than 0.05.”  Yes, this has been my experience, too.  I remember well the rejection of a hypothesis because the levels of two structural proteins required for the assembly of a larger complex of interacting proteins, reduced by 50% in diseased heart after maladaptive remodeling subsequent to heart damage, were judged not “statistically different” from the levels in normal heart.  This was true, according to the p-value attached to the data.  Unsuccessful was the argument by analogy that a house framed with half as many studs holding up the walls and half as many rafters supporting the roof would not withstand the static stresses due to weight and the variable stresses due to heat, cold, wind, and rain.  A victory for statistics that made no biological sense, and one of these days I hope to return to this problem from a different perspective.
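Since a verdict of this kind turns on statistical power, a minimal simulation sketch may be useful (the numbers are invented, not the original data): with a handful of replicates per group and realistic scatter, a two-sample t-test misses a true 50% reduction much of the time.

```python
# A minimal sketch (invented numbers, not our data): how often does a
# true 50% reduction fail to reach p < 0.05 with a typical small n?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 4, 10_000          # 4 replicates per group, a common small n
misses = 0
for _ in range(trials):
    normal = rng.normal(1.00, 0.35, n)    # normalized protein level, normal heart
    diseased = rng.normal(0.50, 0.35, n)  # 50% reduction after remodeling
    if stats.ttest_ind(normal, diseased).pvalue >= 0.05:
        misses += 1

print(f"{misses / trials:.0%} of trials miss significance despite a true 50% drop")
```

The “not statistically different” verdict, in other words, can say more about the sample size than about the hearts.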

The examples used by Ioannidis in Why Most Published Research Findings Are False are well chosen and instructive.  These include genetic associations with complex outcomes and data analysis of apparent differential gene expression using microarrays that purport to measure the ultimate causes of cancer.  Only 59 papers had been published through 2005 that included “genome wide association study” (GWAS) in the body or title of the paper (there are currently more than 51,000 in PubMed).  Yet GWAS have not been particularly useful in identifying the underlying causes of any number of conditions with a genetic component.  For example, the “ultimate causes” of schizophrenia, autism, and Type-1 diabetes remain to be established.  Kathryn Paige Harden has recently reanimated the Bell Curve argument for a deterministic genetic basis of human intelligence.  This game of zombie Whac-a-Mole is getting tiresome.  Professor Harden’s book has naturally exercised both those likely to agree with her and those who do not (NYRB paywall).

Measures of gene expression using microarrays in cancer and many other conditions have held up at the margin, but not as well as the initial enthusiasm led us to expect.  The experiments are difficult to do and difficult to reproduce from one lab to another.  This does not make the (statistical) heatmaps produced as the output of microarray experiments false, however (more on this below in the discussion of small versus large systems).  The thoroughly brilliant molecular biologist who developed microarrays is now working on Impossible Foods.  Perhaps plant-based hamburgers (I would like mine with cheese, please) will rescue the planet after all.

Getting back to Ioannidis and the founding of the Replication Crisis, he is exactly right that bias produces faulty outcomes.  His definition of bias is “the combination of various design, data analysis, and presentation factors that tend to produce research findings when they should not be produced.”  There can be no argument with this.  Nor can one dispute that “bias can entail manipulation in the analysis or reporting of findings.  Selective or distorted reporting is a typical form of such bias.”  Yes, and this has been covered here often in posts on Evidence-Based Medicine and on clinical studies run by drug manufacturers that reach a positive conclusion.

A series of “corollaries about the probability that a research finding is indeed true” is presented by Ioannidis.  These are statistical, and according to the formal apparatus used they are unexceptional, if one accepts the structure of the argument (a sketch of that apparatus follows the corollaries below).  A few stand out to the working scientist who is concerned about the Replication Crisis, with provisional answers not based on statistical modeling:

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.

Answer: This describes any research at any important frontier of scientific knowledge.  One example, from the perceived race with Watson and Crick to the structure of DNA: Linus Pauling proposed that DNA is a triple helix with the nucleotide bases on the outside and the sugar-phosphate backbone in the center (where repulsion of the charges would have made the structure unstable).  That Pauling was mistaken, which is not the same as false, was inconsequential.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.

Answer: This is so “true” that it is trivial, but it is a truism that has been eclipsed by marketing hype along with politics as usual.

Corollary 6: The hotter the scientific field (with more scientific teams involved), the less likely the research findings are to be true.

Answer:  Perhaps.  In the early 1950s few fields were hotter than the search for the structure of DNA.  Twenty years later, the discovery of reversible protein phosphorylation mediated by kinases (enzymes that add phosphoryl groups to proteins) as the key regulatory mechanism in our cells led to hundreds of blooming flowers.  A few wilted early, but most held up.  As an example, the blockbuster drug imatinib (Gleevec) inhibits a mutant ABL tyrosine kinase as a treatment of multiple cancers.  That cells in the tumor often develop resistance to imatinib does not make anything associated with the activity of the drug “false.”
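For reference, these corollaries all trace back, as I read the paper’s apparatus, to the positive predictive value (PPV) of a claimed finding: the post-study probability that it is true, given the pre-study odds R of a true relationship, the significance threshold alpha, the type II error rate beta, and a bias term u.  A minimal sketch, with illustrative parameter values of my choosing:

```python
# Positive predictive value of a claimed research finding (Ioannidis 2005):
# R = pre-study odds the probed relationship is true, alpha = type I error,
# beta = type II error, u = bias (fraction of would-be nulls reported positive).
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

# Illustrative values, not taken from the paper:
print(f"confirmatory field, even odds (R=1):  PPV = {ppv(1.0):.2f}")
print(f"exploratory field (R=0.05):           PPV = {ppv(0.05):.2f}")
print(f"exploratory field with bias (u=0.3):  PPV = {ppv(0.05, u=0.3):.2f}")
```

On this reading, flexibility and financial interest (Corollaries 4 and 5) inflate u, while hot, crowded fields lower the effective R; once the PPV drops below one half, the title of the paper follows by construction.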

But “true versus false” is not the proper question regarding “published research findings,” in the terminology of Ioannidis.  As Nancy Cartwright has pointed out in her recent books A Philosopher Looks at Science and The Tangle of Science: Reliability Beyond Method, Rigour, and Objectivity (with four coauthors), recently discussed here, with added comments in italics in brackets:

The common view of science shared by philosophers, scientists, and the people can be described as follows:

  • Science = theory + experiment.
  • It’s all physics really.
  • Science is deterministic: it says that what happens next follows inexorably from what happened before.

This tripartite scheme seems about right in the conventional understanding of science, but Nancy Cartwright has the much better view, one that is more congenial to the practicing scientist who is paying attention.  In her view, “theory and experiment do not a science make.”  Yes, science can and has produced remarkable outputs that can be very reliable (the goal of science), “not primarily by ingenious experiments and brilliant theory…(but)…rather by learning, painstakingly on each occasion how to discover or create and then deploy…different kinds of highly specific scientific products to get the job done.  Every product of science – whether a piece of technology, a theory in physics, a model of the economy, or a method for field research – depends on huge networks of other products to make sense of it and support it.  Each takes imagination, finesse and attention to detail, and each must be done with care, to the very highest scientific standards…because so much else in science depends on it.  There is no hierarchy of significance here.  All of these matter; each labour is indeed worthy of its hire.”

This is refreshing, and I anticipate this perspective will provide a path out of the several dead ends modern science seems to have reached.  Contrary to the conceit of too many scientists [and hyper-productive meta/data-scientists such as Ioannidis], the goal of science is not to produce truth [the antithesis of falsity].  The goal of science is to produce reliable products that can be used to interpret the natural world and react to it as needed, for example, during a worldwide pandemic [emphasis added].  This can be done only by appreciating the granularity of the natural world.

Thus, the objective of scientific research is not to find the truth.  The objective is to develop useful knowledge, and products, that lead to further questions in need of an answer.  When Thorstein Veblen wrote “the purpose of research is to make two questions grow where previously there was only one” (paraphrase), he was correct.

One example of this from my working life, which is in no way unique: Several years ago, I reviewed a paper for a leading cell biology journal.  The research findings in that article superseded those of a previous article.  The other anonymous reviewer was absolutely stuck on the fact that the article under review “contradicted” the previous research, which had been done in my postdoctoral laboratory but not by me (I had nothing to do with that work but was present at its creation).  We went through three rounds of review instead of the usual two, but we all eventually came to an agreement that the new results were different because ten years later the microscopes and imaging techniques were better.  Had I not been the second reviewer, the paper would probably have been rejected by that journal.  This did not make the earlier “research finding” false, however.  The initial work provided a foundation for the improved understanding of cell adhesion in health and disease in the second paper.  All research findings are provisional, no statistical apparatus required [1].

Reliability and usefulness are more important in science than the opposite of false.

More importantly, there is also a much larger context in which the Replication Crisis exists.  In the first place, scientists do not generally replicate previous research merely to determine whether it is true, i.e., not false in the sense of Ioannidis, other than as an exercise for the novice.  If the foundation for further research is faulty, this will become apparent soon enough.  Whether research findings can be replicated sensu stricto depends on the size of the world in which the science exists.

What is meant by “size of the world”?  Again, this comes from Nancy Cartwright in A Philosopher Looks at Science.  In her formulation as I understand it, the Cassini-Huygens Mission that placed the Cassini spacecraft in orbit around Saturn from 2004 to 2017 was a “small-world” project.  Although the technical requirements for this tour de force were exceedingly demanding, there were very few “unknowns” involved.  The entire voyage to Saturn, including the flybys of Venus and Jupiter, could be planned and calculated in advance, including required course corrections.  Therefore, although the space traversed was unimaginably large, Cassini-Huygens was a small-world project, albeit one with essentially no room for error.

Contrast this with the infamous failure to reproduce preclinical cancer research findings.  The statistical apparatus involved in the linked study is impressive.  But going back to Ioannidis’s Fourth Corollary, “The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.”  This describes cancer research perfectly.  Although not explicitly recognized by many scientists and virtually all self-interested critics of science, the cancer cell comprises a very large world.  And this large world extends to the experimental models used at the cellular, tissue, and organismal levels.

None of these models recapitulates the development of cancer in a human being.  Very few can be replicated precisely.  They can be exceedingly useful and productive, however.  Imatinib was developed as an inhibitor of the BCR-ABL tyrosine kinase fusion protein and confirmed in the test tube (a very small world) and in cells.  The cell, despite its very small physical size, is a very large world that might be described by several thousand nonlinear equations with an equal number of variables.  Scientists in systems and synthetic biology are attempting this (a toy illustration follows below).  Imatinib was subsequently shown to be effective in cancer patients.  Results vary with patients, however.  Experimental results in preclinical cancer research will also depend on how the model cell is cultured, for example, either in two dimensions attached to the bottom of a plastic dish or in three dimensions in the same dish surrounded by proteins that poorly mimic the environment of a similar cell in the organism.  This was not appreciated initially, but it is very important.  These variables affect outcomes as a matter of course.  As an aside, the apparent slowness of the development of stem cell therapy can be attributed in part to the fact that the stem cell environment determines the developmental fate of these cells.  A pluripotent stem cell in a stiff environment will develop along a different path than the same cell in a more fluid environment.
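What do such nonlinear descriptions of the cell look like?  A minimal sketch using the classic two-gene toggle switch of synthetic biology (parameter values are illustrative, not from any particular paper): even two coupled equations are bistable, so the starting state, i.e., the environment, selects the fate, which is the stem cell point in miniature.

```python
# Two-gene toggle switch: each repressor inhibits the other's synthesis.
# Two coupled nonlinear ODEs already give two stable fates; a real cell
# multiplies this by thousands of variables.
from scipy.integrate import solve_ivp

def toggle(t, y, a=10.0, n=2.0):
    u, v = y
    return [a / (1 + v**n) - u,    # repressor 1: made unless repressor 2 is high
            a / (1 + u**n) - v]    # repressor 2: made unless repressor 1 is high

for y0 in ([5.0, 1.0], [1.0, 5.0]):          # two slightly different starting states
    sol = solve_ivp(toggle, (0.0, 50.0), y0)
    print(f"start {y0} -> u = {sol.y[0, -1]:.2f}, v = {sol.y[1, -1]:.2f}")
```

Two nearly identical cells end up in opposite stable states.  Scale that sensitivity up by three orders of magnitude and the difficulty of precise replication in a large world is no longer mysterious.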

Thus, replication depends primarily on the size of the scientific world being studied.  The smaller the world, the more likely any given research finding can be replicated.  But small worlds generally cannot answer large questions by themselves.  For that we need the “tangle of science,” also described by Nancy Cartwright and colleagues, quoted here with new comments in italics in brackets:

Rigor is a good thing; it makes for greater security.  But what it secures is generally of very little use [while remaining largely confined to small-world questions].  And that “of very little use” extends to what are called evidence-based policy (EBP) and evidence-based medicine (EBM).  The latter has been covered here before through the work of Jon Jureidini and Leamon B. McHenry (Evidence-based medicine, July 2022) and Alexander Zaitchik (Biomedicine, July 2023) and Yaneer Bar-Yam and Nassim Nicholas Taleb (Cochrane Reviews of COVID-19 physical interventions, November 2023), so there is no reason to belabor the point that RCTs have taken modern biomedical science straight into the scientific cul de sac that is biomedicine [replication of clinical studies and trials has been a major focus of the Replication Crisis].  They are practically and philosophically the wrong path to understanding the dappled world in which we live, which is not the linear, determined, mechanical world specified by physics or scientific approaches based on physics envy [and statistics envy].

Which is not to say the proper use of statistics is unessential.  But it is not sufficient, either.  Neither falsity nor truth can be determined by statistical legerdemain, especially the conventional, frequentist statistics derived from the work of Francis Galton, Karl Pearson, and R.A. Fisher.  We live in a very large Bayesian world in which priors of all kinds are more determinative than genetics, sample size, or statistical power (a worked example follows the quoted summary below).  Small samples are often successful when dealing with large-world questions such as ultra-processed foods, while large sample sizes can lead to positive results when the subject is utter nonsense, such as homeopathic medicine, as shown in a recent analysis by Ioannidis and coworkers (2023), summarized here:

Objectives: A “null field” is a scientific field where there is nothing to discover and where observed associations are thus expected to simply reflect the magnitude of bias. We aimed to characterize a null field using a known example, homeopathy (a pseudoscientific medical approach based on using highly diluted substances), as a prototype.

Study design and setting: We identified 50 randomized placebo-controlled trials of homeopathy interventions from highly cited meta-analyses. The primary outcome variable was the observed effect size in the studies. Variables related to study quality or impact were also extracted.

Conclusion: A null field like homeopathy can exhibit large effect sizes, high rates of favorable results, and high citation impact in the published scientific literature. Null fields may represent a useful negative control for the scientific process.
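To make the Bayesian point concrete, a minimal simulation sketch (all numbers invented): in a null field the true effect is exactly zero, yet a modest systematic bias, once multiplied by a large sample, produces an impressively small p-value.

```python
# A "null field" in miniature: zero true effect, a small systematic bias
# (unblinding, selective dropout, outcome tweaking), and a large sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000                        # a "large, well-powered" trial
bias = 0.03                       # tiny, in units of the outcome's SD
treated = rng.normal(0.0 + bias, 1.0, n)   # true treatment effect is zero
placebo = rng.normal(0.0, 1.0, n)

result = stats.ttest_ind(treated, placebo)
print(f"p = {result.pvalue:.2g}")  # near-certain to be "significant" at this n
```

The p-value here measures the bias, not the treatment, which is exactly what makes a null field a useful negative control.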

True as the opposite of false is a matter for philosophy, not science.

Finally, the Replication Crisis™ has often been conflated with scientific fraud, especially in accounts of misbehaving scientists.  This is as it should be for scientists who lie, cheat, and steal in their research.  But perceived non-replication and fraud are not the same thing, as Ioannidis notes by including bias as a confounding factor leading to “false” research findings.  Making “stuff” up is the very definition of High Bias.  In my view, the title of the founding paper of the Replication Crisis™ was meant to be inflammatory.  It was, and it remains the ur-text of the apparent crisis.  I will also note that seventeen years after Why Most Published Research Findings Are False was published, an equation in the paper was corrected.

Dishonest science practiced by dishonest scientists is a pressing problem that must be stamped out, but that will require a reorganization of how scientific research is conducted and funded.  Still, all scientific papers have a typo or three.  One of ours was published without removal of an archaic term that we had used as a temporary, alas now permanent, placeholder.  But the long-delayed correction of an equation in one of the earliest and most cited (>2000 citations) of Ioannidis’s ~1300 publications since 1994 (71 citations in 2023 and already 24 in 2024) could well mean that the paper has been used primarily as the cudgel it was taken to be by others, rather than read as serious criticism of the practice of science.  If the correction took so long, how many people actually read the paper in detail?

[1] Ernest Rutherford (Nobel Prize in Chemistry, 1908) to Max Planck (Nobel Prize in Physics, 1918), according to lore: “If your experiment needs statistics, you ought to have done a better experiment.”  True enough, but not in the world of the quantum or in most properly designed and executed clinical studies and trials.  We do not sense our existence in a quantum world.  Newtonian physics works well in the physical world of objects at the level of whole atoms/molecules and above (Born-Oppenheimer Approximation; yes, that Oppenheimer).  In the world of biology and medicine, the key is dose-response.  If this does not emerge strongly from the research, as it did in the recognition of the link between smoking and lung cancer (biological gradient, the fifth Bradford Hill criterion) long before any molecular mechanism of cancer was identified, a new hypothesis should be developed forthwith.
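For the dose-response point, a minimal curve-fitting sketch with invented data: when an agent is genuinely active, response rises monotonically with dose and a Hill curve fit is well behaved; when no such gradient emerges, it is time for a new hypothesis.

```python
# Fit a Hill (dose-response) curve to invented data.
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, top, ec50, n):
    return top * dose**n / (ec50**n + dose**n)

dose = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])      # arbitrary units
resp = np.array([0.02, 0.08, 0.30, 0.62, 0.85, 0.95])  # fraction of maximum

(top, ec50, n), _ = curve_fit(hill, dose, resp, p0=[1.0, 2.0, 1.0])
print(f"top = {top:.2f}, EC50 = {ec50:.2f}, Hill coefficient = {n:.2f}")
```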
