National Academies Launches Study of Research Reproducibility and Replicability

DEC 21, 2017

A new committee assembled to study reproducibility and replicability in the sciences began its work this month. At its kickoff meeting, speakers representing a variety of disciplines offered markedly different portraits of how these issues are viewed across fields, including physics, engineering, and the social sciences.

William Thomas

Spencer R. Weart Director of Research in History, Policy, and Culture

The National Academy of Sciences headquarters in Washington, D.C.

(Image credit – William Thomas / FYI)

On Dec. 12 and 13, the National Academies convened the first meeting of a new study committee on “Reproducibility and Replicability in Science.” The National Science Foundation is sponsoring the study, which is expected to take 18 months, to satisfy a provision in the American Innovation and Competitiveness Act signed into law in January.

Discussion at the kickoff meeting revolved primarily around how the committee should approach such a broad and important topic. In recent years, researchers, journalists, and policymakers have homed in on the reproducibility and replicability (R&R) of research results as a window into the health of the scientific enterprise. Many of them have taken widespread failures to replicate experimental results and to reproduce conclusions from data as a sign that, at least in certain fields, researchers’ experimental and statistical methods have become unreliable. To encourage better practices, those working to address such issues have been pressing to make research more transparent and to reform professional incentives.

Committee has leeway to interpret its charge flexibly

The National Academies has charged the committee to approach R&R issues from a cross-disciplinary perspective. NSF representatives Joan Ferrini-Mundy and Suzi Iacono told the committee it should provide the agency with “actionable” recommendations and might consider how R&R issues bear on the 10 “Big Ideas” NSF is supporting across its directorates. However, they also said the committee should take a flexible view of its task and make its final report a “resource” for other federal agencies and the broader research community.

Harvey Fineberg, the committee chair and president of the Gordon and Betty Moore Foundation, asked how the study should handle clinical research, which has been a central focus in R&R debates but does not fall among the fields that NSF supports. Ferrini-Mundy said that the committee should not restrict itself from addressing such areas.

While the committee will have significant leeway in determining its scope, Fineberg suggested it will be important for the study not to become excessively broad, for instance by expanding it to encompass the topic of scientific rigor as a whole.

Aside from the scope of its charge, the committee also discussed conceptual issues concerning the meaning of R&R and their precise role in ensuring scientific quality. Ned Hall, a committee member and Harvard University philosophy professor, noted that in the 19th century mathematicians had to work to clarify what constitutes a rigorous mathematical proof. He asserted the committee is participating in a similarly fundamental “intellectual project,” which has yet to reach a similar state of conceptual clarity.

Hall said it will be necessary to develop a large “diet of examples” in order to address questions such as how the role of experimental replication varies with the purpose for which an experiment is conducted.

Victoria Stodden, another committee member and professor in the School of Information Sciences at the University of Illinois, emphasized the need to pay special attention to computational and data-driven sciences, where the reproducibility of results from underlying data and computer code remains a pressing problem.

Reproduction, replication haunt engineering as much as social science

Invited speakers from a variety of science and engineering disciplines offered insights into how their fields view R&R issues, revealing stark differences.

Along with clinical research, the social and behavioral sciences have been a major focal point for discussions about R&R. Howard Kurtzman, the acting executive director of the American Psychological Association’s Science Directorate, said that a widely publicized 2015 study of the replicability of psychological research was a “watershed,” marking the start of a broad, sustained discussion in his field. William Jacoby, editor of the American Journal of Political Science, discussed his publication’s new policy of reproducing results from underlying data before advancing accepted articles to publication.

Kate Kirby, CEO of the American Physical Society, reported that she has heard very little discussion of R&R in physics. She also observed that checks on data are built into large collaborations such as those at the Laser Interferometer Gravitational-Wave Observatory and the Large Hadron Collider. Gerald Gabrielse, a committee member and Northwestern University physicist, suggested that R&R are also expected at the level of “tabletop” physics as a matter of routine. However, another committee member, Lorena Barba, a professor of mechanical and aerospace engineering at George Washington University, said that reproducibility of results is a significant issue in computational fluid dynamics.

Early in the meeting, committee members wondered how they should treat engineering, given that production of reliable designs is a primary goal of the field. David Sholl, an engineering professor at Georgia Tech, said that while in materials chemistry there is only “low-level concern” about R&R issues, meta-analysis has revealed that reported measurements are often unreliable. Responding to a question by Gabrielse, he noted it is not even common practice to estimate errors. Moreover, he said, because many experiments are done for industrial purposes, it is common for experimental replications to remain unpublished.

Phil DiVietro, managing director of publishing at the American Society of Mechanical Engineers, reported that a survey he conducted revealed, to his surprise, broad concern about R&R issues in his field. Noting his society is centrally occupied with the rigorous validation of technical standards, he said these issues could represent the unseen part of an “iceberg” mechanical engineers will have to address. Like Sholl, he reported that a great deal of engineering data is private.

Brooks Hanson, senior vice president for publications at the American Geophysical Union, and John Baillieul, a former editor-in-chief of IEEE Transactions on Automatic Control, noted that their fields are becoming increasingly data intensive. As a result, the sharing of data, code, and other “artifacts” is increasingly supported and encouraged by journal publishers. Baillieul said there would also have to be increased cultural acceptance that publications would have multiple versions to accommodate, for instance, updated code and data.

Report will add to larger conversation on research reliability

Because the committee’s work is in its first phases, its kickoff meeting avoided detailed discussion of what shape its ultimate recommendations would take. A number of committee members and speakers, though, suggested that the committee should take prior and parallel efforts into account.

Earlier this year, the National Academies released a study identifying “detrimental” research practices, some of which, it asserted, erode the quality of research results. The Academies is also currently sponsoring a study on “open science” practices. The Academies, NSF, and professional societies have also sponsored a number of symposiums and workshops on R&R and related subjects. The committee heard directly from members of a parallel study currently being conducted through the Royal Netherlands Academy of Arts and Sciences.

In developing its recommendations, the committee will also have to consider an additional aspect of its charge pertaining to public perceptions of scientific credibility. On this point, several meeting participants said they would like to avoid language portraying R&R issues as a “crisis” and to emphasize the “self-correcting” nature of science.

At the same time, Richard Harris, a science correspondent for National Public Radio and author of “Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions,” encouraged the committee not to “soft pedal” the issues the scientific community faces. He urged that showing a willingness to confront problems in an upfront manner would send the best message about the health of the scientific enterprise.