President's Report: Claiming more than we know

by Paul R. Amato, Ph.D., NCFR president
NCFR Report
Content Area: Research

A decade ago, John Ioannidis, a biostatistician, published an article in PLOS Medicine titled "Why most published research findings are false." In this article, he argued that the majority of findings reported in medical journals (up to 90% in some subfields) are wrong and that many widely accepted conclusions reflect nothing more than the prevailing biases of researchers.

Although it would be easy to dismiss these claims as the work of a crank, Ioannidis is no lightweight. He holds the Rehnborg Chair in Disease Prevention and is a Professor of Medicine and Health Research and Policy at Stanford University, the director of Stanford's Prevention Research Center, and the co-director of Stanford's Meta-Research Innovation Center. When someone with these credentials makes strong claims, people pay attention, and his 2005 paper became the most heavily downloaded article in the journal's history. (I've included the reference below for interested readers.)

Ioannidis presented a set of equations to demonstrate that false positives (statistically significant results when the null hypothesis is true) are more common than most researchers realize. He argued that false positives are especially likely if (a) the sample size is small, (b) the effect size in the population is modest, (c) multiple hypotheses are tested in the same study, (d) alternative ways of conceptualizing and analyzing the data are possible, (e) financial, personal, or ideological incentives to find a particular result exist, and (f) the topic is popular among researchers. Because these conditions are common in many fields, and because journals tend to publish only statistically significant findings, false positives accumulate in the research literature. As a result, many innovative studies that appear to be groundbreaking at the time of publication turn out to be worthless in the long run.
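For readers who want to see the arithmetic behind this argument, the core quantity in Ioannidis's equations is the positive predictive value of a significant result: the probability that a finding flagged as significant reflects a true relationship. In its simplest form (setting aside his bias term), it depends only on the pre-study odds that the tested relationship is real, the study's power, and the significance threshold. The sketch below is a minimal illustration of that formula; the specific odds and power values are my own assumptions, chosen to mimic conditions (a) through (f), not figures from his article.

```python
def positive_predictive_value(r_odds, power, alpha=0.05):
    """Probability that a statistically significant finding is true.

    r_odds: pre-study odds that a tested relationship is real
    power:  probability of detecting a real relationship (1 - beta)
    alpha:  significance threshold (Type I error rate)
    Simplified form of the relationship in Ioannidis (2005), without bias.
    """
    true_positives = power * r_odds
    false_positives = alpha  # null relationships flagged as significant anyway
    return true_positives / (true_positives + false_positives)

# Illustrative scenarios (assumed values, not taken from the article):
# a well-powered test of a plausible hypothesis...
print(round(positive_predictive_value(r_odds=1.0, power=0.80), 2))   # ~0.94
# ...versus an underpowered study of a long-shot hypothesis in a crowded area
print(round(positive_predictive_value(r_odds=0.10, power=0.20), 2))  # ~0.29
```

In the second scenario, fewer than a third of the "significant" findings reflect real relationships, which is the sense in which most published findings can be false.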

Ioannidis's analysis has some disturbing implications for family science. If medical research, which relies largely on experimental trials with random assignment (the gold standard for inferring cause-and-effect relationships), is frequently wrong, then what about family research? Most of the topics we care about cannot be studied experimentally, so cause-and-effect relationships are difficult to establish. Moreover, most family research occurs under the same conditions that (according to Ioannidis) produce an abundance of false positives.

A strong motivation to obtain statistically significant findings exists among family scholars, just as it does among medical researchers. Researchers want to publish their work in peer-reviewed journals to obtain tenure and promotion; publication is also the established path to becoming a respected and influential scholar. But because publication generally requires statistical significance, researchers have a strong incentive to find p < .05. Many researchers also have strong personal or ideological investments in particular hypotheses, especially in the social sciences, where political beliefs can affect the choice of research topics. Although outright fraud is rare in our field, "data dredging" or "p-hacking" is common — testing one hypothesis after another until a coefficient is statistically significant and therefore publishable. When reading journal articles, however, we rarely know how many hypotheses were tested, because this information is not reported.
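A simple simulation shows why undisclosed multiple testing matters. The sketch below, which is my own illustration rather than anything reported in the article, runs many "studies" in which a researcher compares two groups on ten unrelated outcomes with no true effects and counts the study a success if any test reaches p < .05. Roughly 40% of such studies yield a publishable false positive, because the chance that at least one of ten independent tests crosses .05 by luck alone is about 1 - .95^10, or .40.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

def p_hacked_study(n_outcomes=10, n_per_group=50):
    """Simulate one null study: compare two groups on several unrelated
    outcomes and report whether *any* test reaches p < .05."""
    for _ in range(n_outcomes):
        group_a = rng.normal(size=n_per_group)  # no true group difference
        group_b = rng.normal(size=n_per_group)
        _, p = stats.ttest_ind(group_a, group_b)
        if p < 0.05:
            return True  # a "publishable" false positive
    return False

n_studies = 2_000
false_positive_rate = sum(p_hacked_study() for _ in range(n_studies)) / n_studies
print(f"Share of null studies yielding p < .05: {false_positive_rate:.2f}")  # ~0.40
```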

Although my comments are focused on quantitative research, similar concerns about false positives apply to qualitative studies. These considerations suggest that many of the currently accepted "facts" of family science are wrong. We can be reasonably confident about some things, of course, because some hypotheses have been replicated so often that they are almost certainly true. For example, studies have repeatedly shown that poverty is linked with many problematic dimensions of family life and child development. And it seems clear that exposure to conflict and violence is not good for children (or adults). But despite some well-replicated and reasonably certain findings, most of our accumulated knowledge about families is less certain than we would like.

Although we can rarely be 100% confident about research conclusions, there are steps we can take to increase our confidence in what we know. First, researchers can report the results of alternative specifications in their research articles. That is, researchers can describe how their results change when using different statistical models, methods of measuring variables, and samples or subsamples. Alternative specifications allow readers to understand how robust (or ephemeral) particular findings are. Second, we can relax the requirement of statistical significance for publication, especially for papers describing studies with strong theoretical foundations, exemplary methodologies, and a high level of statistical power. Third, we can rely more heavily on meta-analytic reviews before reaching firm conclusions about particular phenomena or hypotheses.
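As an illustration of the third step, the sketch below pools effect estimates from several studies using inverse-variance (fixed-effect) weighting, one standard meta-analytic technique; the effect sizes and standard errors are invented for illustration. Pooling makes a single eye-catching estimate count for less and a pattern of consistent, precise estimates count for more, which is the kind of evidence on which firm conclusions should rest.

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) meta-analysis.

    effects:     effect-size estimates from independent studies
    std_errors:  their standard errors
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect sizes (e.g., standardized mean differences) and SEs
effects = [0.45, 0.12, 0.20, 0.05, 0.18]
std_errors = [0.20, 0.10, 0.15, 0.08, 0.12]

pooled, se = fixed_effect_pool(effects, std_errors)
print(f"Pooled effect = {pooled:.2f} (SE = {se:.2f})")
# The large but imprecise early estimate (0.45) is pulled toward the
# smaller, more precise estimates from the other studies.
```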

The most important step we could take, however, would be to increase the number of published replication studies. Unfortunately, replication is not popular. Graduate students are steered away from it because replications are not seen as original contributions to knowledge and have low status. Most manuscript reviewers do not like replication studies ("I don't see much that is new here"), and journal editors are reluctant to publish them. It also is difficult to obtain funding for replication research. Big funding agencies like the National Institutes of Health value innovation rather than replication — even though innovative studies tend to generate more false positives.

Despite these challenges, scientists are beginning to re-evaluate the importance of replication. A good example is the Many Labs Replication Project (Klein et al., 2014) — a 36-site, 12-country effort to replicate 13 classic findings from the field of psychology. Interestingly, this program of research successfully replicated 10 of the original 13 studies (77%). Some observers will be reassured by the fact that the majority of effects were confirmed, whereas others will be concerned about the roughly one-fourth of replications that failed. What is especially interesting to me, however, is not the number of successful replications but the fact that so many behavioral researchers took replication seriously. (Note that efforts by some labs to replicate published drug studies have yielded even more troubling results, with the large majority of replications resulting in failure.)

Concern about the verification of research findings has been growing in recent years, and journals in a variety of fields have initiated procedures to facilitate replication. For example, Nature now has an 18-point checklist for authors to ensure that research results can be reproduced by other scholars. Science's guidelines for publication require authors to submit raw data and computer code for independent verification of results. Perspectives on Psychological Science publishes an entire section devoted to replication research. And a biomedical organization, the Science Exchange, in conjunction with the journal PLOS ONE, is helping researchers to obtain independent verification of their results prior to publication.

The lack of attention to replication in family science places practitioners in an awkward situation. Although responsible practitioners base their work on research, it is not always clear how much confidence they can place in particular findings. Note that textbooks are not necessarily good sources for identifying well-substantiated findings. Publishers usually encourage textbook authors (especially in revisions) to include references to as many recent studies as possible. Although citing recent publications makes textbooks up to date, it also guarantees that they are brimming with false positives. Unfortunately, we have few guidelines to help family practitioners determine how much confidence to place in large portions of the research literature.

A potentially useful approach would be to identify some of the key research findings that inform family practice and arrange for independent teams of researchers to conduct replications using diverse samples, measures, and analytic methods. Only results that stubbornly refuse to go away would be eligible for the gold seal of research confidence. (Another useful outcome of this process might be discovering that some well-known effects exist only for certain populations.) Adopting this approach would yield a curious outcome: We would know less than we did before, but we would be more certain of what we know. In my book, that would be a good tradeoff.         

References

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.

Klein, R. A., et al. (2014). Investigating variation in replicability: A "Many Labs" replication project. Social Psychology, 45, 142-152.

Copyright © 2015 National Council on Family Relations (NCFR). Contact NCFR for permission to reprint, reproduce, disseminate, or distribute by any means.