PubMed Central Boosts Citations, Study Claims
PubMed Central (PMC), a digital archive of biomedical articles,
increases the citation impact of deposited papers, a recent study claims.
The paper, “Examining the Impact of the National Institutes of Health Public Access Policy on the Citation Rates of Journal Articles,” was published October 8 in PLOS ONE. Its lead author, Sandra De Groote, is a Professor & Scholarly Communications Librarian at the University of Illinois, Chicago.
De Groote and her colleagues were interested in measuring what
effect, if any, PMC has had on the distribution of NIH-funded papers. Because
the NIH Public Access Policy,
which requires PMC deposit, took effect in 2008, De Groote focused
on two cohorts of papers: those published in 2006, when deposit was
still voluntary, and those published in 2009, after deposit became mandatory. We should note
that just 9% of 2006 NIH-funded papers were deposited in PMC compared
to 72% in 2009. It is not known whether these deposits were made by the
author or the publisher, or in what format (author’s final manuscript vs.
The researchers excluded from their analysis all Open Access journals
as well as journals that make their content freely available after an
embargo period. Their dataset includes a total of 45,716 papers
appearing in 122 biomedical journals.
De Groote reports that while there was no overall difference in the
citation performance of 2006 papers, 2009 papers deposited in PMC were cited 26%
more frequently over 5 years, a difference of 5.2 citations per paper,
The researchers noted that there were differences in the composition
of the papers in their cohorts, specifically in the journals appearing
in each group as well as the number of authors per paper. While they
tested for these differences, they did not report their adjusted
results. Thankfully the authors deposited their dataset, which allowed me to verify their findings and to provide more nuance.
I appreciate the effort that went into gathering and analyzing the data for this
paper; however, I feel that it suffers from three major weaknesses that
may undermine the validity of its claims:
1. A statistical model that fails to fit the data.
The researchers used a statistical model that assumed their citation distribution was normal when, like
most citation distributions, it is skewed. As a result, a few highly cited
papers in the PMC group inflated their reported difference. Using a
non-linear (Poisson) model on their dataset, I arrived at an unadjusted
difference of 22% (95% CI 20%–23%), 4 percentage points lower than De Groote’s estimate.
2. An inability to control for confounding effects.
Papers published in some journals perform differently than those published in others (for example, compare the papers published in The Lancet
with those in a specialized nursing journal). If we simply add the journal
information to the regression model, the main effect drops from 22% to
16%. This 6-point drop tells us that papers appearing in
higher-performing journals are more likely to be found in the PMC group than in the
control (non-PMC) group. Similarly, if I add the number of authors per
paper to the regression model, the PMC effect drops another point to
15%, again suggesting that a simple comparison of PMC vs. non-PMC
papers is not going to provide the researcher with an independent,
unbiased estimate. Or put differently, their reported PMC effect is
confounded with other citation effects.
3. A dataset contaminated with non-research papers.
While the researchers claimed that their study was limited to research
articles, this is not supported by their dataset. With just a few
minutes of spot-checking, I came up with a letter to the editor, an introduction to a special issue, several perspectives, commentaries, and reviews, a point/counterpoint, and a lecture. Unless
an equal number of these non-research papers are found in each
cohort–which is pretty unlikely–the main results of this study are
likely to be biased.
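To make the confounding described in point 2 concrete, here is a minimal sketch using invented numbers (they are not taken from De Groote's dataset). It assumes two hypothetical journals in which PMC papers are cited exactly 10% more than non-PMC papers, but where PMC papers cluster in the higher-cited journal. The crude comparison pools everything; the stratified comparison uses a Mantel-Haenszel-style rate ratio, which is a simpler stand-in for adding journal to a Poisson regression and not the model used for the figures above.

```python
# Toy illustration of journal-level confounding. All numbers are invented
# for this sketch; they are NOT from the De Groote dataset.

# For each hypothetical journal: (papers, total citations) by PMC status.
journals = {
    "high_impact": {"pmc": (80, 2640), "non_pmc": (20, 600)},
    "specialized": {"pmc": (20, 220), "non_pmc": (80, 800)},
}


def crude_rate_ratio(strata):
    """Ratio of mean citations per paper, ignoring journal."""
    pmc_papers = sum(s["pmc"][0] for s in strata.values())
    pmc_cites = sum(s["pmc"][1] for s in strata.values())
    non_papers = sum(s["non_pmc"][0] for s in strata.values())
    non_cites = sum(s["non_pmc"][1] for s in strata.values())
    return (pmc_cites / pmc_papers) / (non_cites / non_papers)


def mantel_haenszel_rate_ratio(strata):
    """Journal-stratified rate ratio using Mantel-Haenszel weights."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        pmc_papers, pmc_cites = s["pmc"]
        non_papers, non_cites = s["non_pmc"]
        total = pmc_papers + non_papers
        numerator += pmc_cites * non_papers / total
        denominator += non_cites * pmc_papers / total
    return numerator / denominator


crude = crude_rate_ratio(journals)
adjusted = mantel_haenszel_rate_ratio(journals)
print(f"crude PMC effect:    {100 * (crude - 1):.0f}%")
print(f"adjusted PMC effect: {100 * (adjusted - 1):.0f}%")
```

In this toy setup the pooled comparison suggests PMC papers are cited roughly twice as often, while within each journal the advantage is only 10%. The gap comes entirely from where the PMC papers were published, which is the pattern the drop from 22% to 16% hints at in the real data.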
Personally, I’m surprised that the PLOS editor (a leader in
bibliometric research) didn’t catch any of these problems. The paper is
clearly too methodologically weak to be considered “sound science.”
As a result, it is difficult to interpret the results of this paper.
Did PMC really improve the citation performance of NIH-funded papers, or
is PMC composed of papers that generally perform better? While the
researchers attempted to answer a very difficult question, they
simply weren’t careful enough in the process. At best, De Groote’s
estimates are inflated; at worst, the authors are making a false claim.
Federal policies based on funding sources are likely to
create heterogeneous groups. Add that the NIH deposit requirement was
adhered to by fewer than three-quarters (72%) of 2009 authors, and it would
be safe to assume that the comparison groups are not comparable. I
reported recently on a similar control group problem with the Academia.edu citation boost paper. For retrospective observational research, the researchers should have assumed, a priori, that their groups were unequal and attempted to convince us otherwise, not the reverse.
Where does this leave us? There has been little empirical work on the
effect of publicly accessible repositories, meaning that this paper,
despite its methodological and analytical weaknesses, will have an
outsized effect on steering public policy. And if we consider the
paper’s conclusion, advocacy may be part of the authors’ intention:
“This study illustrates a positive impact of the 2008 NIH
Public Access Policy. Articles that are openly accessible in PubMed
Central are often cited more than articles published in the
same journals, but not openly accessible. This increase in scholarly
impact is important to researchers and their institutions, to NIH, and
to the public. This is a strong argument for expanding legislation to
other federal agencies making information more accessible.”
Rather than conclude on a negative note, I want to try something
constructive. If you were to measure the effect of PMC (or the NIH
Public Access Policy) on journal articles, how would you do it? Can the
data, methods, and analysis of this paper be tweaked to get us closer to
the “truth?” If so, what would you suggest?
Fortunately, this blog attracts a number of thoughtful and analytical
minds. Together, perhaps we could get a little closer to answering a
really important question.
PubMed Central Boosts Citations, Study Claims | The Scholarly Kitchen