Wednesday, 29 April 2015

Why are Authors Citing Older Papers?

Cited half-life on the rise.
With so much new literature published each year, why are authors increasingly citing older papers?

Late last year, computer scientists at Google Scholar published a report
describing how authors were citing older papers. The researchers posed
several explanations for the trend that focused on the digitization of
publishing and the marvelous improvements to search and relevance.

However, as I wrote in my critique
of their paper, the trend to cite older papers began decades before
Google Scholar, Google, or even the Internet was invented. When you are
in the search business, everything good in this world must be the result
of search.

To validate their results, the helpful folks at Thomson Reuters Web of Science sent me a dataset that included the cited half-life for 13,455 unique journal names reported in their Journal Citation Reports (the report that discloses journal Impact Factors). Rather than relying on the individual citation as the unit of observation (the approach used by Google Scholar), we base our analysis on the cited half-life of journals. This
approach has the obvious advantage of scale, allowing us to approach
the problem using thousands of journals rather than tens of millions of
citations.

In order to approximate a citation-based analysis, each journal was
weighted by the number of papers it published, so that small quarterly
journals don’t have the same weight as mega-journals like PLOS ONE.
Each journal was also classified into one or more subject categories
and measured each year over the 17-year observation period. Our variable
of interest is the cited half-life, which is the median age of a
journal's articles that were cited in a given year. By definition, half
of the citations a journal receives in that year go to articles older
than the cited half-life; the other half go to younger articles. The
concept of half-life can also be applied to article downloads.
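
To make the definition and the weighting concrete, here is a minimal sketch in Python. It is not the calculation used by Web of Science or in our analysis: the function names, the input layout, and the sample numbers are all made up for illustration, and the half-life is returned as a whole number of years, whereas the published figures carry a decimal place.

```python
def cited_half_life(citations_by_age):
    """Smallest article age (in years) at which cumulative citations
    reach half of all citations received in the year.

    citations_by_age[i] is the number of citations received this year by
    articles that are i + 1 years old (a hypothetical input layout).
    """
    total = sum(citations_by_age)
    if total == 0:
        raise ValueError("journal received no citations")
    running = 0
    for age, count in enumerate(citations_by_age, start=1):
        running += count
        if 2 * running >= total:
            return age


def weighted_mean(values, weights):
    """Mean of values, each weighted by, e.g., the number of papers published."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Illustrative numbers only, not taken from the dataset.
print(cited_half_life([10, 20, 30, 20, 10, 10]))  # 3
print(weighted_mean([4.0, 8.0], [100, 300]))      # 7.0
```

In the second call, the journal with 300 papers pulls the mean toward its own half-life of 8.0, which is the effect the weighting is meant to have.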

For the entire dataset of journals, the mean weighted cited half-life
was 6.5 years, which grew at a rate of 0.13 years per annum. For those
journals that had been indexed continuously in the dataset over the 17
years, the mean weighted cited half-life was 7.1 years, which grew at
the same rate. For the newer journals, the cited half-life was just 5.1
years, but grew at a rate of 0.19 years per annum.
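
One way to arrive at a "years per annum" figure like those above is to fit a straight line through the yearly weighted means and read off the slope. The sketch below does this with ordinary least squares. The use of a linear fit is an assumption made for illustration, the function name is hypothetical, and the sample values are invented rather than drawn from the dataset.

```python
def annual_growth(years, values):
    """Ordinary least-squares slope of values against years,
    i.e. the average change in cited half-life per calendar year."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Invented series rising by 0.13 years each year.
years = [1997, 1998, 1999, 2000]
half_lives = [6.00, 6.13, 6.26, 6.39]
print(annual_growth(years, half_lives))
```

A slope is less sensitive to a single unusual year than simply dividing the difference between the last and first observations by the number of years elapsed.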

Focusing on the journals for which we have a continuous series of
cited half-life observations, 91% (209 of 229) of subject categories
experienced increasing half-lives. In some of these categories, the
cited half-life grew considerably faster than average. For example, the
cited half-life of Developmental Biology journals grew by 0.25 years per
annum, Genetics & Heredity journals by 0.20 years per annum, and Cell
Biology journals by 0.17 years per annum.

Conversely, the cited half-life of 20 (9%) of the journal categories
decreased over the observation period. With few exceptions, these
categories fell within the general fields of Chemistry and Engineering.
For example, the cited half-life for journals classified under Energy &
Fuels declined by 0.11 years per annum, Chemistry-Multidisciplinary by
0.07 years per annum, Engineering-Multidisciplinary by 0.05 years per
annum, and Engineering-Chemical by 0.04 years per annum. Granted, these
declines are small, but they do run contrary to the overall trend.

Figure 1. Cited half-life for 229 journal subject categories.

We also discovered that cited half-life increases with total
citations, meaning, as a journal attracts more citations, a larger
proportion of these citations target older articles. This can be seen in
Figure 2, as journal categories move from the bottom left to the upper
right quadrant of the graph over the observation period.

Figure 2. Cited half-life for 229 journal categories observed from 1997–2013.
The size of each bubble represents the number of papers in each journal
category. Trail lines depict the trajectory of each category.

The next figure highlights the trajectory of highly-cited journals
from 1997 to 2013, illustrating how cited half-life increases with the
total citations to a journal. While most highly-cited journals move
toward the upper-right quadrant of the graph, we highlight three
chemistry journals that run contrary to this trend: Journal of the American Chemical Society, Angewandte Chemie-Int Ed., and Chemical Communications.
Readers wishing to speculate on why Chemistry and Engineering journals
are bucking the overall trend are welcome to do so in the comment
section below.

Readers are also welcome to explore the data (for categories and for journals).
The files (.swf) require the Adobe Flash plug-in. Mac users may need to
hold the Control key and select a browser when opening these
files. Categories may be split into component journals. Other
controls adjust the size, speed, and display of the data.

Figure 3. Cited half-life increases with total citations. Trail lines highlight highly-cited journals.

In sum, we were able to validate the claims by the Google Scholar
team that scholars have been citing older materials, with some
exceptions.

The citation behavior of authors reflects cultural, technological,
and normative behaviors, all acting in concert. While digital publishing
and technologies were invented to aid the reader in discovering,
retrieving, and citing the literature, the trend appears to predate
many of these technologies. Indeed, as much credit may be due to
the photocopier, the fax machine, FTP, and email as to Google,
EndNote, or the DOI.

Nevertheless, a growing cited half-life might also reflect major
structural shifts in the way science is funded and the way scientists
are rewarded. A gradual move to fund incremental and applied research
may result in fewer fundamental and theoretical studies being published.
Giving credit to these foundational works may require authors to cite
an increasingly aging literature.

Correction note: Table 1 of the manuscript “Cited Half-Life of the Journal Literature”
(arXiv) contains a sorting error. A corrected version (v2) was
submitted and will become live at 8pm (EDT). Thanks to Dr. Jacques
Carette, Dept. of Computing and Software at McMaster University for
spotting this error.

