Sunday, 25 October 2015

Jan van den Heuvel
considers the vital role of discipline-specific repositories in the
research process. The arXiv came into existence because it provided a
solution to a very practical problem, namely publication time-lags.
Recent developments like overlay journals suggest these platforms could
play a bigger role in the publishing process, but as long as recruitment
and promotion panels attach value to papers published in specific
journals only, their role will be limited.

When a researcher in most areas of Physics, Mathematics or Computer
Science (and increasingly also Statistics, Quantitative Finance and
Quantitative Biology) is looking for recent publications in their field,
one of the first places they will look is the arXiv.
(Pronounced “archive”, with the “X” standing in for the Greek letter
chi.) The arXiv was started in 1991 as a simple central repository of
electronic preprints in physics, based on servers at the Los Alamos
National Laboratory. Soon it expanded its scope to other areas. In 1999
it moved to Cornell University Library, which is still its main base.
The statistics page of the arXiv
gives a good indication of its size and activity: over 1 million
submissions since its start; currently between 8,000 and 9,000 new
submissions per month and around 10 million downloads per month.

So why has the arXiv become so important for researchers in these
particular fields? Why is it that it is now more or less standard that
any active researcher in these areas will deposit a close-to-final
version of their publications on the arXiv? Part of it can be
explained by the increasing prominence of Open Access and related
developments in academic publishing. But that can only explain a small
part of the success of the arXiv. The main reason for its success, in my
opinion, is a specific feature of these research areas: the very long
lead time between submission and publication in a journal of papers in
those fields, and hence the historic prominence of “preprints” and
“reports”. I will describe some of that background below, specifically
for Mathematics (my field), but similar factors play a role in the other
subjects covered by the arXiv as well.

Image credit: Wallpoper (Public Domain)

In Mathematics, a period of one year between submission and
publication is quite common, while periods of 3-4 years are nothing
exceptional. A major reason for those long lead times is the thorough
refereeing that is expected. Most papers in Mathematics consist in
large part of one or more detailed proofs of the main result(s). These
proofs can vary in length from a few paragraphs to several hundred pages
(although anything over roughly 30 pages is considered long). And it is
one of the main duties of the referees to convince themselves of the
correctness of those proofs, a process that involves carefully going
through the arguments, checking if the logic is correct, checking if old
results are used correctly, etc. Thoroughly checking one page of a
proof can easily take more than a day. This means that the refereeing
process usually takes at least several months, or even years if the
referees need to find the time to do a proper job. And if errors are
found, the author(s) might be asked to try to correct them, and a 2nd or
3rd version of the paper may need a considerable amount of time to be
scrutinised again. Added to the lengthy refereeing process in the past
was the specialised typesetting that was required for mathematical

Because of the long time between submission and publication, the
existence of “preprints” or “reports” was standard in the mathematical
community. As soon as a version of a new paper was submitted to a
journal, the author(s) would make a number of hard copies of it, often
in the form of a report in a “Reports Series” based in any respectable
Mathematics department. (The one at the LSE was called the CDAM Research Report Series;
although still accessible online, it stopped accepting new material in
2009.) When you went to a conference or gave a seminar, you would
bring a couple of those preprints. And after the presentation,
interested members of the audience would come forward and ask “do you
have a preprint of this?”. Note that these preprints were different from
the “working papers” that exist in some other fields. Where a “working
paper” is a publication that is still in development, a preprint or
report would be a (hopefully) close-to-final version, more or less
identical to the manuscript that was already submitted to a journal.

Once the World Wide Web became more prominent, those preprints went
online, usually via personal homepages of the author(s). At the same
time, institutional preprint series were going online. And once the
advantages of having a central repository became clear, most of us
started uploading our work to one of those, and personal homepages and
the surviving preprint series now just link to the article on the arXiv.

So the arXiv is not something that came into existence because of the
move towards Open Access. It’s more that it was the solution to a
practical problem: “if it takes several years before my paper is
published, how do I tell the world about my brilliant work in the
meantime?”. Of course, the arXiv is now seen as a prime example of Open
Access: it is completely free to search and download all publications.
It allows uploading new versions of a paper, while at the same time
keeping previous versions accessible.

On the other hand, in its present form the arXiv is not in a position
to replace traditional journals. The main reason for that is the lack
of refereeing. There is a group of moderators who can reject
publications that are not scientific or recategorise off-topic
submissions. But in general any paper can be a brilliant proof of a
long-standing conjecture, a piece of high-school Mathematics, or
something that upon serious reading is clearly wrong. As long as
academic recruitment panels and promotion committees attach value to
papers published in specific journals only, repositories such as the
arXiv can have only a limited role in the whole publication process.

An interesting new development is the appearance of “overlay
journals”. These are journals that have an independent (online)
presence, but that use a central repository to host the papers appearing
in them. In other words, the journal will have editors, an editorial
board, a review process, etc., but in the end the list of papers in it
will just be a list of links to the relevant papers in some repository.
Although these overlay journals have existed for a while, they became a
lot better known when Timothy Gowers announced on his blog
that he and a number of extremely eminent collaborators would start an
arXiv overlay journal in their specialism. Gowers became quite
well-known because of his activities and his call for a boycott of the
traditional commercial scientific publishers, in particular Elsevier.
(See here, here and here
for more on that.) So anything he does regarding Open Access and the
use of open repositories immediately makes people sit up and pay
attention.
So could we see a more prominent role of completely open repositories
such as the arXiv in the scientific publication process? Maybe. But two
main obstacles remain, from my point of view. How do you set up a
review process that makes it possible to recognise (top-)quality among
the publications in the repositories? And how do you overcome the
inbuilt conservatism in academic recruitment panels and promotion
committees to look first and mainly at publications in journals they
recognise? As long as those hurdles are not removed, commercial
publishers won’t have to worry too much, unfortunately.

This piece originally appeared on the LSE Library Blog and is reposted under CC BY 4.0

Note: This article gives the views of the author, and not the
position of the Impact of Social Science blog, nor of the London School
of Economics.

About the Author

Professor Jan van den Heuvel teaches and researches in the Department of Mathematics at LSE. He can also be found on Twitter @JanvadeHe.

