Research Tools: The Open Citation Index

Tuesday, 1 March 2016

The Open Citation Index – ScienceOpen Blog

Source: http://blog.scienceopen.com/2016/02/the-open-citation-index/

February 29, 2016. Author: Jon Tennant

Eugene Garfield, one of the founders of bibliometrics and scientometrics, once claimed that “Citation
indexes resolve semantic problems associated with traditional subject
indexes by using citation symbology rather than words to describe the
content of a document.” This statement led to the advent of Web-based
measurements of citations, implemented as a way to describe the
academic re-use of research.

However, Garfield had only reached a
partial solution to the problem of measuring re-use. One of the
major problems with citation counts is that they are primarily
contextless: they don’t tell us anything about why research is being re-used. Nonetheless, citation counts are now at the very heart of academic systems for two main reasons:

They are fundamental for grant, hiring and tenure decisions.
They form the core of how we currently assess academic impact and prestige.

Working out article-level citation
counts is actually pretty complicated though, and depends on where
you’re sourcing your information from. If you read the last blog post here,
you’ll have seen that search results across Google Scholar, Web of
Science, PubMed, and Scopus all vary to quite some degree. Well, it is
the same for citations too, and it comes down to what’s being indexed by
each. Scopus indexes 12,850 journals, which is the largest documented
number at the moment. PubMed, on the other hand, has 6,000 journals
comprising mostly clinical content, and Web of Science offers broader
coverage with 8,700 journals. However, unless you pay for both Web of
Science and Scopus, you won’t be allowed to know who’s re-using work or
how much, and even if you are granted access, the two services give
inconsistent results. That is not too useful when these numbers matter
for impact assessment criteria and your career.

Image: Struggling scientists, by Hagen Cartoons. (Source: Cartoonstock)

Google Scholar, however, offers a free
citation indexing service, based, in theory, on all published journals,
and possibly a whole load of ‘grey literature’ too. For the majority of
researchers now, Google Scholar is the go-to powerhouse search tool.
Accompanying this power though is a whole web of secrecy: it is unknown
what Google Scholar actually crawls, but you can bet they reach pretty
far given the amount of self-archived, and often illegally archived,
content they return from searches. So the basis of their citation index
is a bit of a mystery. It lacks any form of quality control, and is
confounded by the fact that it can include citations from
non-peer-reviewed works, which will be an issue for some.

Academic citations represent the
structured genealogy or network of an idea, and the association between
themes or topics. I like to think that citation counts tell us how
imperfect our knowledge is in a certain area, and how much researchers
are working to change that. Researchers quite like citations; we like to
know how many citations we’ve got, and who is citing and
re-using our work. These two things are quite different: the amount of
re-use can be reflected by a simple number, which is fine in a closed system. But to
get a deeper context of how research is being re-used and to trace the genealogy of knowledge, you need openness.

At ScienceOpen, we have our own way to
measure citations. We’ve recently implemented it, and are only just
beginning to realise the importance of this metric. We’re calling it the
Open Citation Index, and it represents a new way to measure the retrieval of scientific information.

But what is the Open Citation Index, and
how is it calculated? The core of ScienceOpen is based on a huge corpus
of open access articles drawn primarily from PubMed Central and arXiv.
This comprises about 2 million open access records, and each one comes with
its own reference list. Using a clever metadata
extraction engine, we take each of these references and create an
article stub for it. These stubs, or metadata records, form the core
of our citation network. The number of citations derived from this
network is displayed on each article, and each item that cites another
can be openly accessed from within our archive.
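
To make the idea concrete, here is a minimal sketch of how citation counts could be derived from the reference lists of an open access corpus. It is not ScienceOpen's actual implementation: the record identifiers, the toy corpus and the function names (build_citation_network, citation_counts) are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: each open access record maps to its reference list.
# The identifiers and the data layout are assumptions, not ScienceOpen's format.
corpus = {
    "oa:1": ["ref:A", "ref:B"],
    "oa:2": ["ref:A", "oa:1"],
    "oa:3": ["ref:A", "ref:B", "oa:2"],
}


def build_citation_network(corpus):
    """Map every cited item (a 'stub') to the set of records that cite it."""
    cited_by = defaultdict(set)
    for citing_id, references in corpus.items():
        for cited_id in references:
            # The first reference to an item creates its stub entry.
            # Using a set means a duplicated reference is counted once.
            cited_by[cited_id].add(citing_id)
    return dict(cited_by)


def citation_counts(cited_by):
    """Count, for each stub, how many open access records cite it."""
    return {item: len(citers) for item, citers in cited_by.items()}


network = build_citation_network(corpus)
counts = citation_counts(network)
```

In this toy data, ref:A is cited by all three records, so its count is 3, and network["ref:A"] lists exactly which open access records cite it. A record such as oa:3, which nothing in the corpus cites, gets no stub at all, so every entry in counts is at least 1.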

Image: Visualising citation networks: pretty, but complex. (Source)

So the citation counts are based
exclusively on open access publications, and therefore provide a
pan-publisher, article-level measure of how ‘open’ your idea is. Because
of the way these data are gathered, every article
record has had at least one citation, and therefore we explicitly
provide a level of cross-publisher content filtering. It is important
that we find ways to measure the effect of open access, and the Open
Citation Index provides one way to do this. For researchers, the Open
Citation Index is about gaining prestige in a system that is gradually,
but inexorably, moving towards ‘open’ as the default way
of conducting research.

In the future, we will work with
publishers to combine their content with our archives and enhance the
Open Citation Index, developing a richer, increasingly transparent and
more precise metric of how research is being re-used.