Thursday, 5 September 2019

The rise of new citation indexes and the impact on Science mapping tools - Citespace, VOSviewer, Citation Gecko and more

Source: https://musingsaboutlibrarianship.blogspot.com/2019/09/the-rise-of-new-citation-indexes-and_5.html

Posted: 2019-09-05 03:10:45 UTC+08:00
For a long time, there were just two main citation sources that spanned most disciplines, one provided by Web of Science and the other Scopus.

As I noted in past posts (here, here), this has changed in the last 2 years: new citation sources, both proprietary ones such as Dimensions and open ones such as the OpenCitations Corpus, have started to emerge.

At the same time, I have recently become interested in the potential of science mapping or bibliometric tools for aiding PhD students who want to do more sophisticated literature reviews.

As such, I've started to notice that science mapping tools still in active development, e.g. VOSviewer and Citespace, as well as newer tools such as Citation Gecko, have started to respond to the trend of newly emerging citation indexes by supporting these new sources on top of the traditional Web of Science and Scopus.

Similarly, the popular tool Publish or Perish by Anne-Wil Harzing, which began by supporting extraction of results and citations from Google Scholar, has now grown to support other citation indexes beyond Scopus and Web of Science.

In this blog post, I will talk about some of the new indexes that science mapping tools are starting to support and, as an aside, provide a brief overview of what such tools can do and my first thoughts on them.

As it stands, it seems Microsoft Academic Graph (due to its size), Crossref (due to its openness) and Dimensions (backed by Digital Science) are starting to be used as sources by such tools.

A warning: I'm still trying to figure out such tools, so chances are my understanding is incomplete!

Rise of open citations and emergence of new citation indexes

Creating a new citation index isn't easy. You not only need metadata such as title and abstract, but you also need the references in each item to construct the citation index. Let me detail some of the sources that have started to emerge that can be used to construct citation indexes.

In the past 2 years, one of the major sources of such data has resulted from the efforts of I4OC, the Initiative for Open Citations, to get publishers to make such data open in Crossref.

The open citations in Crossref are made up of metadata received from publishers and made open for all. As I write this, roughly 60% of articles deposited by all publishers in Crossref have open citations.
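For the curious, here is roughly how those open references surface. This is a minimal sketch, assuming the public Crossref REST API endpoint `https://api.crossref.org/works/{DOI}`: when a publisher's references are open, the JSON `message` carries a `reference` list, and entries Crossref has matched to a DOI have a `DOI` key. The helper names `works_url` and `open_reference_dois` are hypothetical, and the actual HTTP fetch (e.g. with `urllib.request.urlopen`) is left out so the parsing can be tried offline.

```python
import json
from urllib.parse import quote

# Public Crossref REST API endpoint for a single work.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the /works/{DOI} URL (hypothetical helper), escaping unsafe characters."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def open_reference_dois(payload: str) -> list[str]:
    """Return the DOIs in the open 'reference' list of a /works/{DOI} response.

    References without a matched DOI (e.g. only an 'unstructured' string) are
    skipped; a missing 'reference' key (references not open) gives [].
    """
    message = json.loads(payload)["message"]
    return [ref["DOI"].lower() for ref in message.get("reference", []) if "DOI" in ref]


# Trimmed, made-up example of the response shape.
sample = json.dumps({"message": {"DOI": "10.1000/a", "reference": [
    {"key": "ref1", "DOI": "10.1000/B"},
    {"key": "ref2", "unstructured": "Smith J. (2001) Some paper."},
]}})
print(open_reference_dois(sample))  # ['10.1000/b']
```

DOIs are lowercased because they are case-insensitive, which makes later matching across sources simpler.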

Various organizations and tools have leveraged these open citations, including Dimensions and the literature finding tool Citation Gecko, but notably they have been used by the OpenCitations organization to produce COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations.

This supplements their existing OpenCitations Corpus (OCC), which is built by parsing the XML sources of PubMed Central open access articles, as well as CROCI, the Crowdsourced Open Citations Index.
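COCI can also be queried directly. The sketch below assumes the v1 COCI REST API as documented around 2019 (`/index/coci/api/v1/citations/{doi}` and `/references/{doi}`, returning a JSON list of rows with `citing` and `cited` DOIs). OpenCitations has since reorganised its services, so treat the base URL as an assumption and check the current documentation. `coci_url` and `citing_dois` are hypothetical helper names.

```python
import json

# COCI v1 REST API base as documented around 2019 (assumption: may have moved).
COCI_API = "https://opencitations.net/index/coci/api/v1"


def coci_url(operation: str, doi: str) -> str:
    """'citations' lists works citing `doi`; 'references' lists works it cites."""
    if operation not in ("citations", "references"):
        raise ValueError("operation must be 'citations' or 'references'")
    return f"{COCI_API}/{operation}/{doi}"


def citing_dois(payload: str) -> set[str]:
    """Collect the citing DOIs from a 'citations' response (a JSON list of rows)."""
    return {row["citing"].lower() for row in json.loads(payload)}


# Made-up rows in the assumed response shape (real rows carry more fields).
rows = json.dumps([
    {"oci": "0-1", "citing": "10.1000/A", "cited": "10.1000/x"},
    {"oci": "0-2", "citing": "10.1000/b", "cited": "10.1000/x"},
])
print(coci_url("citations", "10.1000/x"))
print(sorted(citing_dois(rows)))  # ['10.1000/a', '10.1000/b']
```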

Indeed, an alternative way of obtaining data and citations is to crawl webpages and PDFs and parse out the details.

As we all know, perhaps the biggest source of data and citations has always been "available" via Google Scholar, which obtains such data by crawling, harvesting and parsing webpages, but unfortunately, with the lack of an API, this data is inaccessible for bulk use.

But its tech rival Microsoft has gone beyond this. Not only has it re-launched Microsoft Academic, its alternative to Google Scholar, but it has also released the citation data (no doubt obtained using similar methods to Google Scholar), dubbed Microsoft Academic Graph (MAG), as open data. As I noted in my past review of this data set, studies have indicated that the data is truly massive (currently around 200 million items) and is probably the 2nd largest index available after Google Scholar.

Beyond Crossref and MAG, other smaller sources of article metadata and citations exist, such as Semantic Scholar, Wikidata/WikiCite, Europe PMC and more.

With so many citation sources available and open (depending on your definition), citation indexes that blend such sources have started to emerge. Some examples include the already mentioned Dimensions (Crossref plus additional closed sources), Lens.org (Crossref, Microsoft Academic, PubMed), Scinapse (Microsoft Academic Graph, Semantic Scholar, Springer Nature SciGraph, PubMed), Scilit, ScienceOpen etc.

What follows is a mini review of tools like VOSviewer and Citespace, showing how they have started to incorporate these new sources or indexes. Other science mapping or scientometric analysis tools exist (see also this topic page and this), but I haven't had the opportunity to study them closely.

VOSviewer - almost as easy as creating a word cloud

There are two challenges to using bibliometric mapping tools. Firstly, you need some basic understanding of the concepts. What is a bibliographic coupling network vs a co-citation network? What does centrality mean? What do distances between clusters mean (if anything)?

The other challenge is usability. Most science mapping tools are, shall we say, not designed with ease of use in mind. My guess is most of them are designed by their authors primarily for their own use or for fellow bibliometricians, and as such are extremely powerful, but with correspondingly higher entry barriers.

VOSviewer, a science mapping tool by Nees Jan van Eck and Ludo Waltman from CWTS, is probably the only exception to the rule.

While it cannot solve the first problem, it does well at making things user friendly.

Rather than being open-ended and leaving the user to decide what buttons to press, VOSviewer guides you from the start by prompting the options at each step. It also seems to provide a lot of sensible defaults, and I could often generate something reasonable looking.

For a librarian like me with a basic (not expert) amount of bibliometric knowledge (a bit of googling and reading the manual helps when trying to understand terms), this is often sufficient to generate a bibliometric map in 10 minutes flat (assuming you already have the required files exported from citation indexes).

Admittedly, the more I played with these tools, trying to understand why some types of networks could or could not be created from a given source file, the more my understanding of such networks and how they are created improved.

In particular, the newer citation index Dimensions is almost seamless to use with VOSviewer (exports from the free version of Dimensions are limited to 2,500 results).


Exporting Dimensions data for VOSviewer


Ways to use science mapping tools

Tools like VOSviewer generally provide two main kinds of maps. Firstly, one can feed it a bunch of text and have it create a term co-occurrence map. A typical use is to feed it articles, and it will create a map of terms and phrases in which words that tend to occur together appear close to one another (or are strongly linked to each other).
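To make "occur together" concrete: at its simplest, a term co-occurrence map starts from counts of how many documents mention each pair of terms. A rough sketch, assuming the terms have already been extracted from each title or abstract (VOSviewer does its own noun-phrase extraction and relevance filtering, which is not reproduced here); counting each term once per document is similar in spirit to VOSviewer's binary counting option.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(docs: list[list[str]]) -> Counter:
    """Count, for each unordered pair of terms, how many documents contain both."""
    pairs: Counter = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # each term counts once per document
        pairs.update(combinations(unique, 2))
    return pairs


# Made-up, pre-extracted terms for three abstracts.
docs = [
    ["team", "creativity", "leadership"],
    ["team", "creativity"],
    ["leadership", "trust"],
]
for pair, n in sorted(term_cooccurrence(docs).items()):
    print(pair, n)
```

The pair ("creativity", "team") gets the strongest link (2), which is what would pull those two terms close together on the map.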

While that can bring you some insights, the use cases I tend to look at involve maps based on citation relationships, which limits the possible sources to those that include references and not just title/author/abstract type metadata.

So say you want to generate a map using bibliographic coupling or co-citation of papers to look for interesting patterns. The easiest way to do so is to search a citation index with keywords, export all the results found (including references) and load them into VOSviewer for visualization.
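Since these two network types come up repeatedly, here is a toy sketch of the difference, assuming you already have each exported paper's reference list (the `refs` dictionary below is made-up data). In bibliographic coupling the nodes are the citing papers you exported, linked by how many references they share; in co-citation the nodes are the cited references, linked by how many exported papers cite both.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(refs: dict[str, set[str]]) -> Counter:
    """Link two citing papers by the number of references they share."""
    strength: Counter = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strength[(a, b)] = shared
    return strength


def cocitation(refs: dict[str, set[str]]) -> Counter:
    """Link two cited references by the number of papers that cite both."""
    strength: Counter = Counter()
    for cited in refs.values():
        strength.update(combinations(sorted(cited), 2))
    return strength


# Made-up data: exported paper -> set of references it cites.
refs = {"P1": {"R1", "R2", "R3"}, "P2": {"R2", "R3"}, "P3": {"R3", "R4"}}
print(bibliographic_coupling(refs))  # links between P1, P2, P3
print(cocitation(refs))              # links between R1..R4
```

This is also why a co-citation map needs the "include references" data: without reference lists there is nothing to count.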

Export records in Scopus - notice "include references" is checked


Another traditional use of tools like VOSviewer is to pull out all papers by one institution and study patterns within them.

While VOSviewer has always supported the main sources, including Web of Science, Scopus and PubMed, since Jan 2019 it supports the following sources as well:
  • Dimensions
  • Crossref
  • Europe PMC
  • Semantic Scholar
  • OCC (OpenCitations Corpus) & COCI (OpenCitations Index of Crossref open DOI-to-DOI citations)
  • Wikidata
  • Microsoft Academic (new as of Sept 2019!)

This is an amazing array of citation sources!


Co-authorship map of my institution Singapore Management University using Microsoft Academic data


Co-occurrence map of my institution Singapore Management University using Microsoft Academic data and Field of Study as unit


That said, apart from Dimensions and Microsoft Academic (which are potentially larger than Scopus and Web of Science in terms of items covered, if not necessarily in citation density), it's unclear to me how useful the other sources are, because they are usually a lot smaller.

I did find one exception. I was visualizing all the papers authored by researchers from my institution using Scopus. As my institution has a heavy CS/IS component, I found that Semantic Scholar generated a richer map than Scopus or Web of Science, as Semantic Scholar is strong on CS papers.


Quick summary of VOSviewer

VOSviewer is one among dozens of science mapping tools, but it is one of the easiest to use.

Sensible defaults and a guided flow mean one can generate a nice-looking map in a matter of minutes (not counting the time spent downloading the files for visualization).

For instance, the map below was generated from Dimensions data and visualized by VOSviewer in less than 10 minutes. As this is a bibliographic coupling map, each node is a paper from Dimensions, and nodes that are near each other are more strongly related. By default, the size of a node represents its total number of citations.
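Raw link counts tend to pull highly connected papers towards everything, so VOSviewer normalises link strengths before laying out the map; by default it uses what van Eck and Waltman call association strength. Below is a minimal sketch of that normalisation, assuming the form a_ij = 2m c_ij / (k_i k_j), where c_ij is the raw link weight, k_i the total link strength of node i and m the total link weight. It is my own illustration, not VOSviewer's code.

```python
from collections import defaultdict


def association_strength(links: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Normalise raw link weights c_ij to 2m * c_ij / (k_i * k_j).

    k_i is the total link strength of node i and m the total link weight.
    """
    k: dict[str, float] = defaultdict(float)
    for (i, j), c in links.items():
        k[i] += c
        k[j] += c
    m = sum(links.values())
    return {(i, j): 2 * m * c / (k[i] * k[j]) for (i, j), c in links.items()}


# Made-up network: a "hub" paper with strong raw links vs. a tiny niche pair.
raw = {("hub", "x"): 3, ("hub", "y"): 3, ("a", "b"): 1}
for pair, s in association_strength(raw).items():
    print(pair, round(s, 2))
```

Here the niche pair a-b ends up with a normalised strength of 14.0, far above hub-x (about 2.33), despite its lower raw count. Normalising this way stops big, well-connected papers from dominating the layout just because they have many links.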

VOSviewer also automatically attempts to detect clusters of papers, with each color representing a cluster. In theory, papers in each cluster are more related to each other than to papers outside the cluster.





Bibliographic coupling of Dimensions data (title and abstract search for "board independence")
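"More related to each other than to papers outside the cluster" can be made precise. VOSviewer's clustering technique optimises a modularity-style quality function with a resolution parameter. The sketch below does not find clusters. It only scores a given clustering using standard modularity with a resolution parameter γ, Q = Σ_c [L_c/m - γ(K_c/2m)²], where L_c is the link weight inside cluster c, K_c the summed link strength of its nodes and m the total link weight. A higher Q means links fall inside clusters more often than chance would predict. VOSviewer's exact quality function and its optimisation algorithm differ in the details.

```python
from collections import defaultdict


def modularity(edges: dict[tuple[str, str], float], cluster: dict[str, int],
               resolution: float = 1.0) -> float:
    """Modularity with a resolution parameter for an undirected weighted graph.

    Q = sum over clusters c of L_c / m - resolution * (K_c / (2m))**2
    """
    m = sum(edges.values())
    inside: dict[int, float] = defaultdict(float)    # L_c: link weight within cluster c
    strength: dict[int, float] = defaultdict(float)  # K_c: summed node strength in c
    for (i, j), w in edges.items():
        strength[cluster[i]] += w
        strength[cluster[j]] += w
        if cluster[i] == cluster[j]:
            inside[cluster[i]] += w
    return sum(inside[c] / m - resolution * (strength[c] / (2 * m)) ** 2 for c in strength)


# Made-up network: two triangles joined by a single link c-d.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
two = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one = {node: 0 for node in "abcdef"}
print(round(modularity(edges, two), 3), round(modularity(edges, one), 3))
```

Splitting the two triangles scores about 0.357, while lumping everything into one cluster scores 0. Raising the resolution parameter favours more, smaller clusters.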

You can also show an alternative view in which color represents another metric. In most cases, you would use color as a heatmap to represent year of publication. You may then be able to see that certain clusters tend to be newer or older.





In the example above, you can see the rightmost papers are mostly newer papers, and these happen to be the purple cluster of papers from the earlier chart.
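If you want to check that impression numerically rather than by eye, you can average the publication year of the papers in each cluster. A trivial sketch, assuming you have each paper's cluster number and publication year as a list of records (the `cluster` and `year` field names are hypothetical, not a specific VOSviewer export format).

```python
from collections import defaultdict
from statistics import mean


def mean_year_by_cluster(papers: list[dict]) -> dict[int, float]:
    """Average publication year of the papers in each cluster, ordered by cluster."""
    years: dict[int, list[int]] = defaultdict(list)
    for p in papers:
        years[p["cluster"]].append(p["year"])
    return {c: mean(ys) for c, ys in sorted(years.items())}


# Made-up records with hypothetical 'cluster' and 'year' fields.
papers = [
    {"cluster": 1, "year": 2009}, {"cluster": 1, "year": 2011},
    {"cluster": 2, "year": 2016}, {"cluster": 2, "year": 2018},
]
print(mean_year_by_cluster(papers))
```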

One of the nicer things about VOSviewer is that whether you are using Scopus, WOS, Dimensions or Microsoft Academic, in most cases clicking on a node will bring you to an appropriate page to learn more about the paper, e.g. the publisher's article page or the Scopus record.

This is important because when you are new to an area, the node labels (Author (year)) mean nothing to you, and you often find yourself clicking on a few nodes in a cluster before you can figure out what each cluster roughly represents.


Citespace - a powerful yet difficult to use tool

On the other end of the spectrum we have CiteSpace by Chaomei Chen of Drexel University. 

I must admit when I first tried it, I couldn't even get it to run! But I was eventually drawn back to it when I read about its cluster auto-labelling feature.

Like VOSviewer, Citespace is able to auto-cluster papers, authors etc. But one of the most useful tricks in Citespace is that it can auto-label clusters using titles, keywords or abstracts (depending on the source of data you use and the type of map you generate), so it gives you a hint of what each cluster is about even if you don't recognise a single paper.

For instance, the graph below shows data generated by doing a basic search in Scopus for the query "team creativity" and exporting the full records in RIS. After some work converting the data to a suitable format, I chose to visualize the data using co-citation (a document co-citation network) and then clustered it.

The steps to do so can be a bit tricky, but there's a Youtube video here that walks you through each step.

I've hidden the node labels and reduced the nodes to smaller sizes so the cluster labels come up more clearly.


Document co-citation network visualized with CiteSpace from Scopus data (team creativity)



As noted in the video, you can click on the buttons T, K or A to use titles, keywords or abstracts respectively to generate the cluster labels. You can also change the algorithm used to generate cluster labels by clicking on the LSI, LLR, MI etc. buttons.


This doesn't always work, as some data sources may not have abstracts etc.
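For a feel of what the LLR option does: log-likelihood ratio labelling scores a candidate term by how much more often it appears in a cluster's citing titles than elsewhere. Below is a minimal sketch using Dunning's log-likelihood ratio (G²) on document counts. The term sets are made up, only terms over-represented in the cluster are kept, and Citespace's own term extraction and scoring (and the LSI and MI options) will differ in the details.

```python
import math


def llr(k_in: int, n_in: int, k_out: int, n_out: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 table of document counts.

    k_in of the n_in documents in the cluster contain the term; k_out of n_out outside do.
    """
    observed = [k_in, n_in - k_in, k_out, n_out - k_out]
    total = n_in + n_out
    with_term = k_in + k_out
    without = total - with_term
    expected = [with_term * n_in / total, without * n_in / total,
                with_term * n_out / total, without * n_out / total]
    # Cells with zero observations contribute nothing (0 * log 0 is taken as 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def top_labels(cluster_docs: list[set[str]], other_docs: list[set[str]], n: int = 3) -> list[str]:
    """Rank terms over-represented in the cluster by LLR (ties broken alphabetically)."""
    scored = []
    for term in set().union(*cluster_docs):
        k_in = sum(term in d for d in cluster_docs)
        k_out = sum(term in d for d in other_docs)
        if k_in / len(cluster_docs) > k_out / len(other_docs):
            scored.append((-llr(k_in, len(cluster_docs), k_out, len(other_docs)), term))
    return [term for _, term in sorted(scored)[:n]]


# Made-up title terms for one cluster and for the rest of the network.
cluster = [{"team", "creativity"}, {"team", "creativity", "diversity"}, {"creativity", "leadership"}]
others = [{"team", "performance"}, {"leadership", "performance"}, {"performance", "conflict"}]
print(top_labels(cluster, others))
```

"creativity", which appears in every cluster title and nowhere else, comes out on top, while "leadership", equally common inside and outside, is never suggested.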


Note that I am generating a document co-citation map here. This means each node represents a cited reference and not an indexed item from Scopus. By default, the labels are generated using the citing items (aka the indexed items), but by pressing the green CR button, you can switch to using the cited reference data itself.

You will find the CR button doesn't work if you are using WOS data, because for some reason WOS cited references do not contain article titles (try exporting and opening a file to see!).

An interesting side effect of playing around with these tools is that my understanding of what fields are or are not exported from citation databases like WoS, Scopus and Dimensions improved!

Citespace does a ton more, including the ability to detect "citation bursts" (time windows where some articles suddenly got a lot of cites), show the evolution of cluster labels over time (Ctrl-Y) etc.


Generating the citation bursts list (you will need to click on the citation/frequency button first)

Top 5 articles with Citation bursts 



How cluster labels change over time where I selected it to show 2 labels per year. 
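Citespace's burst detection is based on Kleinberg's algorithm. Below is a simplified two-state, batched version as a sketch. Each year has a count of citations to one article (`r`) out of all citations in the data set that year (`d`). The "burst" state assumes the article's share is `s` times its baseline, and entering a burst costs `gamma * ln(n)`. The parameter names follow Kleinberg's paper; Citespace's defaults and exact implementation may differ.

```python
import math


def kleinberg_bursts(r: list[int], d: list[int], s: float = 2.0, gamma: float = 1.0) -> list[int]:
    """Two-state batched Kleinberg burst detection; returns 0/1 (base/burst) per period.

    r[t]: citations to the article in period t; d[t]: all citations in period t.
    """
    n = len(r)
    if n == 0 or not 0 < sum(r) < sum(d):
        return [0] * n  # nothing to detect (or degenerate shares)
    p0 = sum(r) / sum(d)
    p = [p0, min(s * p0, 0.9999)]  # baseline and burst-state citation shares
    up = gamma * math.log(n)       # cost of entering the burst state; leaving is free

    def cost(state: int, t: int) -> float:
        # Negative binomial log-likelihood; the binomial coefficient is the same
        # in both states, so it is dropped.
        return -(r[t] * math.log(p[state]) + (d[t] - r[t]) * math.log(1 - p[state]))

    total = [cost(0, 0), up + cost(1, 0)]  # assume we start in the base state
    back = []  # back[t-1] = (best predecessor of state 0, of state 1) at period t
    for t in range(1, n):
        prev0 = 0 if total[0] <= total[1] else 1
        prev1 = 0 if total[0] + up < total[1] else 1
        back.append((prev0, prev1))
        total = [total[prev0] + cost(0, t),
                 total[prev1] + (up if prev1 == 0 else 0.0) + cost(1, t)]

    # Viterbi backtrack from the cheaper final state.
    state = 0 if total[0] <= total[1] else 1
    path = [state]
    for prev0, prev1 in reversed(back):
        state = prev0 if state == 0 else prev1
        path.append(state)
    return path[::-1]


# Made-up yearly counts: the article's share jumps in the 4th and 5th years.
years = list(range(2010, 2018))
states = kleinberg_bursts([2, 2, 2, 15, 18, 2, 2, 2], [100] * 8)
print([y for y, st in zip(years, states) if st == 1])
```

A larger `gamma` makes bursts more expensive to enter, so only sharper or longer spikes are reported.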


Citespace also has various views, a lot of them relating to time. I skipped over this, but a unique point about Citespace is that it allows you to segment your analysis into time slices. Typically one year would be one slice, but you can change this.
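The slicing step itself is simple. A sketch, assuming records with a publication year (the `id` and `year` fields are hypothetical). As I understand it, Citespace then builds a network for each slice and merges them, which is not shown here. Empty slices are simply omitted.

```python
from collections import defaultdict


def time_slices(records: list[dict], start: int, end: int, years_per_slice: int = 1) -> dict:
    """Group record ids into consecutive slices of `years_per_slice` years (inclusive bounds)."""
    slices: dict[tuple[int, int], list[str]] = defaultdict(list)
    for rec in records:
        year = rec["year"]
        if start <= year <= end:  # records outside the overall window are dropped
            first = start + (year - start) // years_per_slice * years_per_slice
            last = min(first + years_per_slice - 1, end)
            slices[(first, last)].append(rec["id"])
    return dict(sorted(slices.items()))


# Made-up records with hypothetical 'id' and 'year' fields.
records = [{"id": "a", "year": 2008}, {"id": "b", "year": 2009},
           {"id": "c", "year": 2010}, {"id": "d", "year": 2012},
           {"id": "e", "year": 2005}]
print(time_slices(records, 2008, 2012, years_per_slice=2))
```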

As with VOSviewer, you can of course use color to represent time slices; for example, in the image below you can see that some clusters are more yellow, indicating newer papers.




But you can go beyond that and "step forward" in time using the time slider for each slice.

As you go forwards or backwards in time, you can see the linkages appearing. The above image is set to the time slice 2008-2008, and you can see the links appear in the top part of the graph.

Incidentally, the image above also has the tree ring option turned on: some nodes are surrounded by rings of different colors, each representing a year in which they were cited, with the thickness representing the number of citations.

But perhaps a clearer view is the timeline view (second button from the right). I've also hidden the node labels, just showing cluster labels.



The timeline view is nice if you want to see more clearly the interactions between the identified clusters. Coupled with the time slice feature, you can see, for example, which papers (and the clusters they belong to) were co-cited in 2018.

This is really just scratching the surface and like VOSviewer, Citespace seems to be regularly updated.

In terms of the data sources supported for citation-based visualizations, Citespace is pretty impressive, supporting quite a few formats (though some of them might only support term co-occurrence maps rather than citation-based ones?).



The ones I have tested include WOS, Scopus, Dimensions and most recently Lens.org (see instructions).



Visualization using Lens as a source in Citespace

Citespace is really powerful, but its learning curve is way steeper than VOSviewer's. Part of it is because of the many options, but a big part is that it simply isn't as user friendly. For instance, to work with Scopus data you need to go through a conversion process, while it works natively in VOSviewer.

Clicking on nodes in Citespace to get to papers or cited references often doesn't work. In some cases it does work, but you won't realise it because it silently opens a tab in the control panel window (but doesn't switch to it). This seems to happen quite a bit for various functions.

Citation Gecko

I have mentioned and reviewed this tool in the past, so I won't belabour it. But this is probably by far the easiest to use, even compared with VOSviewer.

You can search for papers to add as "seed papers" or import them from Zotero libraries, and it will try to find relationships between the papers you entered.

The key here is that Citation Gecko is able to leverage citation links between seed papers (in yellow below) and other papers to help highlight possible papers of interest.

For instance, it can reveal papers that were cited frequently by your seed papers (co-citations) or, conversely, papers that cited many of your seed papers (bibliographic coupling), all with a single click.
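The idea behind those two kinds of suggestions can be sketched in a few lines, assuming you already have reference lists and citing lists for the seeds (e.g. from Crossref or COCI). The dictionaries below are made-up data and the function names are my own, not Citation Gecko's code. Candidates are simply ranked by how many seeds they connect to.

```python
from collections import Counter


def rank(counts: Counter, top: int) -> list[tuple[str, int]]:
    """Sort by count (descending), then id, so ties come out in a stable order."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def suggest(seeds: set[str], references: dict[str, set[str]],
            citations: dict[str, set[str]], top: int = 5):
    """Rank non-seed papers by how many seeds cite them, and by how many seeds they cite."""
    cited_by_seeds = Counter(ref for s in seeds for ref in references.get(s, set())
                             if ref not in seeds)
    citing_seeds = Counter(c for s in seeds for c in citations.get(s, set())
                           if c not in seeds)
    return rank(cited_by_seeds, top), rank(citing_seeds, top)


# Made-up data: what each seed cites, and what cites each seed.
references = {"S1": {"A", "B"}, "S2": {"A", "C", "S1"}}
citations = {"S1": {"X", "S2"}, "S2": {"X", "Y"}}
print(suggest({"S1", "S2"}, references, citations))
```

Adding a suggestion such as "A" to the seed set and re-running grows the graph, roughly like Citation Gecko's "add as seed paper".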


Citation Gecko


The main difference between Citation Gecko and the other two is that you can explore and expand your citation graph by adding nodes that are found (add as seed paper), while VOSviewer and Citespace by and large produce fixed maps based on what you exported from citation indexes (though I see an SQL database option in Citespace).

Citation Gecko draws from Crossref and used to draw from Microsoft Academic.

Publish or Perish software

This isn't quite comparable with the other tools since it doesn't do visualizations, but I include it here because, like the other tools, it has been expanding the sources it supports. As of version 7, it supports quite a few sources.

Besides Google Scholar, Scopus and Web of Science, it also supports Crossref and Microsoft Academic, and support for Dimensions might be on the way.


Sources you can use in Publish or Perish

Impact of new sources on tools

It's difficult for new citation sources to get accepted compared to the traditional big three: Web of Science, Scopus and Google Scholar.

These indexes have the advantage of being widely used and analysed, and over the years they have been refined and cleaned up from the feedback of thousands of users, including librarians pointing out errors. It is thus difficult for new sources to compete, as they are not as trusted.

In a way, such indexes benefit from network effects: the more people use them and provide feedback, the better they become.

Digital Science-backed Dimensions is making a go of it by partnering with institutions and making the data open for research by bona fide researchers to increase confidence in the data.

Lastly, as more tools, particularly well-known ones like Publish or Perish and VOSviewer, start supporting these sources, this further increases the user base and the corresponding feedback from users.

As I write this, bibliometrics researchers are having a field day comparing these new sources against the usual big 3, but as most of these new sources are open (with the exception of Dimensions), they have to decide how to deal with "blended indexes" like Lens.org that consist of more than one open source.

Is there any point in comparing, say, Lens.org (Microsoft Academic + Crossref + others) with Scopus/WOS when you have already compared Microsoft Academic? After all, there is reason to suspect Microsoft Academic dominates the results in Lens.org.
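One quick way to probe that suspicion is to check what share of a blended index's records also appear in each component source. A rough sketch, assuming you have DOI sets for a sample of records from each source (made-up data below). Matching on DOIs alone ignores the many MAG records without DOIs, so this is only a first approximation.

```python
def coverage_share(blended: set[str], sources: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of the blended index's DOIs that also appear in each component source."""
    norm = {doi.lower() for doi in blended}  # DOIs are case-insensitive
    if not norm:
        return {name: 0.0 for name in sources}
    return {name: len(norm & {doi.lower() for doi in dois}) / len(norm)
            for name, dois in sources.items()}


# Made-up DOI samples.
lens_sample = {"10.1/a", "10.1/b", "10.1/c", "10.1/d"}
components = {"MAG": {"10.1/A", "10.1/b", "10.1/c"}, "Crossref": {"10.1/b", "10.1/e"}}
print(coverage_share(lens_sample, components))  # {'MAG': 0.75, 'Crossref': 0.25}
```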

For example, @CameronNeylon published a paper comparing Scopus, Web of Science and Microsoft Academic, and someone asked him about including Lens.org. His response?

Similarly, should tool makers like VOSviewer and CiteSpace bother to support file formats from Lens.org if they already support its component sources like Crossref and Microsoft Academic?

As it stands, Microsoft Academic, Crossref and Dimensions seem to be the "must support" sources for these tools, though things might change.





