Sunday, 4 April 2021

Mapping literature to UN SDGs, issues & tools that support it

 Source: https://musingsaboutlibrarianship.blogspot.com/2020/12/mapping-literature-to-un-sdgs-issues.html

The UN Sustainable Development Goals (UN SDGs) were adopted in 2015 and consist of 17 goals that all United Nations Member States agreed to achieve by 2030.

In academia in particular, there seems to be increasing interest in mapping literature to each of the 17 UN SDGs, and in 2020 filters showing this aspect of research started to appear in research tools such as SciVal, Dimensions and Overton.

Beyond mapping literature to the 17 goals, I was amazed to realise that the Times Higher Education (THE) rankings include an "Impact ranking", which claims to be

"the only global performance tables that assess universities against the United Nations’ Sustainable Development Goals (SDGs). We use carefully calibrated indicators to provide comprehensive and balanced comparisons across three broad areas: research, outreach and stewardship."



This ranking was already in its second edition in 2020, and it includes 768 universities from 85 countries. This sounds like a lot, except that the much more famous THE World University ranking has over 1,500 universities from 93 countries. There are, for example, no Singapore universities included (like most of the THE rankings, the institution needs to submit data to be ranked), but I have no doubt these numbers will increase.

So how do the THE Impact rankings measure university performance against the 17 UN SDGs? The methodology can be found here.

There is a whole bunch of metrics for each of the 17 UN SDGs. For example, for UN SDG 5 (Gender equality), there are metrics like "Proportion of senior female academics (15.4%)" and "Proportion of a university’s total research output that is authored by women (10%)".

But for most of the 17 UN SDGs, 27% of the score is devoted to research, which usually involves identifying research papers that are on the UN SDG topic and measuring the number of papers, the field-weighted citation of such papers, etc.

This leads to an interesting question: how are such papers identified?

Identifying research papers on UN SDG topics

It seems that the data for the THE Impact rankings is provided by Elsevier/Scopus, who have written several reports on it and also have an SDG Resource Centre.

Not to be outdone, rival citation metric provider Clarivate has also produced a report studying the structure of research on the UN SDGs.

But generally, regardless of whether it comes from Elsevier or Clarivate, literature mapping to the UN SDGs is done using long, complicated, nested Boolean search queries.

For example, you can see the queries used by Elsevier in Scopus here.

Below is the search strategy used for SDG 3: "Ensure healthy lives and promote well-being for all at all ages"
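To give a feel for what a nested Boolean search strategy does, here is a toy sketch in Python. It is not Elsevier's query: the terms, the tuple-based query format and the names `matches` and `TOY_SDG3_QUERY` are all made up for illustration, and real strategies are far longer and run in the database's own search syntax.

```python
import re


def has_term(text, term):
    """True if `term` occurs as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(query, text):
    """Evaluate a nested Boolean query against a piece of text.

    A query is either a string (a single term) or a tuple of the form
    ("AND" | "OR" | "NOT", sub_query, ...).
    """
    if isinstance(query, str):
        return has_term(text, query)
    op, *args = query
    if op == "AND":
        return all(matches(arg, text) for arg in args)
    if op == "OR":
        return any(matches(arg, text) for arg in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical, heavily simplified query loosely themed on SDG 3.
TOY_SDG3_QUERY = (
    "AND",
    ("OR", "malaria", "tuberculosis", "maternal mortality"),
    ("OR", "prevention", "treatment", "vaccine"),
    ("NOT", "veterinary"),
)

# Made-up titles, purely for demonstration.
titles = [
    "A new vaccine for malaria in children",
    "Veterinary treatment of tuberculosis in cattle",
    "The genome of malaria parasites",
]
for title in titles:
    print(matches(TOY_SDG3_QUERY, title), "-", title)
```

Even in this tiny form you can see why results are sensitive to every choice of term and exclusion: one extra NOT clause or one missing synonym changes which papers count as "about" the goal.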



Research tools that support mapping papers to UN SDGs - SciVal and Dimensions

It is not surprising that Elsevier's SciVal is one of the tools that started supporting mapping to UN SDGs, in June 2020, via the Research Areas facet.

Tracking research performance related to Sustainable Development Goals using SciVal



More surprising is that you can access it even if you don't have a SciVal subscription, at least for the Keyphrase analysis and Topic of prominence clusters, but you will need an institutional account to do more benchmarking at country or institution level.


Free access to UN SDG 4 - SciVal Topic cluster analysis



Subscription access to SciVal allows you to do more benchmarking at institution level for UN SDGs



Besides Elsevier and Clarivate, another citation provider that seems to be on the rise is Digital Science, with its flagship citation service, Dimensions. Indeed, Dimensions provides filters for limiting literature by UN SDG, as well as a report.





This is available in the free version of Dimensions but, as you may know, the free version has the limitation that it does not allow institution filtering.


This report goes into detail on how this is done, but essentially they use search queries to build up a list of documents that serve as training sets for machine learning. This is necessary because there is a lack of already-tagged documents to train on.

"Keyword search strings for each of the goals were defined in order to produce training sets based on publications from the Dimensions platform. Key phrases and terminology were based on UN definitions of SDGs, including the target and indicator definitions, and narratives. The aim was to create high-quality training sets with a minimum of false positives... For each of the 17 created training sets, Natural Language Processing and Machine Learning was applied resulting in the classification scheme."

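As a rough sketch of the idea in the Dimensions report quoted above (keyword search builds the training set, then a model learns from it), here is a small Python example. It is not the Dimensions pipeline: the report does not say which algorithm was used, so the naive Bayes classifier here is my stand-in, and the seed terms, document titles and function names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical seed phrases per goal; real search strings are far longer.
SEED_TERMS = {
    "SDG3": ["malaria", "vaccine"],
    "SDG13": ["climate change", "greenhouse gas"],
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_training_set(docs):
    """Keep a document only if it hits the seeds of exactly one goal,
    a crude way of keeping false positives in the training set low."""
    labelled = []
    for doc in docs:
        lower = doc.lower()
        hits = [goal for goal, terms in SEED_TERMS.items()
                if any(term in lower for term in terms)]
        if len(hits) == 1:
            labelled.append((doc, hits[0]))
    return labelled


def train_naive_bayes(labelled):
    word_counts = defaultdict(Counter)  # goal -> word -> count
    doc_counts = Counter()              # goal -> number of documents
    for doc, goal in labelled:
        doc_counts[goal] += 1
        word_counts[goal].update(tokenize(doc))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the goal with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_goal, best_score = None, -math.inf
    for goal in doc_counts:
        total_words = sum(word_counts[goal].values())
        score = math.log(doc_counts[goal] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[goal][word] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_goal, best_score = goal, score
    return best_goal


# Made-up titles standing in for publications.
corpus = [
    "Malaria vaccine trial reduces child mortality in rural clinics",
    "Vaccine coverage and hospital mortality among infants",
    "Climate change drives greenhouse gas emissions and rising temperature",
    "Greenhouse gas emissions from energy use and rising temperature",
    "Malaria risk under climate change",  # hits two goals, so it is dropped
]
training = build_training_set(corpus)
model = train_naive_bayes(training)
# This title contains no seed term, yet the model can still assign a goal.
print(classify(model, "Infant mortality in rural hospital clinics"))
```

The point of the second step is that the trained model can label documents that contain none of the original search terms, which a pure Boolean match can never do. The flip side is that the model is only as good as the search-built training set.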
 

Overton - impact of research papers on policy papers - mapping to UN SDGs

So far we have seen that mapping literature to UN SDGs can be done in two ways: via Boolean search strategy matches and via supervised machine learning. However, both still focus on classifying research papers.

A fairly new and relatively unknown tool, Overton, takes a different tack by classifying policy papers into UN SDGs.

But what is Overton? 




Overton is an interesting citation index that tracks citations of research papers by policy documents, parliamentary transcripts, government guidance from think tanks, NGOs, IGOs, etc. (I shall call them collectively policy papers from here on).

The idea here as stated by Overton is to "help users measure their influence on government policy, both locally and internationally."

In a sense, you are measuring the impact of your research papers on policy papers, which may show some real-world societal impact.


Policy papers that cite research papers affiliated with Singapore Management University



This isn't a full review of Overton, but it looks impressive to me in terms of the number of policy sources it covers (they claim to cover 90+ countries in different languages). As I browsed the policy sources, I found it certainly isn't as US/Europe-centric as I had feared.

I was pleasantly surprised to see that it even includes some Singapore think tanks and some Singapore government sites.


Some sources from Singapore in Overton

For the purposes of this blog post, I noticed something interesting about Overton's filters. Firstly, Overton has a healthy number of filters available. From the policy tab, you can filter by over 20 filter types, but the one relevant here is UN SDG.



It is important to note that, unlike in Scopus and Dimensions, the mapping to UN SDGs is applied to the policy papers, not the research papers.

It allows you to say: these are the papers from my institution that are cited by policy papers targeted at the following UN SDGs.

In the example above, you can see that my institution's research papers are most cited by policy papers mapped to SDG 8: Decent Work and Economic Growth, then SDG 10: Reduced Inequalities, followed by SDG 9: Industry, Innovation and Infrastructure.

Given the issues we shall discuss in the next section on mapping research papers to UN SDGs, one wonders if such an approach could be easier or more accurate than directly mapping research papers to UN SDGs (my gut tells me policy papers are more 'direct').

In any case, I asked Overton how they were classifying the policy papers and got a fascinating answer.
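The logic behind that breakdown can be sketched in a few lines of Python. This is only my illustration of the idea, not how Overton computes it: the record layout, the identifiers and the function name are hypothetical, and the key point is simply that the SDG label sits on the policy paper, not on the research paper.

```python
from collections import Counter

# Hypothetical records: (policy paper id, SDG label of the policy paper,
# ids of the research papers it cites).
policy_papers = [
    ("policy-1", "SDG 8", ["paper-a", "paper-b"]),
    ("policy-2", "SDG 10", ["paper-a"]),
    ("policy-3", "SDG 8", ["paper-c"]),
    ("policy-4", "SDG 9", ["paper-z"]),
]

# Hypothetical set of research papers affiliated with one institution.
institution_papers = {"paper-a", "paper-b", "paper-c"}


def citing_policy_papers_by_sdg(policy_papers, institution_papers):
    """Count policy papers per SDG that cite at least one institution paper."""
    counts = Counter()
    for _policy_id, sdg, cited in policy_papers:
        if institution_papers.intersection(cited):
            counts[sdg] += 1
    return counts


counts = citing_policy_papers_by_sdg(policy_papers, institution_papers)
for sdg, n in counts.most_common():
    print(sdg, n)
```

Note that "policy-4" is mapped to SDG 9 but cites none of the institution's papers, so it does not count towards the institution's profile.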


How sure are we that this mapping to UN SDGs is accurate?

Of course, while such mappings and tools exist, the question to ask before we even start to use and trust them, much less rank with them, is how accurate or reliable such mappings are.

And indeed there has been some research on this.

The recent research paper "Mapping scholarly publications related to the Sustainable Development Goals: Do independent bibliometric approaches get the same results?" gives an excellent overview of the issues.

Similarly, the blog post "Consensus and dissensus in ‘mappings’ of science for Sustainable Development Goals (SDGs)" provides another overview of the issues.

In short, mapping publications to the SDGs is indeed not straightforward, due to difficulties in the interpretation of the SDG themes.

Besides the usual issues of interpretation, in particular,

"Each of the SDGs has targets (“Outcomes” and “Means of implementation”) and indicators. While the titles of some of the SDGs are relatively broad and open to subjective interpretation (e.g., “Climate action”), the targets and indicators are much more specific about what should be achieved under the goal: They mention specific actions (e.g., “reduce”) and topics (e.g., “hunger,” “resilience,” and “tourism”; note that we use “topic” in a broad sense to also include states, characteristics, or activities)"

This can result in two types of search strategy: a broader "topic-approach" and a narrower "action-approach" that

"attempts to find literature that could directly contribute to achieving the SDG targets, the topic-approach finds literature related to the target concepts generally."

For these and other reasons, the team at the University of Bergen (authors of the above paper) found that when they tried to identify papers using a similar method, they got very different results from Elsevier. In most areas, the overlap was generally only 20-30%, casting doubt on whether the mappings are valid.

This is still an ongoing area of research, with teams at organizations like the STRINGS project working on similar questions.

In case you are wondering, what about machine learning techniques instead of Boolean matching?

On paper this has many advantages. For example, UNSILO has applied machine learning techniques to OECD content to create an SDG Pathfinder website, which won the inaugural University Press Redux Sustainability Award.

But, as noted, this creates a black-box situation, which may not be ideal, not to mention the issue raised by the Dimensions team about needing to build a good training set.
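To make "overlap" concrete, here is a minimal Python sketch that compares two sets of paper identifiers retrieved for the same goal. I am assuming overlap means shared papers divided by all papers found by either approach (the Jaccard index); the paper may define it differently, and the identifiers below are made up.

```python
def overlap_share(set_a, set_b):
    """Shared items as a fraction of all items found by either approach."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical results of two independent search strategies for one SDG.
strategy_one = {"p1", "p2", "p3", "p4", "p5", "p6"}
strategy_two = {"p5", "p6", "p7", "p8", "p9", "p10"}

share = overlap_share(strategy_one, strategy_two)
print(f"{share:.0%} of the combined set is found by both strategies")
```

In this made-up case each strategy finds six papers, but only two are common to both out of ten distinct papers overall, which is the kind of low agreement described above.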


Conclusion

These are only some early possibilities. For example, one early response to my post via LinkedIn, from Pru Mitchell and Jan Ainali, mooted the idea of wide-scale tagging of articles in Wikidata by topics like Sustainable Development Goal 3 (Q50216838).

Once this is done, the various types of visualization available for Wikidata items become an option, chief among them Scholia.
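If articles were tagged this way, retrieving them would be a short query against Wikidata. Below is a sketch in Python that only builds the SPARQL text. It assumes the tagging uses Wikidata's "main subject" property (P921), which is my guess at how such tagging would be done and not something the suggestion spelled out; the function name is hypothetical.

```python
def works_about_query(topic_qid, limit=10):
    """Build a SPARQL query for items whose main subject (P921) is the topic."""
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {topic_qid!r}")
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P921 wd:{topic_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Q50216838 is the item for Sustainable Development Goal 3 mentioned above.
print(works_about_query("Q50216838"))
```

The printed query can be pasted into the Wikidata Query Service to see which works, if any, currently carry that tag.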


Scholia - Sustainable Development Goal 3 (Q50216838)


Topic mapping of literature is an interesting area that I am starting to take a greater interest in. It helps researchers who need to orient themselves, and it helps institutions make sense of their strengths.

While librarians are comfortable with using Boolean searches to map literature (evidence maps, for example, look really interesting to me), machine learning techniques are not going away, so understanding these methods of mapping literature is also going to be increasingly important.

 






 

 
