Source: http://blogs.lse.ac.uk/impactofsocialsciences/2015/12/09/the-researchgate-score-a-good-example-of-a-bad-metric/
The ResearchGate Score: a good example of a bad metric
According to ResearchGate, the academic social networking site, their RG Score is
“a new way to measure your scientific reputation”. With such high aims,
Peter Kraker, Katy Jordan and Elisabeth Lex
take a closer look at the opaque metric. By reverse engineering the
score, they find that a significant weight is linked to ‘impact points’ – a metric similar to the widely discredited journal impact factor. Transparency
in metrics is the only way scholarly measures can be put into context
and the only way biases – which are inherent in all socially created
metrics – can be uncovered.
Launched in 2008, ResearchGate was one
of the earlier academic social networks on the Web. The platform
revolves around research papers, a question and answering system, and a
job board. Researchers are able to create a profile that showcases their
publication record and their academic expertise. Other users are then
able to follow these profiles and are notified of any updates. In recent
years, ResearchGate has become more aggressive in marketing its
platform via e-mail. With default settings, ResearchGate sends between 4 and 10 e-mails per week, depending on the activity in your network. This high volume of messages has proved very successful for ResearchGate:
according to a study by Nature
from 2014, ResearchGate is the best-known social network among
researchers; 35% of surveyed researchers say that they signed up for
ResearchGate “because they received an e-mail”. It may come as no
surprise that this strategy has since been adopted by many of
ResearchGate’s competitors, including Academia.edu and Mendeley.
One of the focal points in
ResearchGate’s e-mails is a researcher’s latest ResearchGate Score (RG
Score). Updated weekly, the RG Score is a single number that is attached
to a researcher’s profile. According to ResearchGate, the score
includes the research outcomes that you share on the platform, your
interactions with other members, and the reputation of your peers (i.e.,
it takes into consideration publications, questions, answers,
followers). The RG Score is displayed on every profile alongside the
basic information about a researcher. ResearchGate has received
substantial financial backing from venture capitalists and Bill Gates,
but it is not clear how the platform will generate revenue; the possibility of the score being linked to financial value warrants further exploration and critical assessment.
Image credit: Tunnel diode amplifier by Circuit-fantasist CC BY 2.0
The results of our evaluation of the RG Score were rather discouraging: while there are some innovative
ideas in the way ResearchGate approached the measure, we also found that
the RG Score ignores a number of fundamental bibliometric guidelines
and that ResearchGate makes basic mistakes in the way the score is
calculated. We deem these shortcomings to be so problematic that the RG
Score should not be considered as a measure of scientific reputation in
its current form.
The measure comes with bold statements: according to the site,
the RG Score is “a new way to measure your scientific reputation”; it
was designed to “help you measure and leverage your standing within the
scientific community”. With such high aims, it seemed appropriate
to take a closer look at the RG Score and to evaluate its capability as a
measure of scientific reputation. We based our evaluation on
well-established bibliometric guidelines for research metrics, and an
empirical analysis of the score. The results were presented at a recent
workshop on Analysing and Quantifying Scholarly Communication on the Web
(ASCW’15 – introductory post here) in a position paper and its discussion.
Intransparency and irreproducibility over time
One of the most apparent issues with the RG Score is that it is intransparent. ResearchGate does present its
users with a breakdown of the individual parts of the score, i.e.,
publications, questions, answers, followers (also shown as a pie-chart),
and to what extent these parts contribute to your score. Unfortunately,
that is not enough information to reproduce one’s own score. For that
you would need to know the exact measures being used as well as the
algorithm used for calculating the score. These elements are, however,
unknown.
Image credit: Blackbox public domain
ResearchGate thus creates a sort of black-box evaluation machine that keeps researchers guessing which actions are taken into account when their reputation is measured. This
is exemplified by the many questions
in ResearchGate’s own question and answering system pertaining to the
exact calculation of the RG Score. There is a prevalent view in the
bibliometrics community that transparency and openness are important features of any metric.
One of the principles of the Leiden Manifesto states for example: “Keep
data collection and analytical processes open, transparent and simple”,
and it continues: “Recent commercial entrants should be held to the
same standards; no one should accept a black-box evaluation machine.”
Transparency is the only way measures can be put into context and the
only way biases – which are inherent in all socially created metrics –
can be uncovered. Furthermore, intransparency makes it very hard for
outsiders to detect gaming of the system. In ResearchGate for example,
contributions of others (i.e., questions and answers) can be anonymously
downvoted. Anonymous downvoting has been criticised in the past as it often happens without explanation. Therefore, online networks such as Reddit have started to moderate downvotes.
Further muddying the waters, the algorithm used to calculate the RG Score changes over time. That in
itself is not necessarily a bad thing. The Leiden Manifesto states that
metrics should be regularly scrutinized and updated, if needed. Also,
ResearchGate does not hide the fact that it modifies its algorithm and
the data sources being considered along the way. The problem with the
way that ResearchGate handles this process is that it is not transparent
and that there is no way to reconstruct it. This makes it impossible to
compare the RG Score over time, further limiting its usefulness.
As an example, we have plotted Peter’s
RG Score from August 2012 to April 2015. Between August 2012, when the
score was introduced, and November 2012, his score fell from an initial 4.76 to 0.02. It then gradually increased to 1.03 in December 2012, where it stayed until September 2013. It should be noted that Peter’s behaviour on the platform was relatively stable over this timeframe: he did not remove pieces of research from the platform or unfollow other researchers. So what happened during that timeframe?
The most plausible explanation is that ResearchGate adjusted the
algorithm – but without any hints as to why and how that has happened,
it leaves the researcher guessing. In the Leiden Manifesto, there is one
firm principle against this practice: “Allow those evaluated to verify
data and analysis”.
An attempt at reproducing the ResearchGate Score
In order to learn more about the composition of the RG Score, we tried to reverse engineer the score. There are several pieces of profile information which could potentially
contribute to the score; at the time of the analysis, these included
‘impact points’ (calculated using impact factors of the journals an
individual has published in), ‘downloads’, ‘views’, ‘questions’,
‘answers’, ‘followers’ and ‘following’. Looking at the pie charts of RG Score breakdowns, academics who have an RG Score on their profile can therefore be divided into several subgroups:
- those whose score is based only on their publications;
- scores based on question and answer activity;
- scores based on followers and following;
- and scores based on a combination of any of the three.
We focused first on the first group: we constructed a small sample of 30 academics who have an RG Score and only a single publication on their profile. This revealed a strong correlation between impact points (which, for a single-publication academic, are simply the Journal Impact Factor (JIF) of that one paper’s journal) and the RG Score. Interestingly, the correlation is not linear but
logarithmic. Why ResearchGate chooses to transform the ‘impact points’
in this way is not clear. Using the natural log of impact points will
have the effect of diminishing returns for those with the highest impact
points, so it could be speculated that the natural log is used to
encourage less experienced academics.
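As a rough illustration of what such a fit looks like, the following Python sketch regresses the RG Score on the natural log of impact points. The numbers are synthetic stand-ins: neither our sample data nor ResearchGate's actual formula or coefficients are reproduced here.

```python
# Minimal sketch with invented data; the real sample and ResearchGate's
# formula are not public, so the coefficients below are purely illustrative.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical single-publication profiles: impact points = JIF of the one paper
impact_points = rng.uniform(0.5, 30, size=30)
rg_score = 0.9 * np.log(impact_points) + 0.3 + rng.normal(0, 0.05, size=30)

# Fit RG Score ~ a * ln(impact points) + b
a, b = np.polyfit(np.log(impact_points), rg_score, deg=1)
print(f"RG Score ≈ {a:.2f} * ln(impact points) + {b:.2f}")

# A log model should fit these profiles better than a straight line
print("r (linear):", round(np.corrcoef(impact_points, rg_score)[0, 1], 2))
print("r (log):   ", round(np.corrcoef(np.log(impact_points), rg_score)[0, 1], 2))
```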
We then expanded the sample to include examples from two further groups of academics: 30 academics who have an RG Score and multiple publications, and a further 30 who have an RG Score, multiple publications, and have posted at least one question and answer. Multiple regression analysis indicated that the RG Score was significantly predicted by a combination of the number of views, the natural log of impact points, the number of answers posted, and the number of publications.
Impact points proved to be very relevant; for this exploratory sample
at least, impact points accounted for a large proportion of the
variation in the data (68%).
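The shape of that regression can be sketched as follows. Again, the data are synthetic and the coefficients invented; the block only illustrates the kind of model we fitted (RG Score as a linear combination of views, ln(impact points), answers and publications), not ResearchGate's actual weights.

```python
# Illustrative regression on invented profiles; the predictors mirror the ones
# that came out significant in our analysis, but all numbers here are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 90

views        = rng.integers(0, 5000, n).astype(float)
impact_pts   = rng.uniform(0.5, 60, n)
answers      = rng.integers(0, 40, n).astype(float)
publications = rng.integers(1, 80, n).astype(float)

# Hypothetical "true" score, used only to generate example observations
rg_score = (0.0004 * views + 1.1 * np.log(impact_pts)
            + 0.25 * answers + 0.05 * publications
            + rng.normal(0, 0.5, n))

X = sm.add_constant(np.column_stack([views, np.log(impact_pts), answers, publications]))
model = sm.OLS(rg_score, X).fit()

print(model.rsquared)   # share of variance explained (cf. ~68% for impact points alone in our sample)
print(model.params)     # fitted intercept and coefficients
print(model.pvalues)    # which predictors come out significant
```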
Incorporating the Journal Impact Factor to evaluate individual researchers
Our analysis shows that the RG Score incorporates the Journal Impact Factor to evaluate individual
researchers. The JIF, however, was not introduced as a measure to
evaluate individuals, but as a measure to guide libraries’ journal purchasing decisions. Over the years, it has also been used for
evaluating individual researchers. But there are many good reasons why
this is a bad practice. For one, the distribution of citations within a
journal is highly skewed; one study found
that articles in the most cited half of articles in a journal were
cited 10 times more often than articles in the least cited half. As the
JIF is based on the mean number of citations, a single paper with a high
number of citations can therefore considerably skew the metric.
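A toy calculation (with invented citation counts) makes the problem with mean-based journal metrics obvious: a single highly cited paper drags the average far above what a typical article in the journal achieves.

```python
# Invented citation counts for 15 articles in a hypothetical journal:
# most are cited a handful of times, one is cited 250 times.
import numpy as np

citations = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 250])

print("mean (JIF-style average):", round(citations.mean(), 1))    # ≈ 20.7
print("median (typical article):", np.median(citations))          # 3.0
```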
In addition, the correlation between JIF and individual citations to articles has been steadily decreasing since the 1990s,
meaning that it says less and less about individual papers.
Furthermore, the JIF is only available for journals; therefore it cannot
be used to evaluate fields that favor other forms of communication,
such as computer science (conference papers) or the humanities (books).
But even among disciplines that communicate in journals, average citation rates vary considerably, which the JIF does not account for. As a result, the JIF is rather problematic when evaluating
journals; when it comes to single contributions it is even more
questionable.
There is a wide consensus among researchers on this issue: the San Francisco Declaration on Research Assessment (DORA), which discourages the use of the Journal Impact Factor for the assessment of individual researchers, has garnered more than 12,300 signatories at the time of writing. It seems puzzling that a score that claims to be “a new way to measure your scientific reputation” would go down that path.
Final Words
There are a number of interesting ideas in the RG Score: including research outputs other than papers (e.g. data, slides) is definitely a step in the right direction, and
the idea of considering interactions when thinking about academic
reputation has some merit. However, there is a mismatch between the goal
of the RG Score and use of the site in practice. Evidence suggests that
academics who use ResearchGate tend to view it as an online business card or curriculum vitae,
rather than a site for active interaction with others. Furthermore, the score misses any activity that takes place outside of ResearchGate;
for example, Twitter is more frequently the site for actively discussing research.
The extensive use of the RG Score in
marketing e-mails suggests that it was meant to be a marketing tool that
drives more traffic to the site. While it may have succeeded in this
department, we found several critical issues with the RG Score, which
need to be addressed before it can be seen as a serious metric.
ResearchGate seems to have reacted to the criticisms surrounding the RG Score. In September, they introduced a new metric named “Reads”.
“Reads”, which is defined as the sum of views and downloads of a
researcher’s work, is now the main focus of their e-mails and the metric
is prominently displayed in a researcher’s profile. At the same time,
ResearchGate has decided to keep the score, albeit in a smaller role. It
is still displayed in every profile and it is also used as additional information in many of the site’s features, e.g.
recommendations.
Finally, it should be pointed out that
the RG Score is not the only bad metric out there. With metrics
becoming ubiquitous in research assessment, as evidenced in the recent
HEFCE report “The Metric Tide”,
we are poised to see the formulation of many more. With these
developments in mind, it becomes even more important for us
bibliometrics researchers to inform our stakeholders (such as funding
agencies and university administrators) about the problems with
individual metrics. So if you have any concerns with a certain metric, don’t hesitate to share them with us, write about it – or even nominate it
for the Bad Metric prize.
Note: This article gives the views of the authors, and not the
position of the LSE Impact blog, nor of the London School of Economics.
Please review our Comments Policy if you have any concerns on posting a comment below.
About the Authors
Peter Kraker is a postdoctoral researcher at Know-Center of Graz University of Technology and a 2013/14 Panton Fellow.
His main research interests are visualizations based on scholarly
communication on the web, open science, and altmetrics. Peter is an open
science advocate collaborating with the Open Knowledge Foundation and the Open Access Network Austria.
Katy Jordan is a PhD student based in the Institute
of Educational Technology at The Open University, UK. Her research
interests focus on the intersection between the Internet and Higher
Education. In addition to her doctoral research on academic social
networking sites, she has also published research on Massive Open Online
Courses (MOOCs) and semantic web technologies for education.
Elisabeth Lex is an assistant professor at Graz University of Technology and she heads the Social Computing research area at Know-Center
GmbH. In her research, she explores how digital traces humans leave
behind on the Web can be exploited to model and shape the way people
work, learn and interact. At Graz University of Technology, Elisabeth
teaches Web Science as well as Science 2.0.