Thursday, 3 September 2015

Altmetrics and Research Assessment: How Not to Let History Repeat Itself | The Scholarly Kitchen


Innovation, Libraries, Metrics and Analytics

Altmetrics and Research Assessment: How Not to Let History Repeat Itself

Those who do not remember history are doomed to repeat it. - George Santayana
Over the past few weeks, I’ve been involved in a number of
discussions over the role of alternative metrics in research evaluation.
Amongst them, I moderated a session at SSP on the evaluation gap, took
part in the short course on journal metrics prior to the CSE conference
in Philadelphia, and moderated a webinar on the subject. These
experiences have taught me a lot about both the promise of, and the
challenges surrounding, altmetrics, and how they fit into the broader
research metrics challenge that funders and institutions face today.
In particular, I’ve become much more aware of the field of Informetrics,
the academic discipline that supports research metrics, and have begun
to think that we, as scholarly communication professionals and
innovators, have been neglecting a valuable source of information and expertise.

Broadly speaking, everybody seems to agree that the Impact Factor is a
poor way to measure research quality. The most important objection is
that it is designed to measure the academic impact of journals, and is
therefore only a rough proxy for the quality of the research contained
within those journals. As a result, article-level metrics are becoming
increasingly common and are supported by Web of Science, Scopus and Google Scholar.
There are also a number of alternative ways to measure citation impact
for researchers themselves. In 2005 Jorge Hirsch, a physicist from UCSD,
proposed the h-index, which is intended to be a direct measure of a researcher’s academic impact through citations. There are also a range of alternatives and refinements
with names like m-index, c-index, and s-index, each with their own
particular spin on how best to calculate individual contribution.
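To make the metrics under discussion concrete, here is a minimal Python sketch (not from the original post; the function names and example figures are illustrative) of the standard two-year Journal Impact Factor calculation and Hirsch’s h-index:

```python
def impact_factor(citations_this_year, citable_items_prior_two_years):
    """Two-year Journal Impact Factor: citations received this year to
    items the journal published in the previous two years, divided by
    the number of citable items published in those two years."""
    return citations_this_year / citable_items_prior_two_years

def h_index(citation_counts):
    """Hirsch's h-index: the largest h such that the researcher has h
    papers each cited at least h times."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# A researcher with papers cited [10, 8, 5, 4, 3] times has an h-index
# of 4: four papers are each cited at least four times, but there are
# not five papers cited at least five times.
print(h_index([10, 8, 5, 4, 3]))  # → 4
print(impact_factor(300, 150))    # → 2.0
```

Note how the h-index summarizes an individual’s whole citation record in one number, which is precisely why the variants mentioned above exist: each makes a different trade-off about what that single number should reward.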

While the h-index and similar metrics are good attempts to
tackle the problem of the impact factor being a proxy measure of
research quality, they can’t speak to a problem that has been identified
over the last few years and is becoming known as the Evaluation Gap.

Defining the Evaluation Gap

The Evaluation Gap is a concept that was introduced in a 2014 post by Paul Wouters on the Citation Culture blog,
which he co-authors with Sarah de Rijcke; both are scholars at
the University of Leiden. The idea of the gap is summed up by Prof.
Wouters as:

…the emergence of a more fundamental gap between on the
one hand the dominant criteria in scientific quality control (in peer
review as well as in metrics approaches), and on the other hand the new
roles of research in society.
In other words, research plays many different roles in society.
Medical research, for example, can lead to new treatments and better
outcomes for patients. There are clear economic impacts of work that
leads to patents or the formation of new companies. Add to that
legislative, policy and best practice impact, as well as education and
public engagement, and we see just how broad the ways are in which
research and society interact. Peer review of scholarly content and
citation counts are a good way to understand the impact of research on
the advancement of knowledge within the academy but a poor
representation of the way in which research informs activities outside
of the ivory tower.

Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure

In April of this year, the Leiden manifesto, which was written by Diana Hicks and Paul Wouters, was published in Nature.
There has been surprisingly little discussion about it in publishing
circles. It certainly seems to have been met with less buzz than the now
iconic altmetrics manifesto, which Jason Priem et al. published in 2010. As Cassidy Sugimoto (@csugimoto) pointed out in the session at SSP that I moderated, the Leiden manifesto serves as a note of caution.

Hicks and Wouters point out that obsession with the Impact Factor is
a relatively new phenomenon, with the number of academic articles with
the words ‘impact factor’ in the title having steadily risen from almost
none, to around 8 per 100,000 a few years ago. The misuse of this
simple and rather crude metric to inform decisions that it was never
intended to inform has distorted the academic landscape by
over-incentivizing the authorship of high impact articles, and
discounting other valuable contributions to knowledge, as well as giving
rise to more sinister forms of gaming like citation cartels, stacking and excessive self-citation.
In many ways, citation counting and altmetrics share some common risks.
Both can be susceptible to gaming and as Hicks and Wouters put it…

…assessors must not be tempted to cede the decision-making to the numbers
Is history repeating itself?

Eugene Garfield is the founder of ISI and an important figure in bibliometrics. In his original 1955 article, he makes an argument uncannily similar to the one that Jason Priem made in the altmetrics manifesto (emphasis my own):

It is too much to expect a research worker to spend an inordinate amount of time searching for the bibliographic descendants of antecedent papers.

As the volume of academic literature explodes, scholars rely on filters to select the most relevant and significant sources from the rest. Unfortunately, scholarship’s three main filters for importance are failing.
In the case of both citation tracking and altmetrics, the original
problem was one of discovery in the face of information overload, but
people inevitably start to look at anything that can be counted as a way
to increase the amount of automation in assessment. How do we stop
altmetrics from heading down the same path as the Impact Factor and
distorting the process of research?

Engagement exists on a spectrum. While some online mentions, for
example tweets, are superficial, requiring little effort to produce and
conveying only the most basic commentary, some mentions are of very high
value. For example, a medical article that is cited in clinical practice
guidelines would not contribute to traditional citation counts but would
inform the practice of countless physicians. What is important is context. To
reach their full potential, altmetrics solutions and processes must not
rely purely on scores but place sufficient weight on qualitative,
context-based assessment.

The Research Excellence Framework (REF)
is a good example of how some assessors are thinking positively about
this issue. The REF is an assessment of higher education institutions
across the UK, the results of which are used to allocate a government
block grant that makes up approximately 15-20% of university funding.
The framework currently contains no metrics of any kind and according to
Stephen Hill of HEFCE, assessment panels are specifically told not to
use Impact Factor as a proxy for research quality. Instead, institutions
submit written impact statements and are assessed on a broad range of
criteria including their formal academic contributions, economic impact
of their work, influence on government policy, their public outreach
efforts and their contribution to training the next generation of
academics. HEFCE are treading carefully when it comes to metrics and are
consulting with informaticians about how to properly incorporate
metrics without distorting researcher behavior. Unfortunately, as
Jonathan Adams, chief scientist at Digital Science, notes, some
researchers are already seeing evidence that the REF is affecting researcher behavior.

The importance of learning from the experts

I’ve only touched lightly on some of the issues facing
altmetrics and informetrics. When I’ve spoken to people who work in the
field, I get the impression they feel there isn’t enough flow of
information from the discipline into the debate about the future of
scholarly communication, leading to a risk that new efforts will suffer
the same pitfalls as previous endeavors.

As a result, many in the field have been trying very hard to be heard
by those of us working at the cutting edge of publishing innovation.
The Leiden manifesto (which has been translated into an excellent and easy to understand video) as well as earlier documents like the San Francisco Declaration on Research Assessment (DORA), (available as a poster, here)
are examples of these outreach efforts. These resources are more than
opinion pieces; they are attempts to summarize aspects of state-of-the-art
thought in the discipline, making it easier for publishers,
librarians, funders and technologists to learn about them.

Funders and institutions clearly feel that they need to improve the
way that research is assessed for the good of society and the
advancement of human knowledge. Much of the criticism of altmetrics
focuses on problems that traditional bibliometrics also suffer from:
over-metricization, the use of a score as an intellectual shortcut, the
lack of subject normalization, and the risks of gaming. At the same
time, people working in the field of informetrics have good ideas to
address these issues. Publishers have a role to play in all of this by
supporting the continued development of tools that enable better research assessment.

Instead of thinking about criticisms of altmetrics as arguments
against updating how we assess research, let’s instead think of them as
helpful guidance as to how to improve the situation yet further.
Altmetrics as they stand today are not a panacea and there is still work
to be done. Now that we have the power of the web at our disposal
however, it should be possible with some thought, and by learning from
those who study informetrics, to continue to work towards a more
complete and more useful system of research assessment.

About Phill Jones

Phill Jones is Head of Publisher Outreach at Digital Science, where
he works to improve understanding amongst publishers of the types of
products and services that Digital Science and its various portfolio
companies offer. Working particularly closely with ReadCube, Altmetric,
Figshare, and Overleaf, Phill supports marketing and sales efforts
through industry engagement, public speaking, conference participation
and educational efforts.
Phill has spent much of his career working on projects that use
technology to accelerate scientific discovery. He joined Digital Science
from portfolio company ReadCube, where he held the position of VP of
Business Development. Prior to Digital Science, he was the Editorial
Director at Journal of Visualized Experiments (JoVE), the first academic
video journal. Phill is a member of several committees including the
SSP annual conference and educational committees, the STM association
early career publishers and future lab committees.
In a former life, Phill was a cross-disciplinary research scientist. He
held a faculty position at Harvard Medical School, working in
biophysics and neuroscience, despite having originally started out as a
plasma physicist at the UK Atomic Energy Authority. He has also worked
as a microscopy consultant and scientific advisor for a number of
startups and small companies.

