Sunday, 14 June 2015

The genius and the h-index | The land of algorithms


The genius and the h-index

Last week my friend Sanjoy came to Pisa to visit us and give a
three-day seminar. At dinner with a few colleagues, we started
discussing academic careers in Italy, and how difficult it is to
obtain a position (currently, there is none open). My younger colleagues
were discussing “how many papers you need to get a position”, a common
“game” among young researchers, and Marco
observed that as recently as 10 years ago, the average requirements
(and the expectations) were much lower than today: a couple of
journal papers were enough to become an assistant professor, 6
journal papers for associate, 12 for full professor. Now, 10 journal papers may not
be enough for an assistant position! It seems that people are publishing
much more, and much more frequently, and correspondingly the bar is
getting higher and higher. I will not get into the discussion of why
this is happening and whether it is good or bad (maybe in a future post).

Inevitably, we ended up talking about the Hirsch index (or h-index)
for evaluating researcher performance. This index is very popular,
although it has received a lot of criticism. The definition is:

A scientist has index h if h of [his/her] Np papers have
at least h citations each, and the other (Np − h) papers have at most h
citations each.

In practice, you need to count the citations to each one of your
papers; then sort the papers in decreasing order of citations; then find
the largest position h such that the h-th paper has no fewer than h citations, while the (h+1)-th has fewer than h+1.
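The counting procedure above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `h_index` and the sample citation counts are my own assumptions, not taken from any bibliographic database.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Sort the papers in decreasing order of citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Walk down the ranking; positions are 1-based.
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # this paper still has at least `position` citations
        else:
            break  # every later paper has even fewer citations
    return h


# Hypothetical researcher with five papers (made-up numbers).
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

In this made-up example the fourth paper has 4 citations, while the fifth has only 3, so h stops at 4.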

The popularity of this index is probably due to the fact that it is
easy to calculate and easy to understand: many online databases offer a
service for calculating it automatically. There are also many critics of
the h-index, and I am one of them: it depends on the
researcher's age, so it tends to underestimate the performance of young
researchers; it tends to overestimate people who publish a lot; it
strongly depends on the research area; it also depends on the database.

Many other performance indexes have been proposed and many more will
be in the future. Why? Why so much effort in trying to measure the
performance of academic researchers?

One of the main reasons is exogenous to the academic world.
Politicians try to allocate money to the best researchers and to the
best groups, so it is important for them (who have no specific
background to directly evaluate researchers) to obtain an “index”,
something that they can use right away to compare individuals, groups,
departments and universities. The Italian government, in particular, is
finally building up a national evaluation process for universities and
departments, and a good, robust performance metric (if such a thing
existed) would be of great help.

Let’s focus on measuring the performance of a researcher. An
important question is: should we consider the h-index a good measure of
academic performance? For example, if a researcher has published
only 3 papers with a large impact, with 1000 citations each, the h-index
will be just 3. On the other hand, consider a researcher who has 20
papers, each one with 20 citations: his h-index will be exactly 20.
Therefore, this index seems to favour researchers with a lot of good
papers, although maybe none very fundamental.
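To make the contrast concrete, here is a small sketch that puts the two researchers of the example side by side, reporting the h-index next to the total number of citations. The helper `summarize` and the labels are hypothetical names chosen for this illustration.

```python
def h_index(citations):
    """Compact h-index: count the ranked positions still covered by citations."""
    ranked = sorted(citations, reverse=True)
    # In a decreasing ranking, `count >= position` holds exactly for the first h papers.
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


def summarize(citations):
    """Return the h-index together with the total citation count."""
    return {"h": h_index(citations), "total": sum(citations)}


# The two profiles from the example in the text.
few_high_impact = [1000] * 3   # 3 papers, 1000 citations each
many_solid = [20] * 20         # 20 papers, 20 citations each

print("few high-impact papers:", summarize(few_high_impact))
print("many solid papers:", summarize(many_solid))
```

The first researcher collects 3000 citations but gets h = 3; the second collects 400 citations and gets h = 20.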

It is the old difficult question: quality or quantity? Then, Sanjoy pointed me to this article. Here is an extract:

The psychologist Dean Simonton argues that fecundity is
often at the heart of what distinguishes the truly gifted. The
difference between Bach and his forgotten peers isn’t necessarily that
he had a better ratio of hits to misses. The difference is that the
mediocre might have a dozen ideas, while Bach, in his lifetime, created
more than a thousand full-fledged musical compositions. A genius is a
genius, Simonton maintains, because he can put together such a
staggering number of insights, ideas, theories, random observations, and
unexpected connections that he almost inevitably ends up with something
great. “Quality,” Simonton writes, “is a probabilistic function of
quantity.”

Yes, I think that quality is a probabilistic
function of quantity (the key is in the probability). It was true for
Bach, Leonardo da Vinci, Mozart, Newton, Gauss and Euler. However,
sometimes it is not true; Einstein is maybe the best example: he
published a relatively low number of papers with an extraordinary impact.
Also, many mathematicians fall into this category (with the notable
exception of Erdős). In conclusion, I think we may find many examples of
genius for which quality = quantity, and many examples for which
quality != quantity. I think that Simonton concentrates on one very
specific aspect of genius. But this concept is difficult to define,
capture, encapsulate.
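One very rough way to read “probabilistic function of quantity” is the following toy model, which is entirely my own assumption and not Simonton's: suppose each work has the same small, independent chance `p_hit` of being great. Then the chance of producing at least one great work among n is 1 - (1 - p_hit)^n. The value 0.005 below is invented purely for illustration.

```python
def chance_of_at_least_one_hit(n_works, p_hit):
    """Probability that at least one of n independent works is a 'hit'."""
    # Complement of "all n works miss", each with probability (1 - p_hit).
    return 1.0 - (1.0 - p_hit) ** n_works


# Assumed per-work chance of greatness (illustrative only, not an estimate).
p = 0.005

# "A dozen ideas" versus "more than a thousand compositions".
for n in (12, 1000):
    print(n, round(chance_of_at_least_one_hit(n, p), 3))
```

Under these assumptions a dozen works give a chance of roughly 6%, while a thousand works give more than 99%. The model says nothing about cases like Einstein, where the per-work chance is clearly not the same small number for everybody.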

Going back to the h-index: if we are in search of the pure
genius, then the h-index is probably of no help; an academic genius
(especially a young one) can be recognised by his peers without any
index, and can be missed by any index. A performance index is probably
more necessary to distinguish mediocre researchers from the bad ones (and
we also need mediocre researchers!); the problem is to find the “perfect
index” (if such a thing exists…)

[1] Lutz Bornmann and Hans-Dieter Daniel, “The state of h index research. Is the h index the ideal way to measure research performance?” DOI: 10.1038/embor.2008.233.
