inclusiveness and mix of automatically updated and hand-curated profiles
means you should never take any of its numbers at face value. Case in
point: the power couple Prof. Et Al and Dr. A. Author,
whose profiles I created following Scholar’s recommended settings (and a
bit of manual embellishment). If you have a Scholar profile, make sure
you don’t let Scholar update the publication list automatically without
checking and cleaning up regularly. If you’re looking at somebody else’s
profile, take it with a big pinch of salt, especially when they have a
reasonably common name or when duplicate entries or weird citation
distributions indicate that it is being automatically updated.
Update July 1st: Google Scholar has now manually
blocked Prof. Et Al from appearing in top rankings for her
disciplines. They probably thought her too prominent a reminder of the
gameability of their system (how long will it take before they silence her next of kin?).
This doesn’t solve the real problem, noted below, of auto-updating
profiles like Yi Zhang and John A. Smith diluting top rankings. In fact,
even in scientometrics, it looks like there are at least 3 or 4 auto-updating profiles in the top 10.
I love Google Scholar. Like many scientists, I use it all the time
for finding scientific literature online, and it is more helpful and
comprehensive than services like PubMed, ScienceDirect, or JSTOR. I like
that Google Scholar rapidly delivers scholarly papers as well
as information about how these papers are cited. I also like its
no-nonsense author profiles, which enable you to find someone’s most
influential publications and gauge their relative influence at a glance.
These are good things. But they are also bad things. Let’s consider them in turn.
Three good things about Google Scholar
- Google Scholar is inclusive. It finds scholarly works of
many types and indexes material from scholarly journals, books,
conference proceedings, and preprint servers. In many disciplines, books
and peer-reviewed proceedings are as highly valued and as influential
as journal publications. Yet services like Web of Science and PubMed
focus on indexing only journals, making Google Scholar a preferred tool
for many people interested in publication discovery and citation counts.
- Its citation analysis is automated. Citations
are updated continuously, and with Google indexing even the more obscure
academic websites, keeping track of the influence of scholarly work has
become easier than ever. You can even ask Scholar to send you an
email when there are new citations of your work. There is very little
selection, no hand-picking, and no influence from questionable measures
like impact factor: only citations, pure and simple, determine the order
in which papers are listed.
- Its profiles are done by scholars. No sane person wants to
disambiguate the hundreds of scholars named Smith or clean up the mess
of papers without named authors, titles or journals. Somebody at
Google Scholar had the brilliant idea that this work can be farmed out
to people who have a stake in it: individual scholars who want to make
sure their contributions are presented correctly and comprehensively. So
while citations are automated, the publication lists in Google Scholar
profiles are at least potentially hand-curated by the profile owners.
Pretty useful. But wait…
Three bad things about Google Scholar
- Google Scholar is inclusive. It will count anything that remotely looks like an article, including the masterpiece “Title of article”
(with 128 citations) by A. Author. It will include anything it finds on
university web domains, so anyone with access to such a domain can easily game the system.
Recently it has started to index stuff on academia.edu, a place without
any quality control where anybody can upload anything for
- Its citation analysis is automated. There are no humans
pushing buttons, making decisions and filtering stuff. This means
rigorous quality control is impossible. That’s why publications in the
well-known “Name of journal”
are counted as contributing bona fide citations, and indeed how “Title
of article” can have 128 citations so far. It’s also why the
recent addition of academia.edu content has resulted in an influx of
duplicate citations due to poor metadata.
- Its profiles are done by scholars. Scholars have incentives
to appear influential. H-indexes and citation counts play a role
in how their work is evaluated and enter into funding and hiring
decisions. Publications and co-authors can be added to Google Scholar
manually without any constraints or control mechanism, an opportunity
for gaming the system that some may find hard to resist. But forget
malicious intent: scholars are people, and people are lazy. If Google
Scholar tells them it can update their publications lists automatically,
they’ll definitely do so, with consequences that can be as hilarious
as they are harmful, as we’ll see below.
Consider the profiles of two eminent scholars, Dr. Author and Prof. Et Al.
Dr. Author
Enter Dr. A. Author. Ranked second in the field of citation analysis, he has an h-index of 30 and over 3500 citations. Among his most influential papers are “Title of article” with 159 citations and “Title of paper” with 128 citations to date. It is a matter of some regret to him that his 1990 “Instructions to authors”
has been less influential, but perhaps its time is yet to come. Dr.
Author is active across a remarkable range of fields. He likes to write
templates, editorials, and front matter but has also been known to
produce peer-reviewed papers as
well. His first name is variously spelled Andrew, Albert or Anonymous,
but most people just call him “A.” and Google Scholar happily accepts
and so will be necessarily noisy. His profile simply gathers anything
attributed to “A. Author”, a listing that is automatically updated in
accordance with Google Scholar’s recommended settings. How pieces like
“Title of article” can accrue >100 citations is a bit of a mystery,
especially since only a few of the citing articles are other
templates. Some of A. Author’s highly cited papers seem to be due
to incomplete metadata from the source; others seem to be simply
misparses; some are correct in the sense that editorials are often
authored by “anonymous author”. At any rate, this shows there are a lot
of ghost publications and citations out there, some of which may easily
be attributed to people or publications they don’t belong to.
But surely these are just quirks due to bad data — garbage in,
garbage out, as they say. Actual scientists maintaining a profile can
count on more reliable listings. Or can they?
Prof. Et Al
Enter Prof. Et Al. With
an h-index of 333 and over 2 million citations, she is the world’s most
influential scientist, surpassing such highly cited scholars as Freud,
Foucault, and Frith (what is it with F?). She has an Erdős number
of 1 and ranks first in the disciplines of
scientometrics, bibliometrics, quality control and performance
assessment; in fact, in any discipline she would care to associate
herself with. How did she reach this status? Simply by (i) creating a
profile under her name, (ii) blindly adding the publications that Google
Scholar suggested were hers, and (iii) allowing Scholar to update her
profile automatically, as recommended. Oh, and just because Google
Scholar allows her to, she also manually added some more papers she was
sure she wrote (including with her good friend Paul Erdős).
Scholars, being people, are mostly well-intentioned — but they can also
be unsuspecting, lazy or worse. Prof. Al started out by simply
doing what most scholars do when they create a new profile: following
the instructions and recommended settings. If you do this blindly,
Google Scholar will just add anything to your profile that comes
remotely close to your name, and you are almost guaranteed to
end up with a profile that vastly overestimates your scientific impact.
Real-life examples
It is not that hard to find real examples of profiles getting a lot
of extra padding because of Scholar’s automatic updating feature. Take Yi Zhang at
Georgia Tech, who must surely be the most accomplished PhD student ever
with 40,000+ citations and an h-index of 70. This is Google Scholar’s
recommended “automatic updating” feature going bananas with what must be
a very common name. Indeed, there is another Yi Zhang, ranking 4th in syntax
just after Chomsky, Sag, and Kiparsky. His top-cited paper has 306
citations, and yet the sum of his work (a well-rounded total of 1000
publications) has somehow received over 23,000 citations. (Note that #5 and #6 in syntax are also auto-updating profiles.)
All this is mostly harmless fun, until you realise that a profile may
be claiming the publications and citations of another one without
either of them noticing. Case in point: the profile of Giovanni Arturo Rossi, an expert on respiratory diseases, is consistently hoovering up publications by my colleague Giovanni Rossi,
who works on social interaction. Scholar auto-links author names to
profiles in search results, preventing people from finding the real
Rossi from his publications unless he actively and manually adds those
Arturo-claimed publications to his profile.
In practice, this means keeping track
of every new publication manually, since otherwise Rossi (or Smith, or
Zhang) is going to get it added automatically to their profile. Also, if
you have a common name and you blindly follow Google Scholar’s
recommended settings, you may be very pleased with your h-index, but
probably for the wrong reasons (hello there John A. Smith, independent scholar, 23,428 citations, h-index 64!). So
my most general recommendation would be: don’t let Google Scholar
update your profile automatically, and if you must, clean up regularly
to avoid looking silly.
Know what you’re doing
So far, the examples arise simply from Google Scholar’s recommended
setting to automatically update publication lists. It doesn’t look like
any of these authors (well, except maybe Dr. Author and Prof. Et Al)
have done anything like actively adding publications that aren’t
theirs, or claiming they’ve worked with Paul Erdős. But here’s the
thing: these things are not just possible, they are really
easy, as Prof. Et Al’s superstar profile shows. And with hundreds of
thousands of active profiles, there are bound to be some bad apples among them.
What are the consequences? Nothing much if you take Google Scholar
for what it is: a useful but imperfect tool. Yet many take it more
seriously. If you’re in the business of comparing people (for instance
while reviewing job applications or when looking for potential
conference speakers), the metrics provided by Google Scholar are some of
the first ones you’ll come across and it will be very tempting to use
them. There is even an R package that
will help you extract citation data and compare scholars based solely
on citation numbers and h-indexes. All this is perilous business,
considering these ranks are diluted with auto-updating ghost profiles.
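The h-index carries a lot of weight in comparisons like these, so it helps to see how easily padding moves it. The h-index is the largest number h such that h of a scholar's papers have at least h citations each. Below is a minimal Python sketch that uses a hypothetical `h_index` helper and invented citation counts, not taken from any real profile. It shows what could happen when auto-updating merges a namesake's papers into a profile.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


# Hypothetical citation counts, invented purely for illustration.
own_papers = [25, 12, 8, 5, 3, 2, 1]
# Hypothetical papers by a different scholar with the same name,
# pulled in by automatic profile updating.
namesake_papers = [40, 30, 22, 15, 9, 7, 6]

merged = own_papers + namesake_papers
print("own profile:   h =", h_index(own_papers), "citations =", sum(own_papers))
print("after merging: h =", h_index(merged), "citations =", sum(merged))
```

With these made-up numbers, the profile owner has an h-index of 4 and 56 citations. Absorbing seven papers by a namesake raises that to an h-index of 8 and 185 citations, although the owner has not written a single extra word.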
Let me end by reiterating that I love Google Scholar and I use it all the time.
It can be a tremendously useful tool. Like all tools, it can also be
misinterpreted, misused and even gamed. If you know what you’re doing
you should be fine. But if you think you can blindly trust it, take
another look at the work of Dr. A. Author and Prof. Dr. Et Al.
The “A. Author” and “Et Al” profiles were created in June 2016 by
Mark Dingemanse to illustrate the points made in this post. Thanks to
Seán Roberts for suggesting that A. Author should co-author with Et
Al. Just in case Google Scholar follows up with some manual quality
control and some of these profiles or publications disappear,
screenshots document all the relevant profiles and pages.
There is something of a tradition of creating Google Scholar profiles to make a point; see here and here,
for example. While my goal here is simply to promote mindful use of
technology by noting some problems with Google Scholar profiles (as
opposed to citations, the focus of most prior research), let me note
there is of course a large scholarly literature in bibliometrics and scientometrics on the pros and cons of Google Scholar. Google Scholar Digest offers a comprehensive bibliography.