Tuesday, 8 November 2016

Some things you need to know about Google Scholar | The Ideophone

 Source: http://ideophone.org/some-things-you-need-to-know-about-google-scholar/

Some things you need to know about Google Scholar

Summary: Google Scholar is great, but its
inclusiveness and mix of automatically updated and hand-curated profiles
mean you should never take any of its numbers at face value. Case in
point: the power couple Prof. Et Al and Dr. A. Author,
whose profiles I created following Scholar’s recommended settings (and a
bit of manual embellishment). If you have a Scholar profile, make sure
you don’t let Scholar update the publication list automatically without
checking and cleaning up regularly. If you’re looking at somebody else’s
profile, take it with a big pinch of salt, especially when they have a
reasonably common name or when duplicate entries or weird citation
distributions indicate that it is being automatically updated. 

Update July 1st: Google Scholar has now manually
blocked Prof. Et Al from appearing in top rankings for her
disciplines. They probably thought her too prominent a reminder of the
gameability of their system (how long will it take before they silence her next of kin?).
This doesn’t solve the real problem, noted below, of auto-updating
profiles like Yi Zhang and John A. Smith diluting top rankings. In fact,
even in scientometrics, it looks like there are at least 3 or 4 auto-updating profiles in the top 10.

I love Google Scholar. Like many scientists, I use it all the time
for finding scientific literature online, and it is more helpful and
comprehensive than services like PubMed, Sciencedirect, or JSTOR. I like
that Google Scholar rapidly delivers scholarly papers as well
as information about how these papers are cited. I also like its
no-nonsense author profiles, which enable you to find someone’s most
influential publications and gauge their relative influence at a glance.
These are good things. But they are also bad things. Let’s consider

Three good things about Google Scholar

  1. Google Scholar is inclusive. It finds scholarly works of
    many types and indexes material from scholarly journals, books,
    conference proceedings, and preprint servers. In many disciplines, books
    and peer-reviewed proceedings are as highly valued and as influential
    as journal publications. Yet services like Web of Science and PubMed
    focus on indexing only journals, making Google Scholar a preferred tool
    for many people interested in publication discovery and citation counts.
  2. Its citation analysis is automated. Citations
    are updated continuously, and with Google indexing even the more obscure
    academic websites, keeping track of the influence of scholarly work has
    become easier than ever. You can even ask Scholar to send you an
    email when there are new citations of your work. There is very little
    selection, no hand-picking, and no influence from questionable measures
    like impact factor: only citations, pure and simple, determine the order
    in which papers are listed.
  3. Its profiles are done by scholars. No sane person wants to
    disambiguate the hundreds of scholars named Smith or clean up the mess
    of papers without named authors, titles or journals. Somebody at
    Google Scholar had the brilliant idea that this work can be farmed out
    to people who have a stake in it: individual scholars who want to make
    sure their contributions are presented correctly and comprehensively. So
    while citations are automated, the publication lists in Google Scholar
    profiles are at least potentially hand-curated by the profile owners.
    Pretty useful. But wait…

Three bad things about Google Scholar

The classic 'Title of paper', 1995

  1. Google Scholar is inclusive. It will count anything that remotely looks like an article, including the masterpiece “Title of article”
    (with 128 citations) by A. Author. It will include anything it finds on
    university web domains, so anyone with access to such a domain can easily game the system.
    Recently it has started to index stuff on academia.edu, a place without
    any quality control where anybody can upload anything for
  2. Its citation analysis is automated. There are no humans
    pushing buttons, making decisions and filtering stuff. This means
    rigorous quality control is impossible. That’s why publications in the
    well-known “Name of journal”
    are counted as contributing bona fide citations, and indeed how “Title
    of article” can have 128 citations so far. It’s also why the
    recent addition of academia.edu content has resulted in an influx of
    duplicate citations due to poor metadata.
  3. Its profiles are done by scholars. Scholars have incentives
    to appear influential. H-indexes and citation counts play a role
    in how their work is evaluated and enter into funding and hiring
    decisions. Publications and co-authors can be added to Google Scholar
    manually without any constraints or control mechanism, an opportunity
    for gaming the system that some may find hard to resist. But forget
    malicious intent: scholars are people, and people are lazy. If Google
    Scholar tells them it can update their publications lists automatically,
    they’ll definitely do so — with consequences that can be as hilarious
    as harmful, as we’ll see below.

To illustrate these points, let’s have a look at the Google Scholar
profiles of two eminent scholars, Dr. Author and Prof. Et Al.

Dr. Author

Enter dr. A. Author. He ranks second in the field of citation analysis, with an h-index of 30 and over 3500 citations. Among his most influential papers are “Title of article” with 159 citations and “Title of paper” with 128 citations to date. It is a matter of some regret to him that his 1990 “Instructions to authors”
has been less influential, but perhaps its time is yet to come. Dr.
Author is active across a remarkable range of fields. He likes to write
templates, editorials, and front matter but has also been known to
produce peer-reviewed papers as
well. His first name is variously spelled Andrew, Albert or Anonymous,
but most people just call him “A.” and Google Scholar happily accepts

Dr. Author reminds us that Google Scholar citations are done by an automated system,
and so will be necessarily noisy. His profile simply gathers anything
attributed to “A. Author”, a listing that is automatically updated in
accordance with Google Scholar’s recommended settings. How pieces like
“Title of article” can accrue >100 citations is a bit of a mystery,
especially since only a few of the citing articles are other
templates. Some of A. Author’s highly cited papers seem to be due
to incomplete metadata from the source; others seem to be simply
misparses; some are correct in the sense that editorials are often
authored by “anonymous author”. At any rate, this shows there are a lot
of ghost publications and citations out there, some of which may easily
be attributed to people or publications they don’t belong to.

But surely these are just quirks due to bad data — garbage in,
garbage out, as they say. Actual scientists maintaining a profile can
count on more reliable listings. Or can they?

Prof. Et Al

Enter prof. Et Al. With
an h-index of 333 and over 2 million citations, she is the world’s most
influential scientist, surpassing such highly cited scholars as Freud,
Foucault, and Frith
(what is it with F?). She has an Erdős number
of 1 and ranks first in the disciplines of
scientometrics, bibliometrics, quality control and performance
assessment; in fact in any discipline she would care to associate
herself with. How did she reach this status? Simply by (i) creating a
profile under her name, (ii) blindly adding the publications that Google
Scholar suggested were hers, and (iii) allowing Scholar to update her
profile automatically, as recommended. Oh, and just because Google
Scholar allows her to, she also manually added some more papers she was
sure she wrote (including with her good friend Paul Erdős).

Prof. Al reminds us that Google Scholar profiles are made by scholars.
Scholars, being people, are mostly well-intentioned — but they can also
be unsuspecting, lazy or worse. Prof. Al started out by simply
doing what most scholars do when they create a new profile: following
the instructions and recommended settings. If you do this blindly,
Google Scholar will just add anything to your profile that comes
remotely close to your name, and there is almost a guarantee that you’ll
end up with a profile that way overestimates your scientific

Real-life examples

It is not that hard to find real examples of profiles getting a lot
of extra padding because of Scholar’s automatic updating feature. Take Yi Zhang at
Georgia Tech, who must surely be the most accomplished PhD student ever
with 40,000+ citations and an h-index of 70. This is Google Scholar’s
recommended “automatic updating” feature going bananas with what must be
a very common name. Indeed, there is another Yi Zhang, ranking 4th in syntax
just after Chomsky, Sag, and Kiparsky. His top cited paper has 306
citations and yet the sum of his work —a well-rounded total of 1000
publications— has somehow received over 23,000 citations. (Note that #5 and #6 in syntax are also auto-updating profiles.)

All this is mostly harmless fun, until you realise that a profile may
be claiming the publications and citations of another one without
either of them noticing. Case in point: the profile of Giovanni Arturo Rossi, an expert on respiratory diseases, is consistently hoovering up publications by my colleague Giovanni Rossi,
who works on social interaction. Scholar auto-links author names to
profiles in search results, preventing people from finding the real
Rossi from his publications unless he actively and manually adds those
Arturo-claimed publications to his profile.
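
To see why namesakes collide so easily, here is a toy sketch in Python. It is not Scholar's actual matching method, which this post does not know; it simply assumes a matcher that reduces every name to "first initial plus surname". The functions `name_key` and `claimants` are hypothetical names made up for this illustration.

```python
def name_key(full_name: str) -> str:
    """Reduce a name to 'first initial + surname', e.g. 'G Rossi'.

    Assumption: a deliberately naive matching key, for illustration only.
    """
    parts = full_name.replace(".", " ").split()
    return f"{parts[0][0].upper()} {parts[-1].title()}"


def claimants(byline: str, profiles: list[str]) -> list[str]:
    """Return every profile whose naive key equals the byline's key."""
    key = name_key(byline)
    return [p for p in profiles if name_key(p) == key]


profiles = ["Giovanni Arturo Rossi", "Giovanni Rossi", "Yi Zhang"]

# A paper signed "G. Rossi" matches both Rossi profiles equally well.
print(claimants("G. Rossi", profiles))
# → ['Giovanni Arturo Rossi', 'Giovanni Rossi']
```

Under this assumption, whichever profile auto-updates first gets the paper, and the other author has to claim it back by hand.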

Bottom line: if you have a common name, you’ll have to take control
of every new publication manually, since otherwise Rossi (or Smith, or
Zhang) is going to get it added automatically to their profile. Also, if
you have a common name and you blindly follow Google Scholar’s
recommended settings, you may be very pleased with your h-index, but
probably for the wrong reasons (hello there John A. Smith, independent scholar, 23428 citations, h-index 64!). So
my most general recommendation would be: don’t let Google Scholar
update your profile automatically, and if you must, clean up regularly
to avoid looking silly.
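
For a feel of how quickly a few stray namesake papers move the numbers, here is a minimal Python sketch of the standard h-index definition (the largest h such that h papers have at least h citations each). The citation counts are invented for illustration and do not come from any real profile.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts, for illustration only.
own_papers = [40, 22, 15, 9, 6, 3, 1]
namesake_papers = [120, 85, 60, 33, 18, 12, 10, 8]

print(h_index(own_papers))                    # → 5
print(h_index(own_papers + namesake_papers))  # → 10
print(sum(own_papers), sum(own_papers + namesake_papers))  # → 96 442
```

In this invented case, eight auto-added papers double the h-index and more than quadruple the citation count, without the profile owner doing anything at all.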

Know what you’re doing

So far, the examples arise simply from Google Scholar’s recommended
setting to automatically update publication lists. It doesn’t look like
any of these authors (well, except maybe dr. Author and prof. Et Al)
have done anything like actively adding publications that aren’t
theirs, or claiming they’ve worked with Paul Erdős. But here’s the
thing: these things are not just possible, they are really
easy, as prof. Et Al’s superstar profile shows. And with hundreds of
thousands of active profiles, there are bound to be some bad apples among them.

What are the consequences? Nothing much if you take Google Scholar
for what it is: a useful but imperfect tool. Yet many take it more
seriously. If you’re in the business of comparing people (for instance
while reviewing job applications or when looking for potential
conference speakers), the metrics provided by Google Scholar are some of
the first ones you’ll come across and it will be very tempting to use
them. There is even an R package that
will help you extract citation data and compare scholars based solely
on citation numbers and h-indexes. All this is perilous business,
considering these ranks are diluted with auto-updating ghost profiles.
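
Before leaning on a profile's numbers, one quick sanity check is the warning sign mentioned in the summary: duplicate entries. The Python sketch below assumes you have copied a profile's publication titles into a list by hand; `normalize` and `duplicate_titles` are hypothetical helper names, and the matching rule (ignore case and punctuation) is a simplifying assumption that will miss subtler duplicates.

```python
import re
from collections import Counter


def normalize(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def duplicate_titles(titles: list[str]) -> dict[str, int]:
    """Map each normalized title that occurs more than once to its count."""
    counts = Counter(normalize(t) for t in titles)
    return {title: n for title, n in counts.items() if n > 1}


# Example titles in the spirit of Dr. Author's profile.
titles = [
    "Title of article",
    "Title of Article.",
    "TITLE OF ARTICLE",
    "Instructions to authors",
    "Title of paper",
]

print(duplicate_titles(titles))  # → {'title of article': 3}
```

A profile with many such repeats is probably being updated automatically, so its totals deserve extra scepticism.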

Let me end by reiterating that I love Google Scholar and I use it all the time.
It can be a tremendously useful tool. Like all tools, it can also be
misinterpreted, misused and even gamed. If you know what you’re doing
you should be fine. But if you think you can blindly trust it, take
another look at the work of dr. A. Author and prof. dr. Et Al.


The “A. Author” and “Et Al” profiles were created in June 2016 by
Mark Dingemanse to illustrate the points made in this post. Thanks to
Seán Roberts for suggesting that A. Author should co-author with Et
Al. Just in case Google Scholar follows up with some manual quality
control and some of these profiles or publications disappear,
screenshots document all the relevant profiles and pages.

There is something of a tradition of creating Google Scholar profiles to make a point; see here and here,
for example. While my goal here is simply to promote mindful use of
technology by noting some problems with Google Scholar profiles (as
opposed to citations, the focus of most prior research), let me note
there is of course a large scholarly literature in bibliometrics and scientometrics on the pros and cons of Google Scholar. Google Scholar Digest offers a comprehensive bibliography.

5 thoughts on “Some things you need to know about Google Scholar”

  1. “what must be a very common name”
    Understatement! Zhāng 张 is the third most common surname in the PRC
    (87.5 million people, 6.83 % of the population) as of 2007 and the
    fourth most common one (traditional character: 張) in the ROC (5.26 % of
    the population) as of 2010, says Wikipedia; its exact homophone 章 is
    much rarer (not in the top 100), but does include the actress Zhāng Zǐyí
    for example. Single-syllable given names are less common in China, but yi is four common syllables, and Google Scholar probably auto-adds every “Y. Zhang” just in case.

  2. But even far away from such extreme
    cases, there’s at least one fellow biologist who shares my exact full
    name, and there are other scientists with my last name whose first names
    begin with D. Google Scholar suggests all their publications to me. On
    top of that, my homonym seems not to have a Google Scholar profile. He’s
    lucky I haven’t turned auto-updating on.

  3. Wow! I guess I’m lucky with a
    relatively unique name (though there is a “Maria A Dingemanse” whose
    publications are suggested to me by Google Scholar). Auto-updating
    really should not be the recommended setting.

  4. Totally agree. The automatic update
    takes the moral responsibility of a person claiming other people’s work
    away. I had not looked at my google scholar for a while, and the
    citation increased a lot. It turned out that someone else’s work was
    listed. Probably google wants us to be glued to this site. But using
    this method is not very good. Having the biggest surname in China and a
    common initial, hundreds of papers would be added to your profile in a
    couple of days. One can get tired of it very easily and abandon the app.
    It seriously undermines the credibility of google scholar.

    The core problem is that Google Scholar simply misunderstands the
    incentives. For academic people, they have every incentive to publicise
    their works. Just let them add their work voluntarily should be good
    enough. At the same time, they would have to bear the responsibility to
    make sure what they add will be truly their own work. Now with this
    automatic function, no one will be responsible and everyone can claim
    that they do not know what’s going on. Not trustworthy at all. Sad
