Research Tools: Bibliometrics

Thursday, 26 November 2015

Bibliometrics

Source: http://www.guidelines.kaowarsom.be/en/content/bibliometrics


General

Definition

The word “bibliometrics” is used to designate a set of quantitative methods of analysis of scientific publications. Every aspect of a publication that can be quantified may form the subject of a bibliometric study: the number of words in a paper,
the delay time between submission and publication, etc. While the
quantitative data concerning a specific paper may be rather boring, they
become more interesting when comparing different publications or for a
statistical study of large sets of publications.

Bibliometrics and research evaluation

In recent decades bibliometric methods have become quite fashionable for the evaluation of scientific research and for the assessment of individual researchers. The most frequently used bibliometric measures are the following:

The number of papers published by a given researcher or research group, as an indication of that researcher's or group's productivity.
The frequency with which a published paper is cited in later
publications by other researchers, as an indication of the interest this
paper has raised.
The frequency with which an electronically available paper is downloaded by readers, as an indication of its importance.
The average frequency with which the papers in a given journal are cited during a given time span after publication, as an indication of the scientific quality of the journal or of the thoroughness of its peer review.

Advantages of bibliometric methods and drawbacks

The advantages of bibliometric methods for scientific evaluations are rather obvious:

The methods are straightforward, since they are based on simple counting.
Many techniques have become especially simple in the digital age, because their application can be automated.
At first sight they are objective and unbiased.

At the same time, there are some obvious drawbacks in these methods:

As quantitative methods they may completely miss the point for a qualitative evaluation.
They may be manipulated (e.g., unnecessary citations by your colleagues).
The number of citations depends more on the number of people working in the same domain than on the intrinsic quality or originality of the published results.
Since the number of citations is a driving force for evaluating
researchers in their career promotion, researchers will tend to cling to
“trendy research” in fields where many other researchers are active and
where scientific funds can more easily be obtained. The result may be a
trivialization of research subjects instead of an active search for original research ideas.
In practice it has led to an Anglo-Saxon bias and a strengthening of the big players in the publication sector, to the disadvantage of small publishers, Southern countries and, e.g., Hispanic, Japanese, Chinese, Russian, Arabic and Francophone publications.

This means that one should be very careful in drawing conclusions from bibliometric methods. They are at their best as statistical methods, and are therefore prone to large errors when applied to individual cases.
Even if it could be proven that there is a strong correlation between
the number of citations and the scientific quality of a paper, it would
be very dangerous to conclude that a paper without citations is
necessarily of low scientific value.

Practical implementations

The most widely used bibliometric instrument is the set of databases of Thomson Reuters (formerly ISI): the Web of Knowledge, containing citation indices since 1900 and covering 23,000 journals, and the derived Journal Citation Reports, with statistical data such as impact factors for more than 10,000 journals. We discuss them below in more detail. Access to these databases, however, requires a very expensive subscription.
SCOPUS is an alternative citation database from Elsevier, covering 19,500 journals, also by subscription.
Google Scholar offers a good and free alternative. By searching, e.g.,
with the author name, you not only obtain a list of publications, but
for each of them the number of citations and even the link to all citing
papers. Google Scholar may be a better alternative for the Social
Sciences.
CiteSeerX offers an independent free citation database for computer and information sciences.
MESUR (MEtrics from Scholarly Usage of Resources) was a big project in which access to e-journals was logged at a large number of US university campuses and combined with the above-mentioned bibliometric data. One of its conclusions, reported in J. Bollen et al. (2009), is that the journal impact factor is only of marginal importance and should thus be used with caution.

The Thomson Reuters databases

Since the Thomson Reuters databases, and the indices and parameters they contain, are the most widely used in Western universities and research funding agencies, we describe them here in some more detail.

History

The history of these databases goes back to the (paper format) citation indices published since 1955 by the Institute for Scientific Information (ISI), started by Eugene Garfield. The original intention was not to evaluate research but to offer an instrument with which researchers could discover the most recent publications by subject. It was soon realized that the citations could serve as an additional help for discovering relevant papers on specialized subjects, and the first citation index version was published in 1964 as the Science Citation Index (SCI). Two years later it became available on magnetic tape, later on CD-ROM and now, much extended with data from the Social Sciences, on the Internet.

The Web of Knowledge

The core of the system is still a large index database in which all papers from more than 13,000 journals are registered with their full metadata and their list of citations, going back to 1955. Free format searches can be performed on title, subject, author, journal, author address and more. In this sense, it still fulfils its original role as an indexing instrument. For each search result, the database shows not only the full metadata but also the list of cited references and the list of later papers that have cited this one. From the references, the database is extended with data about cited journals outside the core of 13,000 analyzed journals and from before 1955 (going back to 1900). As of March 2012 Thomson Reuters claims to have 87 million source items, with 700 million cited references.
For the evaluation of individual researchers and research teams, the important aspect here is the number of citations received for each article.

Journal Citation Reports

From the data available in this large database, a special report is produced each year containing a detailed analysis of the citations per journal for the previous year, classified by broad subject categories. The report contains a number of indicators per journal; we mention some of them:

The total number of articles published in the journal during the year concerned.
The total number of citations to each journal during the year concerned.
The impact factor: the number of citations during the year to
articles that appeared in this journal during the two previous years,
divided by the number of articles that appeared in this journal during
these two years. (E.g.: the impact factor of journal X for 2011
is the number of citations during 2011 to articles that appeared in
journal X in 2009 and 2010, divided by the number of papers published in
journal X in these same two years.) The impact factor therefore expresses the average number of times that an article published in the two previous years has been cited during the year concerned.
The 5-year impact factor: the same as above, but for 5 years instead
of 2. (This is important for subject fields that evolve more slowly.)
The immediacy index: the number of citations during the year concerned to articles published in the journal in that same year, divided by the number of articles published in that year.
Journal cited half-life: to calculate this half-life, all citations to the journal during the year concerned are counted by year of publication of the cited paper. The half-life is the time span, counting back from the year concerned, for which there are as many citations to papers published within that span as to older papers. (A short half-life indicates that on average publications in this journal may be cited well, but that over time these citations rapidly diminish.)
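
The indicators listed above reduce to simple arithmetic on citation counts. The following Python sketch is only an illustration: the function names, the data layout (dictionaries keyed by the publication year of the cited articles) and all numbers for "journal X" are assumptions made up for this example, not Thomson Reuters data or code. The half-life is computed in whole years, which is a simplification.

```python
def impact_factor(citations_by_pub_year, articles_by_pub_year, year, window=2):
    """Citations received in `year` to articles from the `window` previous
    years, divided by the number of articles published in those years."""
    years = range(year - window, year)
    cites = sum(citations_by_pub_year.get(y, 0) for y in years)
    articles = sum(articles_by_pub_year.get(y, 0) for y in years)
    return cites / articles


def immediacy_index(citations_by_pub_year, articles_by_pub_year, year):
    """Citations received in `year` to articles published in that same year,
    divided by the number of articles published in that year."""
    return citations_by_pub_year.get(year, 0) / articles_by_pub_year[year]


def cited_half_life(citations_by_pub_year, year):
    """Smallest number of years, counting back from `year`, that accounts for
    at least half of the citations received in `year` (whole years only).
    Assumes no key in `citations_by_pub_year` is later than `year`."""
    total = sum(citations_by_pub_year.values())
    if total == 0:
        return None
    running = 0
    oldest = min(citations_by_pub_year)
    for age, pub_year in enumerate(range(year, oldest - 1, -1), start=1):
        running += citations_by_pub_year.get(pub_year, 0)
        if 2 * running >= total:
            return age
    return None


# Hypothetical journal X: citations received DURING 2011, grouped by the
# publication year of the cited article, and articles published per year.
citations_2011 = {2011: 30, 2010: 120, 2009: 180, 2008: 150, 2007: 90, 2006: 30}
articles = {2011: 100, 2010: 90, 2009: 110, 2008: 100, 2007: 100, 2006: 100}

jif_2y = impact_factor(citations_2011, articles, 2011)            # (120 + 180) / (90 + 110)
jif_5y = impact_factor(citations_2011, articles, 2011, window=5)  # 570 / 500
immediacy = immediacy_index(citations_2011, articles, 2011)       # 30 / 100
half_life = cited_half_life(citations_2011, 2011)                 # 2011, 2010 and 2009 reach half of 600

print(jif_2y, jif_5y, immediacy, half_life)
```

With these made-up counts the 2-year impact factor is higher than the 5-year one, because most citations go to articles that are one or two years old.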

Impact factor abuse

At present journal impact factors (JIF) play a very important role in research evaluation procedures in Western countries, in spite of the widespread belief that their importance is overestimated. Because of the JIF, many researchers continue to publish in overly expensive commercial journals and often neglect more suitable Open Access publication channels. What are the drawbacks of impact factors?

In absolute terms, a JIF reflects for a large part the number of
people working in a given field. The highest JIF attributed for 2010 to
a journal in oncology was 94.3, whereas the highest in ornithology was
2.3. To conclude that the research quality in ornithology is so much
lower than that in cancer research would be preposterous: there are
clearly more people active in the latter domain. For this reason, the
ranking lists of journals per category are more important than the
individual JIF values.
In the same way, it is easier to obtain a high JIF for a journal in a domain in full expansion than in a domain where scientific evolution has reached a point of quiet maturity. This should not discourage researchers from working in such a quieter field, where important scientific work may still be done, even if it will not lead to a high number of citations.
By its definition, a JIF is an average value for all papers
published during 2 (or 5) years in the given journal, and does not
guarantee the quality of an individual paper.
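
Two of the drawbacks above can be made concrete with a small Python sketch. It is illustrative only: the citation counts, the journal names and the helper `rank_in_category` are hypothetical; only the two top values, 94.3 and 2.3, are the 2010 figures quoted in the text. The first part shows how an average can be dominated by a single highly cited paper, the second how a rank within a category compares journals across fields of very different size.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one journal (made-up numbers).
citations_per_paper = [0, 0, 1, 1, 2, 2, 3, 4, 7, 80]

average = mean(citations_per_paper)    # 100 citations / 10 papers
typical = median(citations_per_paper)  # middle of the sorted list
below_average = sum(c < average for c in citations_per_paper)


def rank_in_category(jif_by_journal, journal):
    """Position of `journal` when its category is sorted by JIF, best first."""
    ordered = sorted(jif_by_journal, key=jif_by_journal.get, reverse=True)
    return ordered.index(journal) + 1


# Journal names and all values except the two top ones are placeholders.
oncology = {"Oncology A": 94.3, "Oncology B": 37.2, "Oncology C": 12.4}
ornithology = {"Ornithology A": 2.3, "Ornithology B": 1.8, "Ornithology C": 0.9}

print(average, typical, below_average)
print(rank_in_category(oncology, "Oncology A"))
print(rank_in_category(ornithology, "Ornithology A"))
```

In this made-up journal nine of the ten papers are cited less often than the journal average, and the leading ornithology journal holds the same rank in its category as the leading oncology journal despite a JIF about forty times lower.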

We refer to the References for further reflections on the value of this and other bibliometric evaluation parameters. It should be clear that they must be treated with care: they may be welcome as additional information, but they can never replace a good qualitative evaluation of the work performed by a researcher or research group (Georges Stoops, 2009).

References

C. Neuhaus & H.-D. Daniel, Data sources for performing citation analysis: An overview, Journal of Documentation, 64 (2008), 193-210. Preprint: http://e-collection.ethbib.ethz.ch/eserv/eth:29124/eth-29124-01.pdf
J. Bollen, H. Van de Sompel, A. Hagberg & R. Chute, A Principal Component Analysis of 39 Scientific Impact Measures, PLoS ONE, 4(6) (2009), e6022. doi:10.1371/journal.pone.0006022
D. Schoonbaert & G. Roelants, Citation analysis for measuring the value of scientific publications: quality assessment tool or comedy of errors? Tropical Medicine and International Health, 1 (1996), 739-752. http://onlinelibrary.wiley.com/doi/10.1111/j.1365-3156.1996.tb00106.x/pdf
M. Amin & M. Mabe, Impact factors: use and abuse, Perspectives in Publishing, 1 (2000). http://www.elsevier.com/framework_editors/pdfs/Perspectives1.pdf