The relationship between quality of research and citation frequency
1 Medical Informatics Group, University of Oulu, P.O. Box 5000, FIN-90014 Oulu, Finland
Institute of Medical Biometry and Medical Informatics, University of
Freiburg, Stefan-Meier-Str. 26, D-79115 Freiburg, Germany
The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2288/6/42
Received: 17 April 2006
Accepted: 1 September 2006
Published: 1 September 2006
© 2006 Nieminen et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The objective of this study is to assess whether statistical reporting and statistical errors in the analysis of the primary outcome are associated with the number of citations received.
We evaluated 448 research articles published in 1996 in four psychiatric journals. The statistical and reporting quality of each paper was assessed, and the number of citations received up to 2005 was obtained from the Web of Science database. We then examined whether the number of citations was associated with the quality of the statistical analysis and reporting.
Inadequate reporting of the research question and primary outcome was not statistically
significantly associated with the citation counts. After adjusting for journal, extended
description of statistical procedures had a positive effect on the number of citations
received. Inappropriate statistical analysis did not affect the number of citations
received. Adequate reporting of the primary research question, statistical methods
and primary findings were all associated with the journal visibility and prestige.
Inadequate statistical reporting and errors in statistical analysis were not associated with the number of citations. The journal in which a study is published appears to be as important as the statistical reporting quality in ensuring dissemination of published medical science.
The attention that scientific articles get can be assessed using citation analysis.
In this context, Egghe & Rousseau [1] claim that four important assumptions form the basis for all research based on citation
counts. The assumptions are that: (1) citation of an article implies use of that document
by the citing author, (2) citation reflects the merit (quality, significance, impact)
of the article, (3) references are made to the best possible works, and (4) an article
is related in content to the one in which it is cited. Thus citation counts can be
regarded as one method of obtaining a quantitative expression of the utilization and
contribution of a particular published paper. However, whether received citations
reflect the methodological quality of an article has been questioned [2].
The central role of statistics in medical research is reflected in the high proportion of articles that are essentially statistical in their presentation [3,4]. The most visible aspect of this is the statistical summaries of the raw data used
in the research. Medical research articles using statistical methods have always been
at risk of poor reporting, methodological errors and selective conclusions [5-8]. The existence of these problems in published articles is often regarded as evidence
that poor research and poor reporting quality slip through the peer review process. Whether the quality of the statistical reporting and analysis affects the subsequent citation of an article is presently unclear. Our aim is to investigate the extent to which authors consider the quality of the
evidence when deciding which evidence to cite. We hypothesised that publications are
cited for a variety of reasons, but that the effect of statistical reporting and inappropriate
statistical analysis on the number of citations is minimal.
Set of articles
The material consisted of articles from four psychiatric journals: The American Journal of Psychiatry (AJP), Archives of General Psychiatry (AGP), the British Journal of Psychiatry (BJP) and the Nordic Journal of Psychiatry (NJP). AJP and AGP are the two leading medical journals covering psychiatric research
and have consistently been the top two as ranked by Garfield's impact factor (IF),
while BJP is the most cited psychiatric journal outside the United States and NJP
represents the large group of journals having a markedly lower IF than the other three
studied here. The four journals had the following impact factors in 2004: AGP 11.207,
AJP 7.614, BJP 4.175 and NJP 0.887.
The journals were available in the libraries of the authors' institutes. Papers were included for examination if they
had been published as original research articles in 1996, reported research findings
based on the systematic collection of data, and used statistical methods for data
analysis. The total number of articles reviewed was 448, representing about 47% of
all the articles in the four journals (N = 951). Those excluded were mostly letters
(n = 287), brief reports (AJP, n = 63), reviews (n = 22) or editorials. Further details
of the sample and the statistical methodology used in the articles have been published
in an earlier paper [4].
Number of citations
The number of citations received by each article was obtained from the ISI Web of Science databases (Science Citation Index, Social Sciences Citation Index and Arts
& Humanities Citation Index) in April 2005. Self-citation was deemed to have occurred
whenever the set of co-authors of the citing article shared at least one author with
that of the cited one, a definition used in various recent citation studies [11]. The number of self-citations was then subtracted from the total number of citations.
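The self-citation rule described above (at least one shared author between citing and cited article) can be sketched in a few lines; the author lists below are invented for illustration:

```python
def is_self_citation(citing_authors, cited_authors):
    # Self-citation: the citing and cited papers share at least one author
    return bool(set(citing_authors) & set(cited_authors))

# Invented author lists for illustration
cited_paper = ["Author A", "Author B", "Author C"]
citing_papers = [
    ["Author A", "Author X"],   # shares Author A -> self-citation
    ["Author Y", "Author Z"],   # no shared author -> external citation
    ["Author Z", "Author C"],   # shares Author C -> self-citation
]

total_citations = len(citing_papers)
self_citations = sum(is_self_citation(p, cited_paper) for p in citing_papers)
# Self-citations are subtracted from the total, as in the study
external_citations = total_citations - self_citations
print(external_citations)  # -> 1
```

In practice the matching would have to cope with name variants and shared initials in bibliographic records; the set intersection above only captures the definition itself.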
Primary outcome in the evaluated articles
For each article, we identified the primary outcome: the main response variable(s) together with possible explanatory or grouping factors. The primary
outcome was that which was stated in the research objectives (in the abstract or introduction)
or labelled as "primary" in the methods. When no outcome met these criteria, the reviewer
used his own judgment to select the outcome that was presented in the abstract, and/or
the first outcome presented in the results, that appeared crucial to the final conclusions.
The psychiatric sub-field and study design were also assessed. Papers that were difficult
to assess were additionally reviewed by GR and MS, then jointly assessed.
The reviewer applied a structured (manual) classification scheme to each paper, and was blind to the number of citations received. Inter-rater reliability was assessed using data from two reviewers (P.N. and Jouko Miettunen). They independently reviewed all the 448 articles
in separate research projects with different research questions; however, their review
protocols shared two items. For the first, 'whether data analysis procedures were
completely described in the methods part of the research report', the overall agreement
between raters was 90.7% and kappa coefficient for inter-rater reliability was 0.75
(95% CI 0.68 – 0.82). For the second, 'whether the statistical software used in the
study was named in the report', the overall agreement was 96.9% and kappa coefficient
0.93 (95% CI 0.89 – 0.96).
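The two agreement measures quoted above (overall percent agreement and the kappa coefficient) can be computed as follows; the binary ratings below are invented, not the study's data:

```python
def agreement_and_kappa(ratings1, ratings2):
    # Overall agreement: proportion of articles on which the raters agree
    n = len(ratings1)
    observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Chance agreement from each rater's marginal 'yes' rates
    p1 = sum(ratings1) / n
    p2 = sum(ratings2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    # Cohen's kappa: agreement beyond chance, scaled to [expected, 1]
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Invented ratings (1 = 'data analysis procedures completely described')
rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
observed, kappa = agreement_and_kappa(rater1, rater2)
print(round(observed, 2), round(kappa, 2))  # -> 0.8 0.52
```

Note that kappa is well below the raw agreement: two raters who both answer 'yes' often will agree frequently by chance alone, which is why the study reports kappa alongside percent agreement.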
Characteristics of the statistical reporting and analysis
Three aspects of reporting quality were evaluated: (i) whether the primary research question or hypothesis was clearly stated in the report's
introduction or methods section; (ii) whether sample size and data analysis procedures
were described in the report's methods section, and (iii) whether the article was
difficult to read due to lack of clarity about the primary response or outcome variable.
The statistical analysis of the primary outcome in each article was checked for the specific analysis errors defined by Altman [8] as 'definite errors'. These errors are related to elementary statistical techniques
and included the following: (i) using a statistical test that requires an underlying
normal distribution on data that are not normally distributed; (ii) using an incorrect
method for repeated measurements, analyzing serial measurements independently at multiple
time points and making comparisons between p-values; (iii) using a non-parametric
test that requires an ordered scale on data with a non-ordered categorical variable;
(iv) wrong unit of analysis, confusion between tests or more tests than number of
cases; or (v) other errors such as using an incorrect method for time-to-event data
or using a correlation coefficient to relate change to initial value.
We also recorded whether the authors drew conclusions not supported by the study data, reported significant findings without
a statistical test or CI, or explicitly or implicitly made comparisons between p-values.
The overuse of statistical tests, defined to be present if there was no clear main
hypothesis, or several sub-group analyses using the primary outcome, was also assessed.
We categorised each study's sample size as small (< 50), medium (50 – 360) or large (> 360).
The distribution of citation counts was positively skewed, so we used the median as a measure of location. Mann-Whitney tests, Kruskal-Wallis ANOVA and negative binomial regression were used to investigate possible associations between the number of citations and reporting quality. We adjusted for journal to control for the effect of journal visibility. The statistical significance of differences in statistical reporting and errors between the four journals was evaluated using the chi-square test.
Statistical analyses were carried out with SAS Release 9.1 (SAS Institute Inc.).
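The paper's analyses were run in SAS; as an illustration of the rank-based comparison used for the skewed citation counts, here is a minimal pure-Python Mann-Whitney test (normal approximation, midranks for ties, no tie correction in the variance; the citation counts are invented):

```python
import math

def mann_whitney(x, y):
    """Mann-Whitney U with a normal-approximation two-sided p-value.
    Rank-based, so suitable for skewed data such as citation counts."""
    n1, n2 = len(x), len(y)
    pooled = sorted((value, idx) for idx, value in enumerate(x + y))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + j + 1) / 2            # average rank for tied values
        for k in range(i, j):
            ranks[pooled[k][1]] = midrank
        i = j
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return u, p

# Invented citation counts: articles with vs without a full description
# of the statistical procedures
full_desc = [7, 12, 19, 25, 30, 41, 55, 60]
incomplete = [1, 2, 3, 5, 8, 9, 11, 15]
u, p = mann_whitney(full_desc, incomplete)
```

A t-test on such data would be dominated by the few heavily cited articles; ranking discards that skew, which is why the paper pairs median summaries with Mann-Whitney and Kruskal-Wallis tests.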
The psychiatric sub-fields of the evaluated articles were classified as clinical topics, psychopharmacology, biological topics and others. The distribution of published
articles by journal and topic is shown in Table 1. AJP and NJP had more clinical articles than the other two journals, BJP had more
other articles (e.g. prevalence and validity studies) and AGP had more biological
articles compared to other evaluated journals. The distribution of study designs was
as follows: cross-sectional surveys (33.7%), cohort studies (26.8%), case-control
studies (16.5%), intervention studies including clinical trials (16.7%), reliability
and diagnostic studies (4.7%) and basic science studies (1.6%).
Figure 1 shows the number of citations received by articles in each journal. In citations counted up to April 2005, the AGP articles received a median of 64 citations while the median
for those in the AJP was 33 and for those in the BJP was 20. Few references were made
to articles published in the low-IF journal NJP (median 1; not included in Figure 1 due to the low number of citations).
The quality of reporting
Failure to state the primary research question or hypothesis was the most common defect (34.6%).
Of the 448 evaluated articles, sample size was unreported in 78 (17.4%) papers. The
quality of reporting was related to the journal; failure to describe the primary research
question and methods was less common in the AJP and AGP.
Table 2 shows the number of citations received by the publishing journal for each reporting quality measure. There was not a strong association between the quality of reporting
and the number of citations received by the articles. In the AGP, articles with better
reporting quality received more citations, but this association was not statistically
significant in any of the quality variables. Only in the AJP and BJP did 'description
of statistical procedures' have a statistically significant positive association with
the number of citations received (Mann-Whitney test, p < 0.05).
Errors in statistical analysis
Seventeen articles (3.8%) used a statistical test that requires an underlying normal distribution
on data that clearly was not normally distributed; 5.8% (26 articles) used an incorrect
method for repeated measurements (unpaired or independent samples); 0.9% (4 articles)
used a test that requires an ordered scale on data with non-ordered categorical variable;
5.6% (25 articles) had confusion with observation units, confusion between tests or
more tests than number of cases; and 0.6% (3 articles) had other errors. Inappropriate
analyses seemed to be less common in the more visible journals. The total error rate
of 16.7% is probably an underestimate, because often articles did not give enough
information to evaluate the appropriateness of the methods they used. A total of 31.5% (141 articles)
met our criteria of overuse of statistical significance tests (i.e. they lacked a
clear main hypothesis or had several sub-group analyses using the primary outcome).
Table 3 shows the number of citations received by the publishing journal, broken down by the statistical analysis variables. There was no evidence that errors in the statistical
analysis of the primary outcome decreased the number of citations.
Adjusted effects on the number of citations
The effect of the quality of statistical reporting and analysis on the number of received citations, adjusted
for the publication forum, is shown in Table 4. Journal visibility is the most important predictor of citation frequency; the citation
rate in the AGP is three times that in the BJP. After adjustment for journal, articles
which have an inadequate description of statistical procedures have a ratio of 0.83
(95% CI 0.80 – 1.20, P = 0.048) citations per article relative to those with extended
description. Other reporting quality or statistical analysis variables were not associated
with citation frequency.
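Because negative binomial regression uses a log link, figures such as those above are multiplicative citation-rate ratios: a fitted coefficient β corresponds to a ratio exp(β). A small sketch of that interpretation (the baseline of 20 expected citations is invented; 3.0 and 0.83 simply echo the ratios quoted above):

```python
import math

def rate_ratio(beta):
    # Log-link count model: exp(beta) multiplies the expected citation rate
    return math.exp(beta)

# Coefficients on the log scale, chosen to match the quoted ratios
beta_journal = math.log(3.0)      # journal with triple the citation rate
beta_incomplete = math.log(0.83)  # incomplete description of procedures

baseline = 20.0  # invented expected citations for a reference article
in_visible_journal = baseline * rate_ratio(beta_journal)
also_incomplete = in_visible_journal * rate_ratio(beta_incomplete)
```

On this scale the journal effect (a factor of 3) plainly dwarfs the reporting-quality effect (a factor of 0.83), which is the paper's central comparison.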
We also examined whether there is a difference in the number of citations received by papers with (i) statistical
errors that potentially affect the study results and (ii) papers with reporting failures.
To this end, a combined variable "Presence of errors potentially affecting the study
results" was defined using the last four variables given in Table 4. It takes the value "yes" if there is an inappropriate analysis or overuse of tests,
and "no" if there is a complete description of the procedures, a complete and appropriate
analysis, and no overuse of tests. In all other cases, it takes the value "undetermined".
The negative binomial regression model was then fitted with journal and this new variable
as the only covariates. The results showed no evidence of an association between this
new variable and citation. Arguably, this is unsurprising as this new variable effectively
dilutes the association shown in Table 4.
Less than 4% of the evaluated articles included sample size calculations, power analysis or any other justification for the sample size, contrary to the CONSORT [12] and STROBE [13] guidelines for reporting research.
The association between sample size and citations was also examined within journals; NJP was not included due to the low number of citations. There was no statistically significant evidence of preferential citation of studies with large sample sizes (p-value of the Kruskal-Wallis test > 0.05 in each journal).
We investigated the association between the quality of an article's statistical reporting and analysis and the number of citations it received. In this set of articles,
failing to state essential information, such as the primary research question or the
primary outcome variable did not affect the number of citations the article received.
However, a sufficient description of the methods used was an important factor in increasing
the number of citations received in two of the four journals. Statistical errors and
sample size were not associated with number of citations received. Reporting quality
was associated with the journal visibility and prestige.
West and McIlwaine [14] found that there was no correlation between the number of citations and expert ratings of article quality. Callaham et al [15] examined a cohort of published articles originally submitted to an emergency medicine
meeting and also reported that the impact factor of the publishing journal, not the
peer rating quality of the research, was the strongest predictor of citations per
year. Our findings concerning the statistical quality are in line with these findings.
The importance of clearly stating the primary research question in the report is obvious, but such a statement was often (in 34.6% of papers) missing.
In these cases, the results cannot be interpreted in light of a priori hypotheses.
Further, unless the research question is clearly stated, the appropriateness of the
study design, data collection methods and statistical procedures cannot be judged.
For other researchers to cite the paper, however, it does not appear to matter whether
the initial purpose of the cited study was clear, or whether the analyses were exploratory. A related shortcoming was lack of clarity about the primary response or outcome variable. Although it is valuable for medical studies
to evaluate several aspects of patients' responses, it is important to identify a
small set of primary outcome or response variables in advance. It is also important that the results for primary responses (including any non-significant findings) are fully reported. Focusing on clearly stated primary response measure(s) helps both the investigators
to write an understandable and compact report and the readers to evaluate the findings.
Again, though, our results indicate that having an unclear primary response or outcome
variable does not lower the citation count and so does not appear to restrain other
researchers from using the paper.
A complete description of the statistical procedures was positively associated with citation counts, and this association was more marked in papers published in the AJP and BJP. In our sample, documentation
of statistical methods used was generally sufficient in AGP (92.2%), consistent with
the editorial policy of the journal, which requires an extended methods section in submitted papers. A strength of our study is the set of evaluated journals, which differ considerably in prestige and visibility. By involving several journals we were able to control for the effect
of journal visibility on the number of citations received and compare the prestige
of a journal with the quality of statistical presentation. The reporting of statistical
information was more detailed, comprehensive and useful for the reader in the two
leading journals (AGP and AJP). Again, this is consistent with their detailed guidelines
for presenting statistical results, and also a more rigorous review process, including
extensive statistical reviewing. In low-impact journals the peer review is undoubtedly less thorough [6,18]. Thus our results provide an important confirmation, for editors, authors and consumers
of research, on the value of guidelines and rigorous statistical reviewing.
Previous reviews have shown that published medical research articles, even those published in 'high-prestige' journals, are not statistically faultless
[6,8,19,20]. Our findings are in line with these studies, and also demonstrate inadequate reporting
of research methods and hypotheses. However, most of the statistical problems in medical
papers are probably relatively unimportant or more a matter of judgment. As there
is also no general agreement on what constitutes a statistical error, the comparison
of different statistical reviews is difficult [8,21]. There may be several valid ways of analyzing a data set.
It has been suggested that citation frequency is related to the sample size of studies [22]. Our data do not support this hypothesis: sample size was not associated with the
frequency of citations. Callaham et al [15] came to the same conclusion when they analyzed a set of emergency medicine articles.
Textbooks of medical statistics require that the sample size should be large enough
(or as large as possible) and that some justification for the size chosen should be
given [23]. Unfortunately, our results suggest that the concept of sample size calculation is almost unknown in psychiatric research outside the field of clinical trials; less than 4% of the evaluated articles included sample size calculations, power analysis or any other justification for the sample size.
Inadequate statistical reporting and errors in statistical analysis were not associated with the number of citations. The journal in which a study is published appears to be as important as the statistical reporting quality in ensuring dissemination of published medical science [2,24]. A highly visible publication may therefore attract more attention, even if the results
are poorly and obscurely reported. Thus, the quality of statistical reporting is often not important in the subsequent uptake of an article. Rather, a highly cited study reflects a strong active interest in the question it addresses within the scientific community. Most of the errors and deficiencies in the statistical information in the journal articles reviewed here are related to topics included in
most introductory medical statistics books. Some of these errors are serious enough
to call the author's conclusions into question. It seems strange that a problem seemingly
so important, so widespread and so long-standing should continue [6,9]. Possible explanations are that (1) much research is done without the benefit of
anyone with adequate training in quantitative research methods, (2) copying of inappropriate methods is usual, or (3) the statistical component of the peer review process is not common or sufficiently valued by editors. Our study suggests another possible contributory factor. Editors and authors are
often partially motivated by the desire to publish papers that will be highly cited
and, since the methodological quality of published original research articles does not appear to affect their uptake in the literature, there is little citation-based incentive to improve: poor reporting and errors in the analysis are likely to continue.
PN designed the study, collected and analysed the data, and wrote the paper. GR contributed to the data collection, the statistical analysis
and writing of the paper. JC contributed to the statistical analysis and writing of
the paper. MS initiated the study project, coordinated the collection of material
and contributed to the writing of the manuscript. All authors read and approved the final manuscript.

The study was supported by research grants (FOR534, AN 365/2-1).
Egghe L, Rousseau R: Introduction to informetrics. Quantitative methods in library, documentation and information science. Amsterdam: Elsevier; 1990.
Moed HF: Citation analysis in research evaluation. Dordrecht: Springer; 2005.
Horton NJ, Switzer SS: Statistical methods in the journal. N Engl J Med 2005, 353:1977-1979.
Miettunen J, Nieminen P, Isohanni I: Statistical methodology in major general psychiatric journals. Nord J Psychiatry 2002, 56:223-228.
Lang T: Twenty statistical errors even you can find in biomedical research articles. Croat Med J 2004, 45:361-370.
Altman DG: Poor-quality medical research: what can journals do? JAMA 2002, 287:2765-2767.
Jamart J: Statistical tests in medical research. Acta Oncol 1992, 37:723-727.
Altman DG: Statistical reviewing for medical journals. Stat Med 1998, 17:2661-2674.
Lang T, Secic M: How to report statistics in medicine. Philadelphia: American College of Physicians; 1997.
Song F, Eastwood AJ, Gilbody S, Duley L, Sutton AJ: Publication and related biases. Health Technol Assess 2000, 4(10).
Glänzel W, Thijs B, Schlemmer B: A bibliometric approach to the author self-citations in scientific communication. Scientometrics 2004, 59:63-77.
Moher D, Schulz KF, Altman DG: The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials. BMC Medical Research Methodology 2001, 1:2.
STROBE Statement. Strengthening the reporting of observational studies in epidemiology [http://www.strobe-statement.org/]
West R, McIlwaine A: What do citation counts count for in the field of addiction? An empirical evaluation of citation counts and their link with peer ratings of quality. Addiction 2002, 97:501-504.
Callaham M, Wears RL, Weber E: Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. JAMA 2002, 287:2847-2850.
Johnson T: Clinical trials in psychiatry: background and statistical perspective. Stat Methods Med Res 1998, 7:209-234.
Goodman SN, Altman DG, George SL: Statistical reviewing policies of medical journals: caveat lector? J Gen Intern Med 1998, 13:753-756.
Lee KP, Schotland M, Bacchetti P, Bero LA: Association of journal quality indicators with methodological quality of clinical research articles. JAMA 2002, 287:2805-2808.
McGuigan SM: The use of statistics in the British Journal of Psychiatry. Br J Psychiatry 1995, 167:683-688.
Olsen CH: Review of the use of statistics in infection and immunity. Infect Immun 2003, 71:6689-6692.
McGuigan SM: The use of statistics in the British Journal of Psychiatry. Br J Psychiatry 1995, 167:683-688.
Peritz BC: On the heuristic value of scientific publications and their design – a citation analysis of some clinical trials. Scientometrics 1994, 30:175-186.
Armitage P, Berry G, Matthews JNS: Statistical methods in medical research. Oxford: Blackwell Science; 2002.
Garfield E: Which medical journals have the greatest impact? Ann Intern Med 1986, 105:313-320.
Ioannidis JP: Contradicted and initially stronger effects in highly cited clinical research. JAMA 2005, 294:218-228.
Altman DG, Goodman SN, Schroter S: How statistical expertise is used in medical research. JAMA 2002, 287:2817-2820.