Source: https://scholarlykitchen.sspnet.org/2019/09/25/fighting-citation-pollution/
Fighting Citation Pollution — The Challenge of Detecting Fraudulent Journals in Works Cited
Editor’s Note: Today’s post is by Lisa Janicke Hinchliffe and Michael Clarke.
As citations to articles in fraudulent journals increasingly appear in article manuscripts, vexing reviewers and editors alike, the scholarly communications community needs to develop an automated shared service to assess works cited efficiently and ensure that authors are not inadvertently polluting the scholarly record.
Crossref’s recent decision to rescind the membership of OMICS brings the issue of citation pollution into sharp relief. The decision comes in the wake of a $50 million fine levied against the publisher by the US Federal Trade Commission in a summary judgment earlier this year. Now Crossref is freezing OMICS out of its ecosystem. While DOIs already deposited will remain active, OMICS will no longer be able to deposit DOIs via Crossref.
Crossref is not the only organization to grapple with this issue. The Scientist reported in May on growing concerns among researchers about papers from fraudulent publishers finding their way into PubMed via PubMed Central. Once in PubMed, these papers appear just like any other paper and can easily be found and cited by researchers.
While the extent of fraudulent and deceptive journal publishing practices in scholarly publishing is not fully known, the problem is perceived as substantial and growing. There are, for example, over 10,000 journals on Cabell’s blacklist. (Let’s pause to let that number sink in: over 10,000 journals.) While some of what is published in these 10,000-plus journals is undoubtedly methodologically sound scholarship (an inference based simply on the volume of papers in question), other articles are at best questionable science and at worst outright fraud. Separating the methodologically sound from the fraudulent would be a Herculean challenge (analogies to vanquishing the Lernaean Hydra or cleaning the Augean stables seem apropos), so what are citing researchers, and the legitimate journals they publish in, to do?
Authors and editors who wish to avoid citing fraudulent publications are in the position of having to track which journals engage in fraudulent practices. This is difficult both because of the sheer number of such journals and because many fraudulent journal titles are deliberately chosen to closely mirror those of legitimate publications. While manual checks by authors and copyeditors against whitelists and blacklists are possible, such approaches are time-consuming and costly. Further, copyediting practices vary widely among publishers and even among journals at the same publisher: some journals closely review citations, while others simply edit details to conform to the journal’s style.
Anyone who spends time seriously considering this challenge will see a clear need for a scalable, easily adopted, industry-wide approach to the problem of citations to articles in fraudulent journals appearing in author manuscripts.
We suggest that what could meet this need is a “meta journal look-up service” that could be accessed via API by the production systems and editing tools used by publishers. In reference to the labors of the ancient Greek hero, we propose calling such a system “HYDRA” for High-frequencY Fraud Detection Reference Application.
How HYDRA could work is as follows. A manuscript would be submitted to a publisher’s production or copyediting system. As part of that ingest process, the list of cited journals would be sent to HYDRA in the form of an API query. HYDRA would then return a list of which whitelists each cited journal appears on. So, for each citation in a manuscript, HYDRA would return something like “Journal X is indexed by Web of Science, Journal Citation Reports, Scopus, DOAJ, and MEDLINE.” It could include subject lists as well, e.g., EconLit, PsycINFO, MLA, GeoRef, and Inspec. HYDRA could further allow publishers to maintain their own whitelists that would be incorporated into query results; these might include regional titles and niche publications that do not appear on other whitelists. Such a look-up process could also report which blacklists a cited journal appears on. By querying multiple lists, HYDRA would avoid over-reliance on a single authority and allow for a more nuanced assessment of a given journal title.
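To make the shape of such an exchange concrete, here is a minimal sketch of what a HYDRA query might look like. HYDRA does not exist, so the endpoint URL, payload fields, and response structure below are illustrative assumptions, not a specification.

```python
# A hypothetical HYDRA API exchange. The endpoint, payload fields,
# and response shape are illustrative assumptions only.
import json
import urllib.request

HYDRA_ENDPOINT = "https://hydra.example.org/v1/lookup"  # hypothetical URL

def lookup_cited_journals(journal_titles):
    """Send the journal titles cited in a manuscript to HYDRA and
    return, for each title, the whitelists and blacklists on which
    it appears."""
    payload = json.dumps({"journals": journal_titles}).encode("utf-8")
    request = urllib.request.Request(
        HYDRA_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# An imagined response entry for one cited title:
# {
#     "Journal X": {
#         "whitelists": ["Web of Science", "Scopus", "DOAJ", "MEDLINE"],
#         "blacklists": []
#     }
# }
```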
If a journal does not appear on any whitelists — or if it appears on any blacklists — a query to the author could be automatically generated (as a component of the author submission or proof review process) asking the author to justify the citation. Journals might adopt a simple editorial policy: If a reference is not included on certain whitelists (which might vary by journal and might include publisher-maintained regional lists), then authors must justify the citation to the satisfaction of the editor. For example, in writing about fraudulent publications, it may be necessary to cite them!
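The editorial policy just described reduces to a simple rule that could run automatically at submission or proof review. The sketch below assumes the hypothetical response format from the earlier example; the particular whitelists required would of course vary by journal.

```python
# A sketch of the editorial decision rule: flag a citation for author
# justification if the cited journal is on any blacklist or on none
# of the whitelists this journal requires. The list names here are
# illustrative; each journal would configure its own policy.
REQUIRED_WHITELISTS = {"Web of Science", "Scopus", "DOAJ", "MEDLINE"}

def needs_author_justification(lookup_entry):
    """lookup_entry: {'whitelists': [...], 'blacklists': [...]}"""
    on_a_required_whitelist = bool(
        set(lookup_entry["whitelists"]) & REQUIRED_WHITELISTS
    )
    on_any_blacklist = bool(lookup_entry["blacklists"])
    return on_any_blacklist or not on_a_required_whitelist
```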
As HYDRA would provide a simple look-up service, it could be embedded into any number of tools and applications in the scholarly workflow, including authoring tools and manuscript submission systems. HYDRA might also offer a simple web look-up that anyone could use. Authors might even use it to check whether a journal they are considering submitting to appears on well-known whitelists, or whether it appears on any blacklists.
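Reusing the hypothetical lookup function from the first sketch, such an author-facing check before submission might be as simple as:

```python
# Hypothetical author-side use: check a target journal before
# submitting to it. Reuses lookup_cited_journals() from the sketch
# above; the journal title is invented for illustration.
result = lookup_cited_journals(["Journal of Important Findings"])
entry = result["Journal of Important Findings"]
if entry["blacklists"]:
    print("Warning: journal appears on blacklists:", entry["blacklists"])
elif not entry["whitelists"]:
    print("Caution: journal not found on any known whitelist.")
else:
    print("Journal is indexed by:", entry["whitelists"])
```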
This approach would not require too much in the way of new infrastructure or the creation of new lists. It would require, however, that the various whitelists allow HYDRA to make an API call, for free or through some sort of license model, and return a validation that a given journal is on a list (or that it is not). HYDRA would therefore not store any information from any whitelists — it would simply act as a kind of switchboard. It would be, in other words, a look-up, not a meta-index. And the look-up need not contain any additional information from the lists — only the fact that the journal appears on them (or does not). This enables any subscription-based whitelists/blacklists to preserve much of the value of their products while contributing data to HYDRA, which in a way serves as marketing for the fuller data and services of the subscription products.
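The switchboard idea can be shown structurally: HYDRA fans each query out to the participating list operators and relays only the yes/no membership answers, storing nothing from the lists themselves. In the sketch below, the provider callables are toy stand-ins for the real list APIs, which would be called under whatever license terms each operator sets.

```python
# A structural sketch of the "switchboard": HYDRA relays membership
# facts from each participating list and caches no list content.
from typing import Callable, Dict, List

def switchboard_lookup(
    title: str,
    providers: Dict[str, Callable[[str], bool]],
) -> List[str]:
    """Ask each list operator whether the title appears on its list
    and return only the names of the lists that confirm it."""
    return [name for name, is_member in providers.items()
            if is_member(title)]

# Toy in-memory "lists" standing in for the real provider APIs:
toy_providers = {
    "DOAJ": lambda t: t in {"Journal of Sound Methods"},
    "Scopus": lambda t: t in {"Journal of Sound Methods",
                              "Annals of Examples"},
    "Cabell's blacklist": lambda t: t in {"Journal of Dubious Results"},
}
print(switchboard_lookup("Journal of Sound Methods", toy_providers))
# -> ['DOAJ', 'Scopus']
```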
The development and industry-wide adoption of a service like HYDRA could go a long way toward keeping citations to articles in fraudulent journals from polluting the scholarly record. It would also go a long way toward educating authors, editors, and others about the problem. The simplicity of the service makes it easy to adopt both technologically and socially. The costs of developing and maintaining such a service should be minimal and could be supported via a modest fee for API access (the website look-up, a more manual process intended for individual authors and very small publishers, would ideally be free).
This idea is ripe for a collaborative development approach, perhaps undertaken by an existing infrastructure organization. We offer it with the acknowledgment that it is not fully detailed (e.g., how to handle citations to sources other than articles, or whether the service should be extended to flag retractions). We hope that it will inspire conversation and, perhaps, action.
***
Note: We wish to acknowledge that the idea for HYDRA was born in response to a post from Margaret Winker on the OSI listserv that asked: “Authors cite research published in what may be predatory journals. Should a journal refuse to allow the citation(s)? And if so, what does that look like?” Though the full extent to which citations to articles in fraudulent journals are entering the scholarly record is not well documented, the OSI discussion revealed that this is a problem of great concern for journal editors and publishers, and one that has eluded easy resolution through manual review processes.