Cobaltmetrics was first released in January 2018, just in time for PIDapalooza
in Girona. Five months and 78 million documents later, it is time to
reflect on where we stand in the altmetrics movement and what we want to
push for.
Are altmetrics providers alt- enough?
Altmetrics,
for the uninitiated, were designed to complement traditional,
journal-to-journal citation metrics and provide the scientific community
with better proxies for scientific impact.
When Jason Priem coined the term in 2010, diversity was part of the message: diversity of measures has been, and remains, the main goal of the altmetrics movement. But what about the underlying data? What do you gain by changing the statistic if the sample remains biased?
Our work on altmetrics stemmed from a simple observation: existing altmetrics providers are not alt- enough. While they process significant amounts of data, they operate on a very narrow subset of global scientific production:
Target languages: only a small minority of the world's population (roughly five percent) speak English as their first language, yet altmetrics providers tend to ignore content in languages other than English, or support only a handful of languages. For example, Altmetric only monitors Wikipedia in English, Finnish, and Swedish, and PlumX Metrics recently took its first step toward multilingualism by adding Wikipedia in Portuguese and Spanish.
Advances in natural language processing, however, make massively multilingual text mining more efficient than ever, and, whenever identifiers or URLs are used, extracting citations becomes mostly language-independent (see the sketch below). In any case, algorithmic complexity cannot be used as an excuse for a lack of linguistic diversity in scientometrics. Anglo-centrism is prejudicial to science (see this note by Vladimir Lazarev & Serhii Nazarovets for another recent example), and we must fight it.
Target documents: a recent study by Martin Klein et al. found that preprints are largely indistinguishable from the versions that appear in academic journals (you can even compare the preprint and the final version of that very study, so meta), yet existing altmetrics providers treat non-peer-reviewed documents as second-class citizens. Altmetric, for example, can only merge citations for a publication and the corresponding preprint if the preprint was not assigned a DOI.
The value of preprints is now recognized, and that case will soon be
closed. But, moving forward, will we need to have the same conversation
about every new type of document? It is not up to altmetrics providers
to decide what is citable. What about patents, trademarks, clinical
trials, or law articles? What about non-textual digital objects such as datasets, software, and videos?
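To make the language-independence point concrete, here is a minimal sketch of identifier-based citation extraction. The regular expressions and sample sentences are our own simplifications for illustration, not Cobaltmetrics production code:

```python
import re

# Crude patterns for two common persistent identifiers. Production-grade
# extraction needs sturdier rules, but the point stands: these patterns
# do not care what language surrounds the identifier.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ARXIV_RE = re.compile(r'\barXiv:\d{4}\.\d{4,5}(?:v\d+)?', re.IGNORECASE)

def extract_citations(text):
    """Return all DOI and arXiv identifiers found in a snippet of text."""
    return DOI_RE.findall(text) + ARXIV_RE.findall(text)

# The same extractor works on English, Japanese, and Russian prose alike.
# (The identifiers below are placeholders, not real citations.)
samples = [
    "See the original study at 10.1000/182 for details.",
    "詳細は 10.1000/182 を参照。",
    "Подробнее см. arXiv:1802.02871.",
]
for s in samples:
    print(extract_citations(s))
```

No language detection, no parsing, no translation: the identifier itself does the work.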
Credit where credit is due: altmetrics providers like Altmetric, ImpactStory, and Plum Analytics
are great projects. They have profoundly changed the way all
stakeholders in science and research policy think of scientific impact,
and they have paved the way for new efforts. But diversity is good, and
we think we can do even better by joining forces on different
challenges.
How is Cobaltmetrics different?
Projects like Altmetric, ImpactStory, and Plum Analytics focus on the -metrics side of altmetrics: they provide scores, rankings, and badges. Cobaltmetrics, on the other hand, focuses on the alt- side: by gathering data about alt-citations using alt-identifiers in alt-documents written in alt-languages, we aim to correct for the selection effects and lack of diversity in scientometrics.
Our
mission is to provide data. We provide citation data that is clean,
stable, and reproducible, and can thus be used not only to compute
impact metrics, but also to build knowledge graphs, train statistical
models, test recommendation systems, and do a lot of other things.
Therefore, Cobaltmetrics is more similar to Crossref Event Data than to any other altmetrics project. To quote Joe Wass’ thoughts on Crossref Event Data:
I should make clear that [we] are not in the business of making metrics.
The services that we were compared to, Altmetric.com and Plum
Analytics, collect the same kind of data, but their ultimate aim is
metrics. Our purpose begins and ends with collecting this underlying
data so that anyone can analyze it. It could be used to make metrics,
but could be used for a lot more other purposes besides.
With Cobaltmetrics, we strive to go deeper than other altmetrics providers:
We
collect citations and backlinks from documents in all languages, and we
have validated the methodology by crawling data from Wikipedia in more
than 180 languages.
We
collect citations and backlinks to all types of documents and digital
objects, including but not limited to scientific publications, books,
patents, trademarks, clinical trials, financial statements, security
vulnerabilities, social media posts, software, videos, etc.
We
collect citations and backlinks to both canonical and non-canonical
URIs. End-users cannot be expected to know whether a given identifier is
persistent, or whether a given URI is canonical. Citations can also be
hidden behind shortened URLs, and different databases will use different
identifiers for the same document. We want our users to be able to copy
any URI into Cobaltmetrics and defer to us for the heavy lifting.
We currently index 78 million documents and 55 million citations and backlinks extracted from data made available by Hypothesis, StackExchange (all sites), and Wikimedia
(all projects and languages). We mine data in 180+ languages, we unroll
shortened URLs from 175+ shorteners, we crack open URLs to extract
persistent identifiers, and we convert between 50+ types of identifiers.
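Mechanically, unrolling a shortened URL and cracking a URL open for a persistent identifier amount to following redirects and pattern-matching the result. Here is a rough sketch using Python's requests library; the pattern and example URLs are illustrative, not our production rules:

```python
import re
import requests

# Illustrative pattern: a DOI embedded in a resolver URL such as https://doi.org/...
DOI_IN_URL = re.compile(r'(?:dx\.)?doi\.org/(10\.\d{4,9}/[^\s?#]+)')

def unroll(url, timeout=10):
    """Follow HTTP redirects to resolve a shortened URL to its final target."""
    # HEAD keeps the request cheap; some shorteners only redirect on GET.
    return requests.head(url, allow_redirects=True, timeout=timeout).url

def extract_doi(url):
    """Pull a DOI out of a URL when one is embedded in it."""
    match = DOI_IN_URL.search(url)
    return match.group(1) if match else None

print(extract_doi("https://doi.org/10.1000/182"))  # -> 10.1000/182
# A shortened link would first be resolved, then mined the same way:
# extract_doi(unroll("https://bit.ly/<short-code>"))
```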
Our
search engine is powered by a knowledge base that already includes more
than 7 billion groups of identifiers known to be equivalent. It
includes data from trusted sources like Wikidata and PubMed,
but also linked data made available by publishers and content creators.
The knowledge base is used to automagically enrich your queries. Try it out!
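A natural way to maintain that many equivalence groups is a disjoint-set (union-find) structure: every assertion that two identifiers are equivalent merges their groups, and a query can then be expanded to every identifier in the group. The sketch below is our own illustration of the idea, not a description of Cobaltmetrics internals:

```python
# Minimal disjoint-set (union-find) over identifier strings.
parent = {}

def find(x):
    """Return the canonical representative of x's equivalence group."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps trees shallow
        x = parent[x]
    return x

def union(a, b):
    """Record that two identifiers refer to the same document."""
    parent[find(a)] = find(b)

# Equivalences as they might be asserted by Wikidata, PubMed, etc.
# (the pairings here are made up for the example):
union("doi:10.1000/182", "pmid:12345678")
union("pmid:12345678", "wikidata:Q42")

def enrich(query_id):
    """Expand a query identifier into its full equivalence group."""
    root = find(query_id)
    return {i for i in parent if find(i) == root}

print(enrich("wikidata:Q42"))
# -> {'doi:10.1000/182', 'pmid:12345678', 'wikidata:Q42'}
```

The linear scan in enrich() is fine for a toy; at billions of groups, an inverted index from representative to members would replace it.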
Where do we go from here?
Cobaltmetrics is still very young, and so is Thunken. We have already opened our public API
for the sake of openness and transparency. Although we have put limits
and quotas on API requests to prevent abuse and ensure availability, we
are committed to providing a free plan for general and reasonable usage.
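In practice, a client conversation with the API might look like the following. The endpoint, parameters, and quota behavior below are placeholders based on common REST conventions, not documentation of the actual Cobaltmetrics interface:

```python
import time
import requests

API = "https://cobaltmetrics.example/search"  # placeholder, not the real endpoint

def fetch_citations(uri, retries=3):
    """Query citation data for a URI, backing off if a quota kicks in."""
    for attempt in range(retries):
        resp = requests.get(API, params={"uri": uri})
        if resp.status_code == 429:  # a common "rate limited" status, assumed here
            time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("still rate-limited after %d attempts" % retries)

# fetch_citations("https://doi.org/10.1000/182")
```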
We also decided early on to release early and release often, so here is a list of the next big features:
New data sources: not everything that counts can be counted, and not everything that can be counted counts, but, as we have seen, altmetrics are a sampling game, and there are many types of digital objects that we want to track.
Stable releases: reproducibility is a potential issue with any web-based service, and we need to guarantee that two users who run the same query on the same day retrieve the same results. Rather than updating every source on a rolling basis, we plan to build and release a stable index and a stable knowledge base every month (a minimal sketch follows this list).
Self-reporting table: the NISO-sponsored working group on altmetrics recommends that altmetrics providers release a self-reporting table on data quality (cf. NISO RP-25-201X-3).
We are working on a tool that, before each release, inspects our code
and our data to document how the data was aggregated, how it can be
accessed, and how quality is monitored.
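Returning to the stable-releases point above, the idea in miniature is to freeze the index and knowledge base under a dated tag, so that the data underlying a query can no longer change between two users. File names and structure in this sketch are illustrative only:

```python
import datetime
import json

def build_release(index, knowledge_base):
    """Freeze the current index and knowledge base under a monthly tag."""
    tag = datetime.date.today().strftime("%Y-%m")  # e.g. "2018-06"
    snapshot = {"release": tag, "index": index, "kb": knowledge_base}
    with open("release-%s.json" % tag, "w") as f:
        json.dump(snapshot, f)
    return tag
```

Any query answered against a given tag is then reproducible by construction.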