Sunday, 24 November 2019

Increase research visibility

Ways to increase research visibility

To attract the attention it deserves, your high-quality research needs to be easy to find. Research shows that the activities listed below can improve discoverability and increase citations. They are not intended as a checklist: some activities will be more appropriate for your discipline than others.

Starting your research

Get an ORCID iD. Use it proactively to connect to your research outputs and increase discoverability. Make sure you use it whenever you share your work, for example when you submit a manuscript or publish a dataset. Add it to your personal profiles on the web and social media. This will connect you to your research across the web, whatever search tool people are using.
Identifying the specific topics within your field that generate a high level of interest can help to attract future collaborators and in turn increase visibility and impact.
Consider your collaborations. Research shows that papers with more than one author receive more citations. International collaborations can be particularly valuable. Approaches to collaboration will differ between disciplines.

During your research

Present research findings at conferences and, where appropriate, at international congresses. Attending such events also provides opportunities for networking and developing new collaborations.
Use social media and academic networks to promote your research and reach a wider audience, including readers beyond your own discipline. Examples include ResearchGate and Twitter.
Curate your data and consider what you will share during and at the end of your research.

When you decide where to publish (journal articles)

Who is your audience and what is the best way to reach them? Are you looking to influence policy? Do you want to reach specialists outside of your own discipline? Are you looking to publish in an established journal or would alternative venues offer a better fit?
These resources can help you decide where to publish:
  • Journal Citation Reports (JCR). Allows you to compare journals indexed in the Web of Science using citation data. JCR can show you the highest impact journals in your field.
  • SCImago Journal and Country Rank (SJR) is a free database based on Scopus data. It provides a prestige metric (SJR) based on the quality and reputation of journals. It is included in Scopus Journal Metrics.
  • Scopus "compare sources" allows you to sort journal titles in a subject area by impact values and identify high-impact journals within the Scopus database. You can compare journals according to CiteScore, SJR or SNIP (Source Normalized Impact per Paper).
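All three metrics above are citations-per-document indicators. As a rough illustration of the basic idea behind CiteScore (citations received in one year to documents published in a preceding window, divided by the number of those documents), here is a toy calculation with invented numbers; the real Scopus methodology differs in detail (counting window, which document types are included):

```python
# Toy model of a citations-per-document journal metric in the spirit of
# CiteScore. Numbers are invented; Scopus's actual methodology differs
# in detail (window length, included document types).

def citescore(citations_this_year: int, docs_in_window: int) -> float:
    """Citations received this year to documents published in the
    preceding window, divided by the number of those documents."""
    if docs_in_window == 0:
        return 0.0
    return citations_this_year / docs_in_window

# A hypothetical journal: 1200 citations in 2019 to the 400 items it
# published in 2016-2018.
print(citescore(1200, 400))  # 3.0
```

SJR and SNIP weight citations by source prestige and field, respectively, so they diverge from this simple ratio, but the citations-per-document core is the same.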
Consider the open access options offered by your chosen journal. What are their article processing charges (or other publication fees)? Do they permit you to deposit your author accepted manuscript in a repository? If it is a hybrid journal is it part of a Plan-S transitional agreement?
How discoverable is your chosen journal? Is it indexed by major databases like Web of Science and Scopus? Does it allocate a Digital Object Identifier (DOI)? A DOI ensures a persistent link that can be cited and tracked.

When you publish

Wherever possible, ensure your research output has a clear title that is direct and concise. This can help promote readership and attract citations from beyond your own discipline.
Think about writing a plain language summary of your research that is understandable to a non-specialist audience. This could be members of the public or researchers from other disciplines. Kudos is a good free tool to publish these or you can set up your own blog. You could also write for The Conversation.
Carefully consider your keywords and include them in your abstract and full text. This will help a reader better understand your content. It may also improve the visibility of your research in search results.
Consider your references. Authors you cite may become future collaborators and may cite you in turn.
Be consistent with your chosen name format so that all relevant papers can be attributed to you. For example, don’t use Elizabeth Jones in one paper and Liz Jones or L M Jones in another. Using your ORCID will also help with this.
Include the standard institutional affiliation “University of Leeds” on all research outputs. Avoid using abbreviations or only including your School, Faculty or research group.
Add your research outputs to Symplectic as soon as possible after acceptance. They will be made open access via the White Rose Research Online repository in line with any embargo period.

After your research

Make your data findable and citeable by depositing it in a repository; this can lead readers back to your original research or to your other work. Use a registry of research data repositories to identify a suitable service for your discipline. The Research Data Leeds repository is available to all Leeds staff.
Find out more about research data management
Consider publishing in a data journal. Practice around disseminating research data is evolving. Data journals offer another route to making data more discoverable and citable. They are often based on data deposited in a data repository and involve some level of peer review.
The University of Edinburgh maintains a list of data journals and their policies.
Continue to disseminate your research outputs on social media. Track social engagement with the Altmetric "donut" on the publication record in White Rose Research Online.

World University Rankings 2020 by subject: physical sciences

How My Recent LinkedIn Post Received Over 6,000 Views

Saturday, 23 November 2019

Ale Ebrahim, Nader (2019): Research Skills Session 7: Indexing Research Tools. figshare. Presentation.

Wednesday, 6 November 2019

A Crisis in “Open Access”: Should Communication Scholarly Outputs Take 77 Years to Become Open Access?

This study diachronically investigates the trend of "open access" in the Web of Science (WoS) category of "communication." To evaluate the trend, data were collected from 184 WoS categories from 1980 to 2017. A total of 87,997,893 documents were obtained, of which 95,304 (0.10%) were in the category of "communication." On average, 4.24% of the documents across all 184 categories were open access; in communication the figure was 3.29%, ranking communication 116th out of 184. An Open Access Index (OAI) was developed to predict the trend of open access in communication. Based on the OAI, communication needs 77 years to become fully open access, which can fairly be described as a "crisis in scientific publishing" in this field. Given this striking finding, it is time for a global call for open access by communication scholars across the world. Future research should investigate whether the current publication business models in communication scholarship encourage open access or pose unnecessary restrictions on knowledge development.
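The 77-year projection rests on extrapolating the open-access share forward in time. The abstract does not give the OAI formula, so the linear-growth model below is our own illustration of that kind of calculation, with a hypothetical growth rate:

```python
import math

def years_to_full_open_access(current_share: float, annual_growth: float) -> int:
    """Years until the open-access share reaches 100%, assuming the share
    grows by a fixed number of percentage points per year. This linear
    model is an illustration, not the paper's actual OAI formula."""
    if annual_growth <= 0:
        raise ValueError("share must be growing to ever reach 100%")
    return math.ceil((100.0 - current_share) / annual_growth)

# Communication's open-access share in the study was 3.29%; with a
# hypothetical growth of 1.25 percentage points per year:
print(years_to_full_open_access(3.29, 1.25))  # 78
```

Under such a model, even small changes in the assumed growth rate move the projection by decades, which is why a discipline-specific index like the OAI is needed at all.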

Tuesday, 5 November 2019

Monday, 4 November 2019

Generating biomedical abstracts with GPT-2



An example from Healx’s Healnet graph, powered by our NLP tools.


  • We have developed a scientific abstract generator, available at
  • Our abstract generator is powered by OpenAI’s GPT-2 model and fine-tuned on PubMed abstracts.
  • In developing this model, we wanted to better understand GPT-2, as well as its strengths and limitations in understanding language from the biomedical domain.
At Healx, we’re constantly on the lookout for biomedical knowledge we can extract from raw text to assist us in generating our hypotheses towards discovering new treatments for rare diseases. Extracting knowledge from text is inevitably linked to Natural Language Understanding, which is, in and of itself, challenging.
Add to that the complex and constantly changing biomedical jargon we work with, and you can see why we’re always asking ourselves: what is the best way to build strong models that extract entities or relations? And how do we deal with noise, ambiguity or concept drift over time?
So, a few months ago, when OpenAI’s researchers released their much-discussed language model, GPT-2, capable of convincingly generating text, we absolutely wanted to get our hands on it. The result is the scientific abstract generator we developed using GPT-2.
Given title: Effect of garlic consumption in systemic lupus erythematosus.
Generated abstract
OBJECTIVES: To determine the effect of garlic consumption in systemic lupus erythematosus (SLE) and to evaluate the safety of garlic consumption. METHODS: This was a single-center, randomized, controlled clinical trial. We randomly assigned patients with SLE to receive garlic (G; n = 30) or placebo (P; n = 30) for 12 mo. The primary endpoint was the percentage of patients with an improvement of SLE activity (SLEDAI-2K), which is a composite measure of SLE activity.RESULTS: The G and P groups did not differ significantly at baseline or at 12 mo. At 12 mo, the G and P groups showed no difference in improvement of the SLEDAI-2K (mean difference -0.1, 95% CI: -2.4 to 2.4; P = 0.8). The G and P groups also had significant reductions in the number of adverse events (P < 0.001). No severe adverse events were recorded.CONCLUSIONS: In this study of SLE patients receiving G (n = 30) or P (n = 30), the consumption of garlic for 12 mo did not improve their disease or reduce adverse events compared with the control.
What is GPT-2?
GPT-2 is the second version of OpenAI’s Generative Pretrained Transformer (GPT) model. OpenAI used the Transformer neural architecture as its building block to build a complex language model, pre-trained on huge amounts of text with the objective of generating new content. The model can be fine-tuned on new data to solve particular tasks.
But GPT-2 has been just one milestone in a year full of advances in NLP. These advances have come mainly on the back of the Transformer and huge amounts of hardware horsepower, and have led to one language model after another raising the bar on Natural Language Understanding tasks. First it was GPT, then BERT, followed by GPT-2, ERNIE, XLNet, RoBERTa and, most recently, ALBERT. Apparently there are more to come, with even fancier names, though hopefully not from Sesame Street.
And just to grasp the extent of these advances: the best performance on the RACE reading comprehension task rose from 44.1% in 2017 to 89.4% (Google’s ALBERT, in October 2019).
Unlike the other models, GPT-2 and its predecessor GPT were pre-trained with a focus on generating text rather than solving supervised learning tasks.
What we’ve been doing with GPT-2
In fact, GPT-2 delivered alarmingly good results, so good that OpenAI took the controversial decision to release its models publicly only gradually, allegedly to prevent misuse and to better prepare the community for the challenges the model brought about.
We initially experimented with the smaller model (124M parameters), but were not happy with the results. When the third model (774M parameters) became available, we were ready to put our plan into action: we gathered slightly more than 1% of PubMed abstracts (about half a million), and fine-tuned the model on them.
Our goal was to observe how well the model would learn language structure and be able to replicate it. We also wanted to see how novel the generation would be and, a more ambitious goal, to see to what extent the digested abstracts would actually lead the model to learn plausible facts.
What we’ve learned
At the very least, this whole experience has been enlightening with regards to the capabilities of GPT-2, as well as the pitfalls to avoid when fine-tuning or using it, so we decided to share some details here to help other researchers or users in their experiments with it.
For our experiments, we fed our model titles from articles that had already been published on PubMed and inspected the results:
  • The model can successfully follow its “train of thought” across sentences.
  • It successfully learned abbreviations and sometimes generated new, meaningful ones.
  • It learned to properly structure an abstract into sections, such as Objectives, Methods and Conclusions, as often found in biomedical abstracts. That could be interesting in terms of summarising or extracting factual data.
  • It’s not that good at maths:
…The study was carried out with 1250 participants, of whom 728 and 743 were children and adults, respectively, from Spain…
  • On multiple occasions, it understood ranges of numbers:
…We studied 12 patients with a median age of 44.8 years (range, 21.3–58.1 years); most patients were female (71.4%), were white (87.5%), and had a mean AP duration of 15.9 days (range, 8–23 days). CVVHF was performed for a median of 19.0 hours (range, 8–30.0 hours)….
  • It can successfully come up with new drugs. Well, sort of:
Intravenous and intraperitoneal administration of nafenib improves muscle mass and function in rats with cerebral vasospasm.BACKGROUND: Nafenib, an oral, cytotoxic, cysteine-rich cysteine protease inhibitor, was tested in animal models of cerebral vasospasm by treating animals with nafenib intravascularly or intraparenterally.
The last example comes from an unconditionally generated article. Either way, we’d be really interested in knowing the chemical structure of “nafenib”!
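Under the hood, GPT-2 produces text like the abstracts above one token at a time, repeatedly sampling the next token from its predicted distribution, often restricted to the top-k most likely candidates. The toy sketch below illustrates that sampling loop with an invented bigram table standing in for the neural model; it is not Healx's actual code:

```python
import random

# Toy next-token distributions standing in for a language model's
# output; a real GPT-2 predicts over a vocabulary of ~50k subword
# tokens using a Transformer, not a lookup table.
BIGRAMS = {
    "OBJECTIVES:": {"To": 0.8, "We": 0.2},
    "To": {"determine": 0.5, "evaluate": 0.5},
    "determine": {"the": 0.9, "whether": 0.1},
    "evaluate": {"the": 0.9, "safety": 0.1},
}

def sample_next(token: str, k: int = 2, rng: random.Random = random) -> str:
    """Sample the next token from the top-k candidates, renormalized."""
    dist = BIGRAMS.get(token, {})
    if not dist:
        return "<end>"
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return rng.choices(tokens, weights=weights)[0]

def generate(prompt: str, max_tokens: int = 5, seed: int = 0) -> str:
    """Greedy-ish generation loop: condition on the prompt (here, a
    title token) and sample until <end> or the length limit."""
    rng = random.Random(seed)
    out = [prompt]
    for _ in range(max_tokens):
        nxt = sample_next(out[-1], rng=rng)
        if nxt == "<end>":
            break
        out.append(nxt)
    return " ".join(out)

print(generate("OBJECTIVES:"))
```

The arithmetic and range-handling quirks above fall out of exactly this mechanism: the model only ever predicts a plausible next token, so locally coherent numbers need not add up globally.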
GPT-2 in application
Despite its surprising results, GPT-2 has seen relatively little application so far. Tabnine has used it for code auto-completion. Recently, researchers used it for summarization, showing that careful fine-tuning can lead to competitive results. Another interesting application has been creating templates for medical education tests.
We believe that GPT-2’s deeper implications and strengths, as well as its full potential, are not yet fully understood. We also firmly believe that a number of applications will eventually benefit from it, such as biomedical question answering or summarization.
And we believe that the key in all this will be clever fine-tuning. We’re using the scientific abstract generator we created to improve our understanding of the millions of abstracts we process. And, as we said: we process these abstracts to extract biomedical knowledge from them, because, ultimately, everything we do is towards advancing treatments for rare diseases.
In the meantime, we’re getting ready for the next state-of-the-art model.

Technical stuff

  • To fine-tune our model we used gpt-2-simple, and to make the 774M model fit into our 16 GB GPUs we followed the very useful comments on a GitHub issue.
  • We tried different parameters, but eventually settled on a learning rate of 2e-5, a single epoch over the data and a maximum sequence length of 256 (necessary due to the 774M model’s memory requirements).
  • Fine-tuning on a single V100 GPU with 16 GB of RAM took around 6 days.
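The 256-token cap means each training example has to fit a fixed-length window, so long abstracts need splitting before fine-tuning. A minimal sketch of that kind of chunking step, using naive whitespace tokenization as a stand-in for GPT-2's BPE tokenizer (our illustration, not the authors' actual pipeline):

```python
def chunk_tokens(text: str, max_len: int = 256) -> list:
    """Split a document into consecutive windows of at most max_len
    tokens. Whitespace tokenization is a simplification; GPT-2 uses
    byte-pair encoding, so real token counts differ from word counts."""
    tokens = text.split()
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# A 600-"token" abstract splits into two full windows and a remainder.
abstract = " ".join(f"tok{i}" for i in range(600))
chunks = chunk_tokens(abstract, max_len=256)
print([len(c) for c in chunks])  # [256, 256, 88]
```

Because BPE splits rare biomedical terms into several subword tokens, a 256-token window holds noticeably fewer than 256 words of a PubMed abstract.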
Written by

NLP scientist at Healx. “Because every patient deserves a treatment.”

We’re an AI-powered & patient-inspired technology company, accelerating the development of treatments for rare diseases. Our AI platform leverages public and proprietary biomedical data and features the world’s leading knowledge graph for rare diseases.

Saturday, 2 November 2019

Indicators of academic productivity in University rankings: criteria and methodologies



Illustration: FreeDesignFile
These days, students, academics and researchers often seek opportunities in institutes of higher education in countries other than their own, in search of educational excellence, career progression or specialization in a specific subject area. In this process, indicators of the quality of universities and research centers are points of reference for a suitable choice. Universities, in turn, are affected by having their reputation made available for all to see, and may even be quizzed about the ranking they have received.
The first ranking of North American universities dates from 1983, and owes its origins to studies which began in 1870, when bodies connected to that country’s university system began to evaluate its institutes of higher education. The first international ranking of institutes of higher education was carried out by the Shanghai Jiao Tong University in Shanghai, China, and was known as the Academic Ranking of World Universities (ARWU). Its publication caused a certain amount of disquiet, especially in Europe, because institutions in the United States and the United Kingdom dominated the listings of both the top 20 and the top 100 universities. 2004 saw the creation of the European response to the ARWU in the form of The Times Higher Education Supplement World University Ranking, known thereafter simply as Times Higher Education (THE).
Since then, new international rankings have appeared, instigated by private companies, major media outlets, or institutes of higher education and research, differing both in the methodology and indicators used and in the way the results are presented. People are particularly drawn to results in the form of tables which rank institutions according to “indicators of excellence”; these are known as league tables, by analogy with the classification of teams in sporting championships. There are other ways of presenting the results gleaned from the various indicators which do not classify institutions in order of excellence. Results can be derived from an overall score based on factors such as the quality of the teaching body, the number of publications appearing in high-end journals, the infrastructure of the particular institution and the presence of foreign students.
The following is a presentation and discussion of the indicators used to evaluate the academic output of institutions appearing in the major international rankings of universities.
Academic Ranking of World Universities
The first international ranking of universities was created in 2003 at the initiative of the Shanghai Jiao Tong University, located in Shanghai, China. It is known as the Academic Ranking of World Universities (ARWU) and is updated annually. The indicators used to measure academic output include the number of articles published in the high-end journals Nature and Science (representing 20% of the total), articles indexed in the Social Sciences Citation Index (SSCI), Thomson Reuters (20%), and the number of researchers most cited according to Thomson Scientific (also 20%). In this system of ranking, academic output is thus responsible for around 60% of the weighting of the indicators used in the evaluation process.
In addition to world ranking statistics, ARWU also publishes evaluations arranged by country and area of knowledge.
Times Higher Education
The second international ranking, Times Higher Education (THE), was published in 2004 as a counterpart to the ARWU, which had been created the previous year. Between 2004 and 2009 it used Quacquarelli Symonds (QS) to harvest and process the data; after 2009, THE began to use data from Thomson Reuters. The number of articles published in journals indexed by this database is normalized by the number of researchers and by subject, and indicates how proficient the institution is at getting the results of its research published in high-end peer-reviewed journals. This indicator represents 6% of the total.
Citations measure the impact of institutions’ research, and in the THE ranking they represent 30% of the evaluation points, making them the single most significant indicator of all. They are evaluated by means of the 12 thousand journals which make up the Thomson Reuters database, assessed over a period of five years so as to take into account subject areas whose citation half-life is longer, as in the case of the social sciences and humanities. Adjustments are also made so as not to favor institutions which specialize in subject areas known to generate a high number of citations, such as the health sciences, physics and mathematics.
QS World University Rankings
The multinational company Quacquarelli Symonds, headquartered in London, England, which originally provided the data for the THE ranking, has since 2004 been publishing the guide TopUniversities, which lists the best institutions worldwide. The indicators of academic output include article-level citations (with adjustments made for disciplines which attract a small number of citations), worth 4% of the points available, and the number of articles published per researcher, also worth 4%. Both sets of statistics are collected from the Scopus database, which belongs to the multinational publisher Elsevier.
The ranking also provides lists arranged by region and the category QS Stars, in which institutions are evaluated not only by their proficiency in research and teaching, but also by their facilities, innovation and engagement with the region in which they are situated. This allows newer universities, or those in developing countries, which according to the criteria used by the majority of rankings would probably not appear in the top 500 institutions, to be highlighted.
Leiden Ranking
The Centre for Science and Technology Studies (CWTS) of Leiden University, the Netherlands, has developed its own methodology for measuring, from 2008 onwards, academic impact and other indicators, with the objective of selecting the 500 best institutions in the world.
The bibliometric data are provided by the Web of Science database, which collects the number of publications produced by a particular institution over the previous five years. Citations are calculated using an algorithm which takes into consideration the citations received over the previous five-year period, standardized according to different fields of knowledge and numbers of journals. Author self-citations are excluded.
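The field-normalization idea can be made concrete with a small calculation: divide a publication's citation count, with self-citations removed, by the average citation count in its field, so that a score above 1 means above-average impact for that field. The numbers below are invented, and the actual CWTS algorithm involves further refinements:

```python
def normalized_citation_score(citations: int, self_citations: int,
                              field_average: float) -> float:
    """Citations excluding author self-citations, relative to the
    average for the publication's field. Illustrative only; the CWTS
    methodology involves further refinements (publication year,
    document type, fractional counting)."""
    external = max(citations - self_citations, 0)
    return external / field_average

# A paper with 30 citations, 6 of them self-citations, in a field
# where papers average 12 citations:
print(normalized_citation_score(30, 6, 12.0))  # 2.0
```

Normalizing by field average is what lets a heavily cited mathematics paper and a modestly cited humanities paper be compared on the same scale.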
The CWTS also provides information on cooperation between universities and industry and makes available maps showing the collaboration between universities which form part of the ranking.
U-Map
This initiative owes its origin to a project developed by The European Classification of Higher Education Institutions, conceived in 2005 as an alternative to rankings based on research productivity. It offers a “multidimensional” ranking of European institutions and universities (excluding, however, the United Kingdom), grounded in a wide range of indicators.
The principal products of the ranking, which provides a panorama of the diverse nature of European institutions, include ProfileFinder, a list of institutes of higher education which can be compared according to predetermined characteristics, and ProfileViewer, which provides an institutional activity profile that can be used to compare institutions.
The indicators of academic productivity are the annual number of academic publications submitted for peer review relative to the number of researchers working in the institution in question, plus other types of publications which are the products of research. There is also an indicator relating to academics who do not form part of the previous category.
U-Multirank
This new university ranking, created with financing from the European Union, was launched in January 2013 and will have its first ranking list published at the beginning of 2014. The focus of this project is initially to evaluate institutions in Europe, the United Kingdom, the United States, Asia and Australia.
Its approach, which differs from other rankings that are focused primarily on research excellence, includes indicators such as the reputation in research, quality of education and learning, international perspective, knowledge transfer, and contribution to regional growth.
The European Commission and those responsible for the project have yet to define the sources for the indicators on research productivity, but state that they will use the databases of Thomson Reuters (Web of Science) and Elsevier (Scopus).
Webometrics
The Webometrics Ranking of World Universities was launched in 2004 as an initiative of the Cybermetrics Lab of the Spanish National Research Council (Consejo Superior de Investigaciones Científicas, CSIC). The project was conceived to promote dissemination through the open access publication on the Web of articles and other documents.
Web indicators are tools used for evaluation in general; however, Webometrics does not use the number of accesses or the navigability of sites as an indicator of the performance and visibility of institutions. Instead, it uses the volume, visibility and impact of institutions on the Web, with emphasis on research results.
Like other rankings, this one also has as its major focus the impact of the research production of institutions. What differentiates it, however, is that there are other forms of publication available on the Web such as repositories, online only journals, as well as informal media in scholarly communication such as blogs, wikis among others. In the final analysis, the ranking seeks to motivate academics to put themselves out on the Web, attracting the attention of the research community and of society as whole.
The ranking includes institutions of higher education, hospitals, and research centers in all continents as well as the BRIC and CIVET country groupings, in addition to analyses by knowledge areas and a world ranking of repositories.
As of 2005, data are updated online every six months. Institutions are ranked by a Web Impact Factor, based on log-normalization of two groups of indicators, activity/presence and visibility/impact on the Web, in a one-to-one relation.
SCImago Institutions Ranking
In 2012, SCImago created the SCImago Institutions Ranking (SIR) using the Scopus database, which belongs to the multinational publisher Elsevier. The SIR publishes two reports per year, one dealing with Ibero-American institutions, the other global in nature.
The SIR differs from other university rankings in that it does not produce lists of institutions ordered by prestige (league tables), but rather a comprehensive compendium analyzing the results of research in Ibero-America and the world. Results are presented in tables containing a wealth of information, including the position of an institution according to the established criteria, the total number of documents published over a period of five years, normalized citation indicators, the number of articles in high-impact journals, and the excellence rate, derived from the proportion of an institution’s articles that are among the 10% most cited in their respective fields.
The SIR presents an innovative methodology to rank universities that are located outside of the USA-UK axis, and which would not be included in the league table rankings, thus allowing for a fair and appropriate analysis of the profiles of these institutions.
University Ranking of Folha
As a result of the large increase over the past few years in the number of institutions of higher education in Brazil, a demand emerged for a national ranking of universities appropriate to the realities of the Brazilian context.
On the initiative of the newspaper Folha de São Paulo, Datafolha, under the supervision of the SciELO researcher and expert in the analysis of academic output Rogério Meneghini, developed the Folha University Ranking (RUF). The first edition was published in 2012.
The academic output indicators used in the ranking, which account for 55% of the total points, were extracted from the Web of Science (Thomson Reuters) and include the total number of publications, citations received, and articles with international cooperation. These data are normalized by the number of lecturers at the institution. Articles in the SciELO database are also counted, which gives the RUF a broader view of Brazilian academic output.

Final Considerations

University rankings, which had their beginnings in the 2000s, came to fill an existing gap, guiding the choice of students and academics in search of quality teaching and research around the world.
Quantitative assessments tend to be more easily understood and used than qualitative ones: just as research impact indicators rank journals, university rankings list institutions. This parallel, however, comes with a caution about the trustworthiness of these indicators, as well as recent controversies about the indiscriminate use of the Impact Factor1.
There are countless problems with the academic output indicators used in rankings, such as: articles disadvantaged in citation counts because they are published in a language other than English; the a priori reputation of institutions in North America, the UK and Europe, which earns them better evaluations; the inherent differences between results in the life sciences and the social sciences; the use of the Impact Factor of the journals in which an institution’s output is disseminated; the different forms of peer review used by different journals; and so on.
The researcher Ellen Hazelkorn of the Dublin Institute of Technology, in her book Rankings and the Reshaping of Higher Education: The Battle for World-Class Excellence, sharply criticizes the frequent use of rankings by decision makers and research funding agencies, criticism she also presented at a conference organized by UNESCO in 2011 titled Rankings and Accountability in Higher Education: Uses and Misuses. Hazelkorn states that rankings take into account less than 1% of the institutions in the world, giving the false impression that cultural, economic and health development depends on the universities at the head of the list.
On the same occasion, the Vice-Rector of Malaysia’s National University, Sharifah Shahabudin, declared that more important than the position of a university in a ranking is its principal function: “to constantly anticipate and lead through innovation, creating new values, as well as a new social, environmental and financial order for the university, the nation and the region.” In her vision, indicators, which have yet to be created and perfected, should measure the impact of the university on the society in which it finds itself.


1 Declaration recommends eliminating the use of the Impact Factor in Research Assessment. SciELO em Perspectiva. [viewed 16 August 2013]. Available from:


HAZELKORN, E. Rankings and the Reshaping of Higher Education: The Battle for World-Class Excellence. London: MacMillan Publishers Ltd., 2011.
RAUHVARGERS, A. Global University Rankings and their Impact. Brussels: European University Association, 2011. [viewed 16 August 2013].
UNESCO Global Forum: Rankings and Accountability in Higher Education: Uses and Misuses, Paris, 16-17 May 2011. UNESCO. [viewed 16 August 2013]. Available from:
WALTMAN, L., et al. The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. 2012. [viewed 16 August 2013]. Available from: arXiv:1202.3941.

About Lilian Nassi-Calò

Lilian Nassi-Calò studied chemistry at Instituto de Química – USP, holds a doctorate in Biochemistry from the same institution, and completed a post-doctorate as an Alexander von Humboldt fellow in Wuerzburg, Germany. After her studies, she was a professor and researcher at IQ-USP. She also worked as an industrial chemist and is presently Coordinator of Scientific Communication at BIREME/PAHO/WHO and a collaborator of SciELO.

Translated from the original in Portuguese by Nicholas Cop Consulting.

How to cite this post [ISO 690/2010]:

NASSI-CALÒ, L. Indicators of academic productivity in University rankings: criteria and methodologies [online]. SciELO in Perspective, 2013 [viewed 02 November 2019]. Available from: