Wednesday 23 January 2019

Why our citation practices make no sense

Source: https://musingsaboutlibrarianship.blogspot.com/2019/01/why-our-citation-practices-make-no-sense.html

People outside academia are often struck by how odd things are in academia.
The most often mentioned example is how researchers rush to give away their work (often including the rights to the IP) to publishers for nothing, allowing publishers to profit by millions from that work while the researchers who supplied the intellectual labour get nothing.

Of course, there is some method in the madness, since we know why researchers act this way: they are buying prestige by publishing in big-name journals. We also understand who is invested in the status quo and why.
But this blog post is not about that.
Rather, I'm pondering how wasteful and illogical our citing practices are. Every year, thousands of hours are devoted by everyone from students to researchers to copy editors employed by publishers to beating citations into shape. Is this really necessary? Can we make it easier?

Why are there thousands of citation styles?

Nobody is against consistency in referencing, of course, but do we really need thousands of citation styles (8,500 according to the CSL style repository)?
All this creates confusion and cost.

Think of the researchers who have to reformat their references every time a paper is rejected and they resubmit to another journal with its own unique style. Given the low acceptance rates at top journals, and the tendency of researchers to try the top journals first, a typical article may be resubmitted to more than one journal before it is accepted. And even where researchers don't do this properly, copy editors are employed to clean up the references in accepted manuscripts.

The fact that there are thousands of styles has other, less obvious costs. In the past decade there have been dozens of projects that try to parse text references into structured data (e.g. ParsCit), which can then be used for various functions, from finding an appropriate copy via link resolvers to importing into reference managers.




The fact that there are thousands of citation styles rather than dozens has made this task difficult even with the latest generation of citation parsers based on machine learning techniques.
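The fragility is easy to see with a toy example. The sketch below uses invented reference strings and a deliberately naive pattern: a parser tuned for one APA-like shape fails outright on a Vancouver-style string describing the very same article.

```python
import re

# Hypothetical sketch: a naive parser tuned to one style breaks on another.
# This pattern targets only an APA-like "Authors (Year). Title." shape.
apa_like = re.compile(r"^(?P<authors>[^(]+)\((?P<year>\d{4})\)\.\s*(?P<title>[^.]+)\.")

ref_apa = "Smith, J. (2018). A study of citation styles. Journal of Examples, 5(2), 1-10."
ref_vancouver = "Smith J. A study of citation styles. J Examples. 2018;5(2):1-10."

for ref in (ref_apa, ref_vancouver):
    m = apa_like.match(ref)
    print("parsed" if m else "failed", "->", ref[:40])
```

Multiply this by 8,000-plus styles and it is clear why rule-based parsers gave way to machine learning, and why even those struggle.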

For instance, take the recent work by Crossref to identify reference strings deposited by publishers and match DOIs to them. See also this series of blog posts.


Crossref's work on matching reference strings to DOIs

Even though Crossref has managed to improve the recall of reference matching, it still can't do this perfectly because there are so many styles out there, and Crossref notes that performance varies with the citation style used. In fact, matching rates are much worse for certain styles in chemistry or physics that do not require article titles, or that abbreviate citation elements like journal titles.

Most rules in citation styles are outdated, arbitrary and make little sense 

If you have ever helped university freshmen in writing courses, one of the things you notice is how much anguish citation styles cause them and how many requests for help this generates. A recent study I did found that, among various skills, confidence in how to cite was actually lower after the writing module!

Part of the reason they are so obsessed with citation style, I suspect, is that it is an area where there seems to be a "right" and "wrong" answer, and moreover one where it is easy to see whether they have done well or badly (marks lost for not following the citation style are clearly indicated). This is in contrast to less well-defined skills like "evaluating sources" or "scoping", where marks lost (or rather, not gained) are less salient and less easily attributed.

It doesn't help that many students merely Google and land on a site like the famous Purdue OWL, or use some librarian-created LibGuide that only roughly covers what should be done, and then get anxious when they inevitably run into cases not covered by those guides for, say, APA 6th.

Even if they do go through the official APA 6th guide, which I understand is pretty comprehensive compared to other style guides, they spend hours worrying about, say, who, if anyone, to credit as the author of a webpage they are citing. The surprising answer is often nobody! The problem, of course, is that it is not always clear-cut, even to me, when you attribute the organization behind a website and when you leave the author out, and I bet that when lecturers marking these things see a student leave out an author, many will simply assume it was a mistake and take marks off...

And if you think citation styles are easy, or that they don't really affect grades, look at this tweet from an extremely accomplished academic librarian (among other accomplishments, Lisa is an ex-ACRL President).
But does all this hand-wringing over punctuation in citation styles really accomplish anything?

If you ask freshmen who have gone through library IL classes why they need to cite properly, they will give the typical answer (often taken straight from the librarian's "Why cite" slide): to give credit, to allow readers to find the sources, and so on.

All this is well and good, but does it really justify all the uncertainty and doubt students suffer when they try to cite something that doesn't fit neatly into the guidelines?

Moreover, even in clear-cut cases, is there really a reason to obsess over complicated rules about minor punctuation marks? Why do you sometimes need to italicize or quote titles in MLA and sometimes not? Why do you sometimes need to add "et al." and sometimes not, depending on various conditions? Is it really critical to add the location (city) of the publisher, as this tweet asks?

Call me naive, but if you have the main citation elements covering "who created it", "when was it created", "what is it called" and "where can it be found", that's more than enough.

Clearly, when you look at the citation styles, many rules don't seem to have much logic or reason to them beyond "this is the way to do it". At best there might have been a reason in the past that has since been superseded by the current online environment.

Also, if there is a logical reason for any of this, why do the 8,000-plus styles all disagree on these details? While I can believe some disciplines would place different emphasis on different citation elements, I can't imagine this justifies thousands of styles.

When you think about it, many of these citation styles were formulated and designed for the print era, and much of their logic no longer applies in an online world, particularly one that is increasingly open access. Here's a simple example: some styles in chemistry, physics, etc. abbreviate everything or omit titles to save space. This makes no sense today and actually makes things harder if you want to follow the references.

Speaking of following references, that's one of the major reasons for citing, right? In today's online environment, including a DOI would be the easiest way to help a user do that. I wouldn't go so far as to say we should add just the DOI and drop the other reference elements, because someone might mess up the DOI and some redundancy is good, but you would still think most citation styles would mandate DOIs.

But you would be wrong: currently only 41.2% of styles require DOIs (APA and MLA only very recently required or recommended them), and even this 41.2% figure is somewhat inflated.
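For what it's worth, fishing a DOI out of a free-text reference is one of the few easy parts of the problem, since DOIs have a regular shape. A sketch, where the reference string and DOI below are made-up placeholders and the pattern is close to the one Crossref recommends for matching modern DOIs:

```python
import re

# Pattern close to Crossref's recommended regex for modern DOIs:
# a "10." prefix, a 4-9 digit registrant code, a slash, then a suffix.
doi_pattern = re.compile(r"10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+")

ref = "Smith, J. (2018). Example article. Journal of Examples. doi:10.1000/xyz123"
match = doi_pattern.search(ref)
print(match.group(0) if match else "no DOI found")
```

If styles mandated DOIs, this one cheap step would recover a working link for almost every reference, regardless of how the rest of the string is punctuated.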




The effect of inconsistency - how much is bad?

The APA 6th style guide (p. 181) defends the importance of consistency in references as follows:
 "Consistency in referencing style is important, especially in light of evolving technologies in database indexing such as automatic indexing by database crawlers. These computer programs use algorithms to capture data from primary article as well as the article reference list."
They then state "If reference elements are out of order or incomplete, the algorithm might not recognise them, lowering the likelihood that the reference will be captured for indexing".

I would imagine these "algorithms" refer to citation parsers, so an interesting question is how much inconsistency you can get away with and still have the parsers work. Sure, I can imagine that missing or messing up an article title makes a big difference, but what about forgetting to italicize? Lower case versus upper case? An extra dot between elements? Are our algorithms really that fragile?
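My guess is that a matcher can shrug off exactly these minor inconsistencies with cheap normalisation before comparing strings. A hypothetical sketch, using invented reference strings:

```python
import re

def normalize(ref: str) -> str:
    """Collapse the 'minor' differences: case, punctuation, spacing."""
    ref = ref.lower()
    ref = re.sub(r"[.,;:]+", " ", ref)   # drop punctuation between elements
    ref = re.sub(r"\s+", " ", ref)       # collapse runs of whitespace
    return ref.strip()

a = "Smith, J. (2018). A Study of Citation Styles. Journal of Examples."
b = "smith j (2018) a study of citation styles journal of examples"
print(normalize(a) == normalize(b))
```

If two variants collapse to the same normalized form, case and stray dots cost the parser nothing; it is missing or reordered elements that should really hurt.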

I haven't looked for studies that specifically test this, where the citation elements are mostly present (as opposed to outright missing or wrong) but are "inconsistent" in minor punctuation, but again the Crossref work referenced above might be helpful.

In one earlier study, they took a random sample of 2,500 items in Crossref, generated reference strings in 11 styles (including APA, MLA and Chicago author-date) and then tried to simulate noise.

Crossref's simulation of a dirty dataset for DOI matching

Surprisingly, the new search-based method they tested is actually quite robust even when the strings are mutated quite badly, with the top method scoring a precision of 0.99 and a recall of 0.79.

I wish they had broken the data down further (for example, the "title scrambled" and "degraded" styles seem to be unrealistic cases) and given recall and precision statistics for just the known-style-plus-random-noise strings, but the results are still suggestive.
Some caveats: this study only deals with one specific task in citation parsing, namely linking to DOIs, so it won't work for non-DOI materials. Also, the method proposed is a "search-based" technique as opposed to the older field-based approach (where the system calculates a similarity score on important fields like title and author); one might suspect a field-based approach would be more sensitive to variance.
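To make the distinction concrete, here is a toy sketch of what a field-based matcher does. The records and weights are invented for illustration and are not Crossref's actual algorithm:

```python
def field_similarity(ref_a: dict, ref_b: dict) -> float:
    """Toy field-based matcher: weighted overlap of title tokens plus
    exact matches on year and first-author surname."""
    ta = set(ref_a["title"].lower().split())
    tb = set(ref_b["title"].lower().split())
    title_sim = len(ta & tb) / len(ta | tb)            # Jaccard on title words
    year_sim = 1.0 if ref_a["year"] == ref_b["year"] else 0.0
    author_sim = 1.0 if ref_a["author"].lower() == ref_b["author"].lower() else 0.0
    return 0.6 * title_sim + 0.2 * year_sim + 0.2 * author_sim

a = {"title": "A study of citation styles", "year": 2018, "author": "Smith"}
b = {"title": "A study of citation styles", "year": 2018, "author": "smith"}
print(round(field_similarity(a, b), 3))
```

Because everything hinges on first segmenting the string into fields, any style that abbreviates or omits a field (no title, abbreviated journal) degrades this approach directly, which is presumably where a search-based method has the edge.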

Are citation style makers dismissive of reference managers?

Of course, we live in the computer age and time-saving software like reference managers (e.g. Zotero) exists. But shockingly, Sebastian Karcher and Philipp Zumstein (who contribute to the Zotero project) argue that "influential style guides such as the Chicago Manual (14.13) and APA (which makes no mention of automated references at all) are dismissive of such software".

In the fascinating article "Citation Styles: History, Practice, and Future", they trace the history of citation styles and the rise and decline of three main classes of citation style: "note style", "numbered style" and "author-date style".

They go beyond a mere narration of the history and also include an analysis of the styles in the CSL citation styles repository by discipline and type, but it is the final chapter, where they write about the future of citation styles, that made me sit up.

They write on the need for journals to standardise on a few citation styles rather than create their own, which often tend to be inconsistent and unclear in their instructions. I fully agree. I remember students worried sick because they had been given a nameless citation style, with only a few scant examples to follow for their thesis. They almost spent more time worrying about that than actually doing the research.

Sadly, they predict things might become even worse, with the citation style landscape becoming more diverse. But what about automated reference managers like EndNote, Zotero and Mendeley, you say? This is where it gets shocking.

"Currently, influential style guides such as the Chicago Manual (14.13) and APA (which makes no mention of automated references at all) are dismissive of such software, in spite of its growing popularity among academics. Style guides and publishers can and should help, rather than belittle, these efforts. Most importantly, they should refrain from imposing rules that are virtually impossible to automate."

They go on to give the example of APA, which makes the "use of journal issue numbers dependent on whether a journal is continuously paginated per volume"; since there is no way to tell whether a journal is continuously paginated (no database exists with this information), this aspect of APA cannot be automated. Maybe I'm missing something, but this is yet another arcane rule that exists for no real reason I can think of.

Reading all this gives me the overall impression that the people in charge of citation styles are, if not openly hostile to reference managers, at the very least ignoring them, leading to a lot of wasted time for both students and researchers, particularly for manuscripts resubmitted to multiple journals with differing citation formats. Granted, journals employ staff to do editing and cleanup after a manuscript is accepted, but why the additional effort and expense?

I was struck by a comment by Jodi Schneider at Crossref Live 2018, where she pointed out that publishers now spend time helping researchers check reference formatting but don't help them check whether a reference has been retracted! Surely publishers have better things to do?

References are converted from structured data to plain text and back again during journal submission

I was reminded further of this absurdity when I read that Scholarcy, a tool that can extract references from PDFs, was used by the BMJ publishing group to convert back-file PDFs into structured references (XML, probably?). Granted, this was for legacy PDFs, but it made me think.





Today, many manuscript submission systems accept documents in Word or PDF, and I would guess that is still the most common route. The thing is, quite a lot of authors now use reference managers like EndNote, so the manuscript goes through a process in which authors create citations in structured format using a reference manager, strip that information out to submit plain text, and then, once the paper is accepted, the publisher converts it all back into structured XML....

In fact, it seems to me that if references were submitted in structured format during the submission and peer review phase, one could more easily do useful things like check for retractions or run citation analysis to automatically identify suitable reviewers.
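For illustration, with structured references the retraction check becomes a trivial lookup. A hypothetical sketch, with made-up DOIs and a local set standing in for a real retraction database:

```python
# Made-up DOIs; a real system would query a retraction database instead.
retracted_dois = {"10.1000/retracted-001", "10.1000/retracted-002"}

submitted_refs = [
    {"title": "A fine study", "doi": "10.1000/xyz123"},
    {"title": "A withdrawn study", "doi": "10.1000/retracted-001"},
]

# With structured data the check is a set-membership test per reference.
flagged = [r for r in submitted_refs if r["doi"] in retracted_dois]
for r in flagged:
    print("Possible retracted reference:", r["doi"])
```

The same check run against plain-text references would first require parsing each string correctly, which, as discussed above, is exactly the unreliable step.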

I have been reading about new services for publishers, like UNSILO, that use machine learning and AI to help publishers improve the screening and evaluation of manuscript submissions. Why not collect as much structured data as possible, rather than relying on extraction techniques that may not be as accurate?

Perhaps I'm naive, but it seems more efficient for the citations/references to be submitted in a structured format like RIS or BibTeX in the first place. Here is where it gets odd: some journals allow you to submit manuscripts in LaTeX, but some of them specifically ask you not to submit the .bib files (structured references), only the processed bibliography (.bbl)!
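The asymmetry is worth spelling out: rendering structured data into any of the 8,000-plus styles is mechanical, while parsing formatted text back into fields is the hard, lossy step. A minimal sketch, using an invented record, that emits a BibTeX entry from a plain dict:

```python
# Invented example record; the easy direction is structure -> formatted text.
ref = {
    "key": "smith2018",
    "author": "Smith, Jane",
    "title": "A Study of Citation Styles",
    "journal": "Journal of Examples",
    "year": "2018",
    "doi": "10.1000/xyz123",
}

# Render each field as "  name = {value}," and wrap in an @article entry.
fields = "\n".join(f"  {k} = {{{v}}}," for k, v in ref.items() if k != "key")
bibtex = f"@article{{{ref['key']},\n{fields}\n}}"
print(bibtex)
```

Asking authors to throw this structure away and submit only the rendered .bbl is the round trip in miniature.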

As an aside, I was looking at the recent reply from Elsevier regarding the revolt that caused the mass resignation of the editorial board of its Journal of Informetrics.

Part of letter from Elsevier commenting on mass resignation of the editorial board at its Journal of Informetrics.

It seems one of the demands from the editorial board was that Elsevier make the references it deposits in Crossref open like most major publishers have done.

The reply is instructive: Elsevier bemoans the fact that references are received in various styles and "more importantly in natural language", hence it has invested "significantly" in citation extraction technology.

Leaving aside the likelihood that the real reason Elsevier refuses is that open references would strengthen competitors to its Scopus product, both commercial (e.g. Dimensions) and open, if we take this response at face value it does seem odd that Elsevier and other publishers are not mandating saner practices like submitting citations in RIS/BibTeX.

Why this state of affairs? Can we do better?

The amazing thing about this is that, unlike scholarly publishing, where you can see why one party wants to keep the status quo, the current state of affairs seems counterproductive for everyone.

I can't think of anyone who benefits from this. Sure, academic librarians get a lot more research consultations from students who want them to "go through" their references to ensure they are 100% correct, just in case. But I'm pretty sure most academic librarians would prefer to forgo all this and help with more interesting and important aspects of research, such as discussing how scholarship is a conversation, rather than going through the mechanics and rules of citation.

Perhaps the people behind reference managers benefit? Maybe, but as we have established, reference managers don't work as well as they should under the current regime.

The only group (and it is a tiny one) that I can imagine benefits from this state of affairs for sure is the people in charge of the style guides.

In this APA blog post, one of them suggests there is a reason for so many styles: signalling, such that knowing how to use a style "marks its user as a member of a specific culture". I'm afraid I'm not sympathetic to this argument; plenty of undergraduates can produce passable APA style, and does that really mark them as members of the psychology culture?

All in all, I don't get it.


A better method?


Just before I pressed the publish button, Todd Carpenter referred me to a very instructive piece he wrote in 2014, Why Are Publishers and Editors Wasting Time Formatting Citations? My blog post covers much of the same ground: the inefficiency of thousands of styles, why we should encourage the use of reference managers and the submission of references in structured format, and so on.

But I'm struck most by his proposal.

The idea is this

"When authors are submitting references, why doesn’t the community simply send in a reference that is submitted like this:


"

This is simple, elegant and efficient. By using permanent IDs (such as ORCIDs and DOIs) as much as possible, we can benefit greatly from machine-readable and linked-data techniques. Of course, in practice you might want some redundancy in the form of strings, in case someone messes up the DOI.
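A sketch of what such a submission might look like. All values here are placeholders (the ORCID is the well-known example identifier from ORCID's own documentation), and the exact field names are my invention, not Carpenter's:

```python
import json

# Hypothetical ID-centric reference: permanent identifiers carry the
# linking, with a display string kept only as redundancy.
reference = {
    "doi": "10.1000/xyz123",
    "authors": [{"orcid": "0000-0002-1825-0097", "name": "Smith, Jane"}],
    "title": "A Study of Citation Styles",
    "year": 2018,
    "fallback_string": "Smith, J. (2018). A Study of Citation Styles.",
}
print(json.dumps(reference, indent=2))
```

From a record like this, any of the thousands of display styles can be generated on demand, and the punctuation arguments become a rendering preference rather than an authoring burden.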

Getting everyone to agree is, of course, the tough part. Again, this blog post by APA paints a nice picture of how getting everyone to agree on a single style would lead to arguments over the most minute points: use of periods, abbreviations versus spelling out, capitals, etc.

All this strikes me as rule-making and rule-following for no good reason. After all, the main purpose of a reference is to record who created it, when it was created, what it is called and where it can be found; do we really need to worry so much about periods, commas and capitalization?



Acknowledgements: This article has been years in the making and has been influenced by discussions at various online forums such as LSW and Twitter. As mentioned, it was highly influenced by Sebastian Karcher and Philipp Zumstein's article on citation styles and, most recently, Todd Carpenter's Why Are Publishers and Editors Wasting Time Formatting Citations?


Monday 21 January 2019

Oh, What A Tangled Web! Citation Network Underscores Editorial Conflicts of Interest

Source: https://scholarlykitchen.sspnet.org/2018/12/18/tangled-web-of-editorial-interests


The separation of powers is as important in academic publishing as it is in government.
If readers are to trust the integrity of the editorial and peer review process, editors need to be insulated from the business of publishing, which often means keeping them away from their colleagues in marketing, sales, and advertising.
So important is the separation of powers that some publishers physically separate editorial offices from business operations and place them in different cities. If they can’t separate these divisions physically, they will often develop strong internal policies to minimize influence. For example, PLOS does not disclose to the editor whether a submitting author has applied for article processing fee assistance when reviewing a manuscript.
Similarly, many publishers have explicit rules that prevent editors from handling their own paper or the papers of authors very closely associated with them. None of these separations of roles and powers guarantee that the decision to publish is entirely free of bias, but they do demonstrate a seriousness in building an institution, a process, and a product that can be trusted.
Last week, I described a publisher (American Scientific Publishers) that had four of its journals singled out this year by Clarivate Analytics for displaying a “problematic pattern of citations.” In a series of media questions conducted by email, the publisher, Dr. Hari Singh Nalwa, was quick to blame Chinese authors for the problem, before denying that he knew anything of the matter.
Nalwa (a name he adopted later in life) is listed as the “Founder, President, and Chief Executive Officer” of American Scientific Publishers, the Editor-in-Chief (EiC) of two of its journals, and an associate editor for two more. Not only does Nalwa have a stake in the business and editorial operations of his journals, he is also the author of several reviews published in the journals in which he operates as EiC. More surprising is that some of these reviews include first authors (Eric Singh and Ravina Singh), who appear to be his children. In three papers [1, 2, 3] published in 2015, Eric Singh lists the “William S. Hart High School,” located just a few miles from the ASP publications office as his institutional address. Eric is now an undergraduate in the computer sciences department at Stanford University. The LinkedIn page for Ravina Singh states that she worked as an Editorial Assistant and Marketing Associate for ASP between 2010 and 2016, where she “directed the editorial efforts of multiple academic publications that are circulated in over 600 universities worldwide.” Nalwa did not respond to my questions about family relationships at ASP.
Like the family associated most strongly with ASP, the eight ASP journals indexed in the Web of Science show a curious level of self-dealings, with high levels of citations directed to, and from, other ASP journals.
Citation network contributing to the 2017 Impact Factors of 8 journals published by American Scientific Publishers (colored). All other journals are colored grey. The size of each node is scaled to reflect the number of citations from each source.
[T]he vast majority of citations used to calculate 2017 Journal Impact Factors for some ASP journals came from other ASP journals.
As should be visually apparent from the above graph, the vast majority of citations used to calculate 2017 Journal Impact Factors (JIFs) for some ASP journals came from other ASP journals (see Table below). For example, 87% (384) of the citations that determined the JIF score for Journal of Biobased Materials and Bioenergy were from other ASP journals, leaving just 13% (56) citations from other sources. If we were to remove ASP citations from Clarivate’s calculations, its JIF would drop from 2.993 to just 0.381. Similarly, 83% of citations determining the JIF score for Nanoscience and Nanotechnology Letters came from other ASP journals. These two journals also share the same EiC (Dr. Nongyue He). While these percentages should be alarming to most readers, they apparently are not high enough to invoke editorial suppression from the Journal Citation Reports.
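A back-of-the-envelope check of those figures, using only the numbers quoted above:

```python
# Citation counts for Journal of Biobased Materials and Bioenergy, as
# reported above: 87% of JIF-numerator citations came from ASP journals.
asp_citations = 384      # citations from other ASP journals
other_citations = 56     # citations from all other sources
reported_jif = 2.993

total = asp_citations + other_citations        # JIF numerator = 440
citable_items = round(total / reported_jif)    # implied denominator ~ 147
jif_without_asp = other_citations / citable_items

print(f"Implied citable items: {citable_items}")
print(f"JIF without ASP self-citations: {jif_without_asp:.3f}")
```

The arithmetic reproduces the figure quoted in the text: stripping the intra-publisher citations drops the journal's impact factor from 2.993 to roughly 0.381.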
Last year, the European Geosciences Union conducted an investigation of Artemi Cerdà –an editor suspected of abusing his position to manipulate the citation record to benefit his own journal, Land Degradation and Development, and his own publications. The publishers of EGU journals (Copernicus and Wiley) were involved as well. Without the separation of roles and powers, such an investigation (and ultimate resignation of the EiC) would not have been possible.
In the case of ASP journals, the founder, owner, CEO, editor, and author not only occupy the same office suite, but the same chair, with no separation of roles or powers. It’s like having the President of the United States overseeing the Executive, Judicial, and Legislative branches of government with the involvement of his children in various decision-making positions. With such a concentration of powers in the hands of a single individual, we shouldn’t expect that ASP will do anything as a result of Clarivate’s editorial expression of concern over the problematic pattern of citations in its journals. “This problem has been resolved so there is nothing to say,” wrote Nalwa in his response to my inquiry.
All hail the Chief!

Table Notes: % JIF Numerator is the percentage of citations from the citing (donor) journal that form the numerator of the citing (recipient’s) Journal Impact Factor calculation. % Exchange to JIF Years is the proportion of citations from donor to recipient that are considered in the JIF calculation. Source items is the number of papers used in the denominator of the JIF.
Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.org/

Sunday 20 January 2019

Concordia University launches 4TH SPACE -- a dynamic public venue for learning and discovery

Source: https://www.biospace.com/article/releases/concordia-university-launches-4th-space-a-dynamic-public-venue-for-learning-and-discovery

MONTREAL, /CNW Telbec/ - 4TH SPACE, a dynamic lab and immersive venue designed to bring knowledge and ideas to life, officially opened today on Concordia University's downtown campus. The public space was conceived as a hub where researchers, students and community members can come together to foster diverse, stimulating conversations and translate these into projects that will benefit Montrealers and people all over the world.
"4TH SPACE takes Concordia's reputation as one of Canada's most open and connected universities a big step further," says Concordia President Alan Shepard. "Agile and responsive, it expands the blueprint of what a university can be and represents what knowledge creation and outreach will look like in the future," he adds.
Montreal Mayor Valérie Plante is equally enthusiastic about its potential. "I am delighted by the creation of 4TH SPACE at Concordia University, a space that promotes citizen engagement and the sharing of ideas," she says. "By integrating knowledge, innovation and experimentation, we are able to work more effectively to improve the quality of life for all Montrealers."
A new standard in accessible knowledge
All are welcome to experience and engage with 4TH SPACE's curated programming, which includes everything from installations to rotating residencies, and special events such as screenings, idea labs, lectures, performances, consultations, hack-a-thons, conferences and more.
Nightly projections extend the experience beyond opening hours of Monday to Friday, 10 a.m. to 6 p.m.
4TH SPACE is located on the ground floor of the J.W. McConnell Building (1400 De Maisonneuve Blvd. W.).
The current program, Cities: Urban Essentials, features a collection of presentations, workshops, installations and events on topics including climate change, land consumption and sustainable development in Montreal.
Bringing ideas to life
The new venue will provide Concordia students with the opportunity to engage in experiential learning opportunities, share their work with the world and bring their ideas to life.
"Historically, public institutions have not necessarily sought to include us, engage us or solicit our input in these important conversations — 4TH SPACE is here to help change that," says Nadia Bhuiyan, Concordia's vice-provost of Partnerships and Experiential Learning.
Research plays a foundational role in moving society forward
Justin Powlowski, associate vice-president of Strategy and Operations, says 4TH SPACE will help drive Concordia's next-generation research and knowledge in areas such as smart cities, health, cybersecurity and synthetic biology.
"The university is making huge strides in areas that its researchers have identified to have remarkable potential for social impact and innovation," he adds. "But without knowledge mobilization, we were missing an important element that will help this important work move from the specialist and into the hands and minds of a much wider audience."
READ MORE AT www.concordia.ca/4th-space-release.

SOURCE Concordia University

The effect of inconsistency - how much is bad?

Source: http://musingsaboutlibrarianship.blogspot.com/2019/01/why-our-citation-practices-make-no-sense.html

The effect of inconsistency - how much is bad?

The APA 6th Style guide (p 181) defends the importance of consistency in references as follows.
 "Consistency in referencing style is important, especially in light of evolving technologies in database indexing such as automatic indexing by database crawlers. These computer programs use algorithms to capture data from primary article as well as the article reference list."
They then state "If reference elements are out of order or incomplete, the algorithm might not recognise them, lowering the likelihood that the reference will be captured for indexing".

I would imagine these algorithms are referring to citation parsers so a interesting question is how much inconsistency can you get away with and still have these citation parsers work. Sure I can imagine missing or messing up an article title, making a big difference but what about forgetting to italize? Lower case versus upper case? Putting an additional dot or not between elements? Are our algorithms that frail?

I haven't looked at studies that specifically study this , where the citation elements are mostly there  (as opposed to outright missing or wrong) but are "inconsistent" in minor punctation marks but again the Crossref study referenced above might be helpful.

In one earlier study they took a random sample of 2,500 items in Crossref , generated reference strings in 11 styles (including APA, MLA, Chicago author-date) and then tried to simulate noise.

Simulating dirty dataset by Crossref for matching of dois

Surprisingly the new search based method they tested is actually quite robust even with an attempt to mutate these strings quite badly, with the top method scoring a precision of 0.99 and recall of 0.79.

I wish they had broken up the data further (for example, the title scrambled degraded style and degraded styles seems to be unrealistic examples) and just give recall and precision stats for the known style + random noise ones, but still the results are suggestive. 
Some caveats, this study only deals with a specific task in citation parsing, linking to dois so it won't work for non-doi materials. Also this method proposes a "search based" technique as opposed to a older field based approach (where the system tries to calculate a similarity score on various important fields like title and author), one might suspect a field based approach would be more sensitive to variance? 

Are citation style makers dismissive of reference managers?

Of course, we live in the computer age and time-saving software like reference managers (e.g. Zotero) exists, but shockingly, Sebastian Karcher and Philipp Zumstein (who contribute to the Zotero project) argue that "influential style guides such as the Chicago Manual (14.13) and APA (which makes no mention of automated references at all) are dismissive of such software".

In the fascinating article "Citation Styles: History, Practice, and Future" they trace the history of citation styles and the rise and decline of three main classes of citation styles: "Note style", "Numbered style" and "Author-date style".

They go beyond a mere narration of the history to include an analysis of the styles in the CSL citation styles repository by discipline and type, but it is the final chapter, where they write about the future of citation styles, that made me sit up.

They write about the need for journals to standardise on a few citation styles rather than create their own, which often tend to be inconsistent and unclear in their instructions. I fully agree. I remember students who were worried sick because they had been given a nameless citation style, with only a few scant examples to follow for their thesis. They almost spent more time worrying about that than actually doing the research.

Sadly, they predict things might become even worse, with the citation style landscape becoming more diverse. But what about automated reference managers like EndNote, Zotero and Mendeley, you say? This is where it gets shocking.

"Currently, influential style guides such as the Chicago Manual (14.13) and APA (which makes no mention of automated references at all) are dismissive of such software, in spite of its growing popularity among academics. Style guides and publishers can and should help, rather than belittle, these efforts. Most importantly, they should refrain from imposing rules that are virtually impossible to automate."

They go on to give an example from APA, which makes the "use of journal issue numbers dependent on whether a journal is continuously paginated per volume." Since there is no way to tell whether a journal is continuously paginated (no database exists with this information), this aspect of APA can't be automated. Maybe I'm missing something, but this is yet another arcane rule that exists for no real reason I can think of.

Reading all this gives me the overall impression that the people in charge of citation styles are, if not openly hostile to reference managers, at the very least ignoring the issue, leading to a lot of time wasted by both students and researchers, particularly for manuscripts that are resubmitted to multiple journals with differing citation formats. Granted, journals employ staff to do editing and cleanup work after a manuscript is accepted, but why the additional effort and expense?

I was struck by a comment by Jodi Schneider recently at Crossref Live 2018, where she pointed out that publishers now spend time helping researchers check reference formatting but don't help them check whether a reference has been retracted! Surely publishers have better things to do?

References are converted from structured data to plain text and back again during journal submission

I was reminded further of this absurdity when I read that Scholarcy - a tool that can extract references from PDFs - was used by the BMJ publishing group to convert back-file PDFs into structured references (XML, probably). Granted, this was for legacy PDFs, but it made me think.





Today, many manuscript submission systems accept documents in Word or PDF, and I would guess this is still the most common route. The thing is, quite a lot of authors now use reference managers like EndNote. When you think about it, the manuscript goes through a process in which authors create citations in a structured format using a reference manager, strip that information out to submit the references as plain text, and then, once the paper is accepted, the publisher converts them back into structured XML....

In fact, it seems to me that if references were submitted in structured format during the submission and peer-review phase, one could more easily do useful things like checking for retractions or analysing citation chains to automatically identify suitable reviewers.

I have been reading about new services for publishers, like UNSILO, that use machine learning and AI to help publishers improve the screening and evaluation of manuscript submissions. Why not collect as much structured data as possible, rather than rely on extraction techniques which may not be as accurate?

I'm perhaps naive here, but it seems to me more efficient for the citations/references to be submitted in a structured format such as RIS or BibTeX in the first place. Here is where it gets odd: some journals allow you to submit manuscripts in LaTeX, but some of them specifically ask you not to submit the .bib file (the structured references) but only the processed bibliography (.bbl)!
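To illustrate how little would be lost by asking for structured submission, here is a minimal sketch (the reference dict and its values are just illustrative) of serialising a reference into RIS, a plain-text but fully machine-readable format that every major reference manager can already export:

```python
# A reference as a reference manager stores it: plain fields, no style applied.
ref = {
    "type": "JOUR",  # RIS type code for a journal article
    "authors": ["Karcher, Sebastian", "Zumstein, Philipp"],
    "title": "Citation Styles: History, Practice, and Future",
    "year": "2017",  # illustrative value
}

def to_ris(r):
    """Serialise a reference dict as an RIS record:
    two-letter tag, two spaces, a hyphen, a space, then the value."""
    lines = [f"TY  - {r['type']}"]
    lines += [f"AU  - {a}" for a in r["authors"]]
    lines.append(f"TI  - {r['title']}")
    lines.append(f"PY  - {r['year']}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)
```

Nothing about this is exotic: any citation style a journal wants can be generated mechanically from such a record, whereas recovering the record from styled plain text requires the error-prone parsing discussed above.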

As an aside, I was looking at the recent reply from Elsevier regarding the revolt that caused the mass resignation of the editorial board at its Journal of Informetrics.

Part of letter from Elsevier commenting on mass resignation of the editorial board at its Journal of Informetrics.

It seems one of the demands from the editorial board was that Elsevier make the references it deposits in Crossref open like most major publishers have done.

The reply is instructive: Elsevier bemoans the fact that references are received in various styles and "more importantly in natural language", hence it has invested "significantly" in citation extraction technology.

Leaving aside the fact that the real reason Elsevier refuses to do this is probably that it would strengthen competitors to its Scopus product, both commercial (e.g. Dimensions) and open ones, if we take this response at face value, it does seem odd that Elsevier and other publishers are not mandating saner practices like submitting citations in RIS or BibTeX.

Why this state of affairs? Can we do better?

The amazing thing is that, unlike in scholarly publishing where you can see why one party wants to keep the status quo, the current state of affairs seems counterproductive for everyone.

I can't think of anyone who benefits from this. Sure, academic librarians get a lot more research consultations from students who want them to "go through" their references to ensure they are 100% correct, just in case. But I'm pretty sure most academic librarians would prefer to forgo all this and help with more interesting and important aspects of research, such as discussing how scholarship is a conversation, rather than going through the mechanics and rules of citations.

Perhaps people behind reference managers would benefit from this? Maybe, but as we have established they don't work as well as they should.

The only group (and it is a tiny one) that I can imagine benefits for sure from this state of affairs are the people in charge of the style guides.

In this APA blog post, one of them suggests there is a reason for so many styles - it is simply for signalling purposes, such that knowing how to use a style "marks its user as a member of a specific culture". I'm afraid I'm not sympathetic to this argument: plenty of undergraduates can produce passable APA style, but does that really mark them as members of the psychology culture?

All in all I don't get it.


A better method?


Just before I pressed the publish button, Todd Carpenter referred me to this very instructive piece he wrote in 2014, Why Are Publishers and Editors Wasting Time Formatting Citations? My blog post covers much of the same ground as his - the inefficiency of thousands of styles, why we should be encouraging the use of reference managers and the submission of references in structured format, etc.

But I'm struck most by his proposal.

The idea is this

"When authors are submitting references, why doesn’t the community simply send in a reference that is submitted like this:"

[Carpenter's example reference appears as an image in the original piece.]

This is simple, elegant and efficient. By using permanent identifiers (such as ORCID iDs and DOIs) as much as possible, we can benefit greatly from machine-readable data and linked data techniques. Of course, in reality you might want some redundancy with plain strings in case someone messes up the DOI.
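Since Carpenter's actual example is shown as an image, the following is purely my own hypothetical sketch of the idea: a submitted reference reduced to persistent identifiers plus a redundant fallback string (every value below is a placeholder, not a real identifier):

```python
# Hypothetical shape only: not Carpenter's actual proposal; all values are placeholders.
submitted_reference = {
    "doi": "10.1000/example-article",                         # resolves to the full metadata
    "author_ids": ["https://orcid.org/0000-0000-0000-0000"],  # placeholder ORCID iD
    "fallback": "Doe, J. (2018). An example article.",        # redundancy if the DOI is mistyped
}

def has_resolvable_pid(ref):
    """A submission system could accept a reference as machine-readable
    as soon as it carries a DOI (every DOI starts with the '10.' prefix)."""
    return ref.get("doi", "").startswith("10.")
```

With such a record, formatting in any house style, retraction checking and citation analysis all become simple lookups rather than parsing problems.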

Getting everyone to agree is of course the tough part. Again, this blog post by APA paints a nice picture of how getting everyone to agree on a style leads to arguments over the most minute points, like the use of periods, abbreviations versus spelling out, capitals, etc.

All this strikes me as rule-making and rule-following for no good reason. After all, it seems to me that the main purpose of a reference is to answer "Who created it?", "When was it created?", "What is it called?" and "Where can you find it?". Do we really need to worry so much about periods, commas and capitalisation?




Acknowledgements: This article has been years in the making and has been influenced by discussions online at various forums such as LSW and on Twitter. As mentioned, it was highly influenced by Sebastian Karcher and Philipp Zumstein's article on citation styles and, most recently, Todd Carpenter's Why Are Publishers and Editors Wasting Time Formatting Citations?

Google Scholar Citation Profiles: the good, the bad, and the better

Source: https://harzing.com/blog/2018/11/google-scholar-citation-profiles-the-good-the-bad-and-the-better


Since 2012, Google Scholar has offered academics the opportunity to create their own profile, something I would really recommend you do. Setting up a Google Scholar Citation Profile is easy and very quick. A GS Profile is your academic business card; it is the quickest and easiest way for other academics to see all your publications at one glance. If you have a common name it is also the only surefire way to disambiguate your publication record from that of your namesakes. Just make sure that, once you have created your profile, you click the box "Make my profile public". Otherwise you will be the only one who is able to see it, which defeats the whole purpose of creating a profile in the first place.

The good: a great solution for "stray" citations

Creating a GS profile is also a great solution for one of the biggest annoyances in citation analysis: the presence of "stray" citations. Stray citations are not the same as multiple identical web versions of the same paper; Google Scholar normally aggregates those under one master record. What I mean by "stray citations" are records that have not been aggregated under their master record. These 2nd (and sometimes 3rd and further) versions of the record typically only have a small number of citations each and are generally the result of misspelling of an author’s name, the title of the publication or the journal. They can also be caused by Google Scholar parsing errors. For more details on this, please see: Google Scholar: Stray citations.
Stray citations tend to be particularly common for "non-traditional" publications, such as software, books, book chapters, and conference papers as there is generally no standardised way to reference them. It is therefore much harder for Google Scholar to figure out whether they do refer to the same publication. For instance, although Google Scholar does a much better job than the Web of Science for references to my Publish or Perish software programme, there are still many stray citations (see screenshot below), which - in my GS Profile - I have all merged into the master record. Any records in your GS Profile that contain merged citations are shown with a * behind the citations. You can merge strays by logging into your profile, checking the box in front of the records you want to merge, and clicking merge.

No this doesn't mean Google Scholar is rubbish! (1)

It is important to note that stray citation records are not unique to Google Scholar. They are for instance prevalent in the Web of Science as well if you use the "Cited Reference" search function [which includes references to books and non-ISI listed journals] rather than the general search function. I need to submit data change reports to Clarivate nearly every single week to ask them to merge my stray citations in their relevant master records. See also: Web of Science: How to be robbed of 10 years of citations in one week! and Bank error in your favour? How to gain 3,000 citations in a week.
One of the most-cited academics in the field of Management – Geert Hofstede – has published a book called "Culture’s Consequences". This book was first published in 1980, with a 2nd revised edition in 2001. These two versions respectively have more than 12,000 and more than 9,000 citations under the title "Cultures Consequence". However, there are also hundreds of additional stray citation records in ISI’s Cited Reference search, all referring to the same two books. Many stray entries in ISI are simple misspellings of the title (see below for some of the more amusing bloopers). In most of these cases, the references were actually correct in the referring works and the spelling errors appear to have been made by ISI data entry staff.
 

The bad: badly polluted profiles (but you can easily avoid this)

As it is you (not Google Scholar) who is creating this profile, it is you who needs to maintain it and keep it up-to-date. This is not Google Scholar's responsibility. However, many academics only take a few minutes to create their profile, don’t look at any of the options and thus don't realise the default option is adding new articles automatically. That's not entirely surprising as Google Scholar doesn't make it very obvious how to change this. But it actually is very easy to do. Just login to your profile and click on the little cross you see in the title bar.
Click on "Configure article updates". Then, on the next page, click the second option. Don't "fall" for the Google Scholar "recommended" option. As is common with these types of services, recommended options cater for lazy and forgetful people. You might think it will save you time as you do not have to confirm updates every time, but be realistic: how many articles do you publish a year? Most of us do not publish so much that logging in, after an email prompt with a link, to approve legitimate additions becomes a burden. It takes all of 30 seconds. It is also a great opportunity to manually correct or supplement anything that GS got wrong by editing the record in question.

Essential for those with common names

If you have a common name, putting your updates on manual isn't optional, it is essential! Look at this "student of business management" at Salford University who left their profile updates on automatic.
But even if your name is unique - like mine - you might want to keep the quality control in your own hands. Below are three publications that GS thinks should be in my profile. The first two are just weird. The third one seems to be conflating two articles in the same issue of Scientometrics. Do you think any of them would add much to my credibility as an academic researcher? Thought not!
 

No this doesn't mean Google Scholar is rubbish! (2)

Please do realise though that this does not mean that Google Scholar data are rubbish and that they should be avoided at all cost (see also Sacrifice a little accuracy for a lot more comprehensive coverage). None of the three “publications” has any citations and in the normal course of events everyone would ignore them anyway. But why pollute your profile with them?
Again, don't think badly polluted profiles are unique to Google Scholar. My blogpost Health warning: Might contain multiple personalities documents the frankly hilarious lack of author disambiguation in the Web of Knowledge Essential Science Indicators. Just look at the top-10 most cited authors according to the Web of Knowledge: they are all called Zhang, Wang, Li or Liu and on average they publish 12 articles a day, 365 days a year. Ever heard of the Chinese expression "Three Zhang (and/or) Four Li"? It means "anyone" or "everyone".

The better: GS Profiles and Publish or Perish

Since version 5 my free citation analysis software Publish or Perish allows you to do Google Scholar Profile searches. Thus any work you put into cleaning up your Google Scholar Profile is well worth the effort as you will be able to display your complete profile in a neat list in Publish or Perish and sort it any way you like. This is more difficult in the web interface, which only allows sorting by title and year and by default only provides you with 10 results per page. You will also get a wealth of citation metrics based on an accurate and complete profile.
Since version 6 using Publish or Perish also allows you to search for key words and institutions, making it very easy to get an overview of the most cited academics in a particular field or institution. This can be particularly helpful when looking for collaborators, reviewers, keynote speakers etc. Please note though that fields in Google Scholar are self-selected and not standardised. For instance, for one of my own areas of expertise, I have seen four different variants used: "International HRM", "IHRM", "International Human Resource Management" and "International HR".


The Wellcome Trust funds OpenCitations

Source: https://opencitations.wordpress.com/2018/12/23/the-wellcome-trust-funds-opencitations/

The Wellcome Trust funds OpenCitations

The Open Biomedical Citations in Context Corpus funded by the Wellcome Trust

The Wellcome Trust, which funds research in big health challenges and campaigns for better science, has agreed to fund The Open Biomedical Citations in Context Corpus, a new project to enhance the OpenCitations Corpus, as part of the Open Research Fund programme.
As readers of this blog will know, the OpenCitations Corpus is an open scholarly citation database that freely and legally makes available accurate citation data (academic references) to assist scholars with their academic studies, and to serve knowledge to the wider public.

Objectives

The Open Biomedical Citations in Context Corpus, funded by the Wellcome Trust for 12 months from March 2019, will make the OpenCitations Corpus (OCC) more useful to the academic community by significantly expanding the kinds of citation data held within the Corpus, so as to provide data for each individual in-text reference and its semantic context, making it possible to distinguish references that are cited only once from those that are cited multiple times, to see which references are cited together (e.g. in the same sentence), to determine in which section of the article references are cited (e.g. Introduction, Methods), and, potentially, to retrieve the function of the citation.
At OpenCitations, we will achieve these objectives in the following ways:
  • by extending the OpenCitations Data Model so as to describe how the in-text reference data should be modeled in RDF for inclusion in the OpenCitations Corpus;
  • by developing scripts for extracting in-text references from articles within the Open Access Subset of biomedical literature hosted by Europe PubMed Central;
  • by extending the existing ingestion workflow so as to add the new in-text reference data into the Corpus;
  • by developing appropriate user interfaces for querying and browsing these new data.
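As a rough sketch of what "citations in context" enables (a toy model in plain Python, not the actual RDF-based OpenCitations Data Model), simply recording where each in-text reference occurs is enough to distinguish once-cited from repeatedly-cited works and to find references cited together in the same sentence:

```python
from collections import defaultdict

# Toy in-text reference records: which work is cited, and where in the citing article.
in_text_refs = [
    {"cited_doi": "10.1000/a", "section": "Introduction", "sentence": 3},
    {"cited_doi": "10.1000/a", "section": "Methods", "sentence": 41},
    {"cited_doi": "10.1000/b", "section": "Methods", "sentence": 41},
]

def times_cited(refs, doi):
    """How many separate in-text references point at this DOI."""
    return sum(r["cited_doi"] == doi for r in refs)

def co_cited(refs):
    """Groups of DOIs cited together in the same sentence of the same section."""
    by_location = defaultdict(set)
    for r in refs:
        by_location[(r["section"], r["sentence"])].add(r["cited_doi"])
    return [dois for dois in by_location.values() if len(dois) > 1]
```

In this toy example, work "a" is cited twice (once in the Introduction, once in the Methods) and is co-cited with work "b" in the same Methods sentence; per-reference-list data alone cannot recover either fact.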

Personnel

We are looking for a post-doctoral computer scientist / research engineer specifically to achieve the aforementioned objectives. This post-doctoral appointment will start on 1 March 2019. We seek a highly intelligent, skilled and motivated individual who is expert in Python, Semantic Web technologies, Linked Data and Web technologies. Additional expertise in Web Interface Design and Information Visualization would be highly beneficial, as would a strong and demonstrable commitment to open science and team-working abilities.
The minimal formal requirement for this position is a Masters degree in computer science, computer science and engineering, telecommunications engineering, or equivalent title, but it is expected that the successful applicant will have had research experience leading to a doctoral degree. The position has a net salary (exempt from income tax, after deduction of social security contributions) in excess of 23K euros per year.
The formal advertisement for this post – which will be held at the Digital Humanities Advanced Research Centre (DHARC), Department of Computer Classical Philology and Italian Studies, University of Bologna, Italy, under the supervision of Dr Silvio Peroni – is published online, and it is accompanied by the activity plan (in Italian and English). The application must be submitted exclusively online by logging in to the website https://concorsi.unibo.it (the default language is Italian, but there is a link to switch to English). People who do not have a @unibo.it email account must register on the platform. The deadline for applications is 25 January 2019 at 15:00 Central European Time. Please feel free to contact Silvio Peroni (silvio dot peroni at unibo dot it) for further information.

People involved

The people formally involved in the projects are:
  • Vincent Larivière – École de Bibliothéconomie et des Sciences de l’Information, Université de Montréal, Canada;
  • Silvio Peroni (Principal Investigator) – Digital Humanities Advanced Research Centre (DHARC), Department of Computer Classical Philology and Italian Studies, University of Bologna, Italy, and Director of OpenCitations;
  • David Shotton – Oxford e-Research Centre, University of Oxford, Oxford, UK, and Director of OpenCitations;
  • Ludo Waltman – Centre for Science and Technology Studies (CWTS), Leiden University, Netherlands.
In addition, the project is supported by Europe PubMed Central (EMBL-EBI, Hinxton, UK).

Saturday 19 January 2019

Why free speech and open debate are essential in universities

Source: https://blog.derby.ac.uk/2018/02/free-speech-open-debate-essential-universities

Why free speech and open debate are essential in universities

Free speech is a hotly debated topic currently in the university sector. Dennis Hayes, Professor of Education at the University of Derby and Director of Academics For Academic Freedom, discusses why freedom of speech is censored and what can be done to tackle it.

The fourth annual Free Speech University Rankings (FSUR) were released in February and, once again, they reveal a very depressing picture of the state of free speech in universities: “55 per cent of universities now actively censor speech, 39 per cent stifle speech through excessive regulation, and just six per cent are truly free, open places.”
This is a disturbing finding, because universities cannot really be said to be universities unless they allow unrestricted freedom of speech and open debate.

Why is freedom of speech important?

Freedom of speech is the foundation of freedom. Unless we express our beliefs and ideas and put them up for debate and challenge, we cannot know whether what we think is true or false. We cannot even begin to defend other values, such as equality or democracy, without exercising our freedom of speech. If we do not engage in debate we are no more than parrots uttering sounds that have no meaning.
Universities embody society’s commitment to freedom of speech in its fullest sense. Academics are paid not only to have opinions but to research and test those opinions. This is what ‘academic freedom’ means. If universities restrict freedom of speech they are attacking themselves. They are in danger of turning themselves into training establishments that teach ‘truths’ that cannot be challenged.

Why is freedom of speech censored?

We live in a therapeutic culture that is documented and explained in my co-authored book The Dangerous Rise of Therapeutic Education. Therapeutic education emphasises emotion over the intellect and has produced what is often called the ‘Snowflake Generation’ of students. This characterisation of a generation is unfair because they are not snowflakes who can’t cope with ‘offensive’ or challenging ideas. It is universities and student unions that see students as vulnerable and unable to cope. They believe that the university must be transformed into a ‘safe space’ in which students are protected from emotional harm through exposure to ‘offensive’ ideas. This is why many restrictions and bans seem to be aimed at ‘protecting’ students. The reality is they are stunting their intellectual growth and potential.

What can universities do to defend freedom of speech?

Many of the restrictions universities place on freedom of speech are an epiphenomenon of committees. They are often ‘added to’ policies by academics and administrators going beyond any legal requirement and trying to regulate out ‘offensive’ ideas. The few universities that regularly get a ‘Green’ ranking for having a ‘hands off’ approach to freedom of speech are those with minimalist policies. With the agreement of the Vice-Chancellor and the Academic Registrar, the University of Derby began a review of all policies to remove regulations that went beyond the law and were ‘over enthusiastic’ about controlling speech. The work is ongoing but this year the university has achieved a ‘Green’ ranking for freedom of speech. A similar process could be undertaken in every university committed to freedom of speech and they could soon move out of red to an amber or even a green ranking. A start could be made by weeding out any statement that contains the word ‘offensive’.
Universities should also remind student unions that if they wish to use the label ‘University of X SU’ they must uphold the values of the university and not restrict freedom of speech. Student unions may claim to be independent organisations – legally they are – but if they ban, censor and ‘No Platform’ speakers and groups, they should not lay claim to a title borrowed from a distinguished institution whose values they reject.
If universities really believe that free speech is their foundational value, they must take it seriously and put in the hard work to ensure it is not accidentally undermined by committees or by political activists amongst the academic staff or in student unions who happily undermine the university in pursuit of political goals.
For further press information please contact the Corporate Communications Team on 01332 591891, pressoffice@derby.ac.uk or @derbyunipress