Source: http://musingsaboutlibrarianship.blogspot.com/2021/06/what-do-we-really-know-about-academic.html
What do we really know about academic search engine optimization (ASEO) and an announcement.
Announcement
This blog has been using Feedburner to provide RSS and email subscriptions for over a decade (since 2009!). Unfortunately Feedburner has announced that email subscriptions are going away in July, so to ensure that those of you who have subscribed to this blog via email continue to receive emails, I'm moving everyone to a new service - follow.it - for the next blog post in July (estimated release 15th July).
What this means for you.
For those of you who subscribe via email:
No action needed, you will automatically be subscribed to emails via the new follow.it service. There is no need to create a follow.it account unless you want to. If you do not wish to continue email subscriptions from this blog and want to be excluded from this service, please email me at aarontay at gmail.com by 2nd of July 2021.
For those of you who subscribe via RSS:
No action needed currently. The Feedburner RSS feed will continue to work; however, I highly recommend you switch over to the new RSS feed, as it's unclear how long Feedburner is going to be maintained.
To do so, enter the following into your RSS feed reader: https://follow.it/musings-about-librarianship-aaron-tay
What do we really know about academic search optimization?
One of the things researchers have asked me about in recent years is how to maximise the visibility of their research.
Thus far, I have had 2 musings on this topic:
- Systematically improving the visibility of your research on social media - what would this look like?
- Systematically improving the visibility of your research on social media - Visual abstracts, Video abstracts, Plain language abstracts and more
The first covers how to track your online promotions and assess what works, while the second focuses more specifically on whether posting additional content such as visual abstracts and video abstracts helps with gaining more attention on social media.
All this is well and good, but what about Academic Search Engine Optimization (ASEO)? If you are unfamiliar with the concept of Search Engine Optimization (SEO), it is usually defined as follows:
Search engine optimization is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic
The SEO industry is by now two decades old. Its consultants provide services that help draw traffic to your websites by ranking high on search engine result pages (SERP), which of course implies claiming to know the secrets to getting your website and pages highly ranked on top search engines such as Google.
I understand it is a highly competitive industry with high stakes (ranking #1 on a popular keyword vs #100 on Google makes a big difference for traffic), where gurus pore over comments by Googlers or run experiments to try to figure out what makes Google's search algorithms tick. It's a constant cat and mouse game, as SEO experts react to major Google updates that make big changes to the algorithms.
By analogy, Academic Search Engine Optimization (ASEO) has the same idea, which is to optimize your academic content (typically journal articles) so it will rank higher in search engines, typically Google and Google Scholar.
The term "Academic Search Engine Optimization" appears to have been coined in 2010, and the first paper that used the term studied it in the context of what affects ranking in Google Scholar.
That first paper on ASEO found that citation counts have a very strong influence on the ranking of publications found in Google Scholar. This is a result with a strong effect (r=0.9) that has been found over and over again in follow-up papers with larger samples and more sophisticated methods, and it is generally accepted to be one of the things we know for sure about Google Scholar.
In general though, I don't think ASEO has been a hot area of research in the decade that followed (at least if you search by the term). While there has been tons of research around Google Scholar, a glance at the bibliography produced by EC3 Research Group's Google Scholar Digest shows that most of the research on Google Scholar focuses on traditional bibliometrics areas such as comparison of Google Scholar index coverage with other scholarly indexes, whether Google Scholar is appropriate as a ranking tool etc.
As you will see, there is a strand in bibliometrics that, while not focusing on ASEO per se, studies factors correlated with citation counts, and this is sometimes used to support ASEO recommendations.
That said, a recent publication - Increasing visibility and discoverability of scholarly publications with academic search engine optimization, published in 2021 - caught my eye. How much has the research on ASEO improved since the term was coined in 2010?
What follows is a short summary of the paper and a discussion of the sources of evidence for such ASEO tips. I discuss what saying that ASEO "works" really means, the difficulty of actually measuring this, and muse about whether we can really use correlational observational studies to make confident recommendations for ASEO.
Ultimately the question is what we can really be sure about ASEO today.
A summary of ASEO tips in 2021
The article - Increasing visibility and discoverability of scholarly publications with academic search engine optimization - is positioned as a practical piece to advise researchers. It contains no original research of its own but instead draws on other published research, and there's a lot to like about it.
Firstly, it nicely divides the advice into sections like
- Title optimization
- Keyword optimization
- Abstract optimization
But are we sure these ASEO tips really work?
I'm not an expert on ASEO by any means, but I'm somewhat familiar with the literature around Google Scholar and, due to my past work, have a bit of an understanding of search engines. Most of the advice in the article looks reasonable to me and is worth a try.
The problem is, are we really sure these ASEO tips work? Can they backfire? How do we know?
One objection that immediately leaps to mind is that a lot of the advice given here is based on research that is many years old. A seasoned SEO expert would tell you that any specific, usable optimization tips that are a few years old are pretty much useless, since Google changes so quickly.
One answer to that is that if one confines oneself to Google Scholar, given that ASEO hasn't attracted a lot of attention and that Google Scholar is mostly a side project within Google, it is unlikely its algorithm has changed as much as Google's main search has. Sure, there have been papers showing that one can spam Google Scholar with fake papers to increase citation counts (or see a more recent paper), but I think Google Scholar is on record as not being overly worried, as this doesn't seem to have caught on. (Side note: Do predatory journals count as spamming?)
That said, I have encountered companies that have approached me with offers to enhance discovery of Institutional Repository contents, basically by converting PDF to XML. In theory that might help, but who knows? Still, this is ASEO more akin to traditional SEO, done at the site or repository level (e.g. speeding up pages, proper site maps etc.), which we will not discuss here.
Still, I have doubts that we really know whether ASEO actually works.
But let's define what "works" means... it could mean
(a) If I followed the ASEO tips, my paper would rank higher or appear more often in Google Scholar etc. than if I didn't
or
(b) If I followed the ASEO tips, my paper would get more visibility, leading to more attention, downloads and ultimately citations.
Clearly (b) is much harder to provide evidence for, since even if you achieve a higher ranking on Google Scholar and your paper is seen, the causal pathway from someone seeing it to acting on it to citing it is longer. Just as traditional SEO might help your webpage appear more often and get more eyeballs, whether those visitors ultimately buy from your website (or take whatever action you want them to) is still not guaranteed.
But let's say we stick with (a) and just focus on visibility on Google Scholar. How sure are we that the ASEO tips given work?
The main issue here is that even if ASEO works it is often hard to be sure. With traditional SEO, you have a baseline ranking of your website at a certain point of time, intervene and measure the change. With a lot of ASEO tips like writing titles, abstracts, you just have one shot at it.....
Still, what reasons are we given to believe ASEO works?
Looking at the paper I see two common reasons given.
a) Reasons based on understanding how the academic search engine generally works (first principles)
b) Reasons based on empirical studies - correlational, observational studies
Reasons based on understanding how the academic search engine generally works
"Images and graphs in publications should also be optimized for findability. It is important to note that text in images can only be recognized by search engines if the image is saved in a vector graphics format such as .svg and .eps"
The article gives the above advice for handling images in publications and goes on to point out that text in common image formats like .bmp, .jpg and .png is not machine-readable. It suggests adding the file caption as metadata to the images.
This, I believe, is based on very early studies showing that Google/Google Scholar could not index such images. Assuming this remains unchanged, it seems worth doing.
Here's another
From an ASEO perspective there should be more emphasis on the title and less information hidden in subtitles. Many databases do not even display subtitles in the list of results, which indicates that these articles will also be poorly ranked. Furthermore, it is difficult for users to identify the subject of an article, especially if the title is very long and therefore cut when using a mobile device.
In this case, as the paper states, the recommendation is based on the way academic search engines display titles, particularly on mobile. They make the very sensible suggestion that creative titles which don't make clear what topic the publication covers might not be a good idea. Even if you combine a creative but unclear main title with a more definitive subtitle, it can backfire, since the subtitle often isn't displayed or is cut off after the first few characters, and the reader might just assume the paper isn't relevant.
Another guideline given for titles is to avoid suspended hyphens, special characters, diacritical signs and formulae (basically characters that can't be expressed in UTF-8) because they reduce findability.
The issue with suspended hyphens is that we as human readers can make the semantic connection but search engines often cannot. Humans will readily combine the parts ‘pre-’ and ‘natal’ to pre-natal or prenatal, while search engines cannot always combine them.
Again, this rings true. As a librarian, I can tell you that whenever I search for a publication with a long title containing a hyphen or colon and can't find it, I will search again with a short fragment without the hyphen or colon just in case, and often the publication appears!
So findability is definitely an issue.
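The retry habit described above can be sketched in a few lines of Python. This is only an illustration of what I do by hand, not how any search engine works internally; the function name `title_query_variants` and the choice of which punctuation to drop are my own assumptions.

```python
import re


def title_query_variants(title: str) -> list[str]:
    """Return progressively simpler search strings for a title (hypothetical helper)."""
    variants = [title]
    # Replace hyphens, dashes and colons with spaces, then collapse whitespace.
    no_punct = re.sub(r"[-:\u2010-\u2015]", " ", title)
    no_punct = re.sub(r"\s+", " ", no_punct).strip()
    if no_punct != title:
        variants.append(no_punct)
    # Keep only the part before the first colon (the main title).
    main = title.split(":", 1)[0].strip()
    if main and main not in variants:
        variants.append(main)
    return variants


for query in title_query_variants("Pre- and post-natal care: a review"):
    print(query)
```

Trying each variant in turn, from the full title down to the main title alone, mirrors the manual fallback a librarian would do when the first search comes up empty.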
"Mathematical formulae can create another set of problems: the representation in basic text code. It is vital to limit characters to basic text code like UTF-8. In that case, if special characters like the Φ symbol cannot be displayed properly the problem lies with the end user’s device. ...In order to find the book, you will need to use the Φ symbol. If the platform that hosts the book cannot display a special character like the Φ symbol, the search functionality is impaired."
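If you want to check a draft title for such characters before submission, a minimal sketch follows. It simply flags anything outside plain ASCII, which is my own (conservative) reading of "basic text code"; the paper does not specify a test, and the function name is hypothetical.

```python
import unicodedata


def risky_title_characters(title: str) -> list[tuple[str, str]]:
    """List (character, Unicode name) for every non-ASCII character in a title."""
    # Assumption: anything beyond ASCII is worth a second look, not necessarily wrong.
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in title if ord(ch) > 127]


print(risky_title_characters("The Φ-meson and Schrödinger's cat"))
```

A flagged character is not automatically a problem. It is just a prompt to consider whether a reader could plausibly type it into a search box.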
Of course, all these issues make it hard for a publication to be properly indexed. When citation indexes try to link references to source publications (typically done automatically), such problems can lead to real citations being missed (by being treated as a citation variant), leading to lost citations!
There are a lot more such examples. They sometimes don't even cite peer-reviewed papers as evidence, because they are based on common sense reasoning.
I think such advice is probably harmless, though I don't really know if we can be sure it actually helps.
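To see why a hyphen or a diacritic can split one paper into several "variants", here is a rough sketch of automatic title matching. It is not the method of any actual citation index; the normalization steps, the use of `difflib`, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    # Decompose accented letters, drop the combining marks, lowercase.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter or digit with a space.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", stripped.lower()).split())


def same_work(cited: str, indexed: str, threshold: float = 0.9) -> bool:
    """Guess whether two title strings refer to the same work (threshold is arbitrary)."""
    a, b = normalize_title(cited), normalize_title(indexed)
    return SequenceMatcher(None, a, b).ratio() >= threshold


print("Pré-natal Care: A Review" == "Prenatal care - a review")           # exact match fails
print(same_work("Pré-natal Care: A Review", "Prenatal care - a review"))  # fuzzy match
```

A matcher that relies on exact string comparison would treat these two references as different papers, which is how a real citation can end up uncounted.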
Reasons based on empirical studies - correlational, observational studies
So are there actually empirical studies that give you some assurance these tips help? This is where things get iffy.
There are indeed quite a lot of papers studying what affects citation rates of papers, and unfortunately the vast majority are basically correlational studies.
For example, there are quite a lot of studies on the effect of different characteristics of titles on citation counts, and they are inevitably correlational studies.
Just taking one example, The effect of characteristics of title on citation rates of articles provides a correlational study covering different aspects of titles, such as
- use of a hyphen or a colon separating different ideas within a sentence
- articles with different words in the keywords (at least two different keywords)
- length of title etc
"Write a review paper - Reviews are more likely to be cited than original research papers"
"Deposit paper in Open Access repository" and "Start blogging"
which look harmless and are worth a try: the more avenues through which your paper can be found or mentioned, the better your chance of being visible, and it probably won't hurt.
The working paper also suggests, as a separate strategy, making papers Open Access, based on a belief in the Open Access Citation Advantage. Unfortunately this is a disputed finding and, for obvious reasons, of the many "X correlates with citation count" studies, this is the one finding that has gotten the most scrutiny. How much scrutiny? Enough studies to fill a bibliography and a systematic review, with some findings suggesting that in some fields hybrid journals have the biggest Open Access Citation Advantage, though controlling for preprint citations diminishes it. Still, I wouldn't run out to pay a huge APC in a hybrid journal based on this!
Fact is, though, there are a lot of (to my eyes) semi-random findings in such correlational studies. For example, I recently read a study with a fairly counter-intuitive result: on balance, papers that get cited in reviews tend to end up worse off in citation counts than those that were not. To my eyes this looks like a very sophisticated study (it goes beyond the usual "let's download papers from Scopus and do a simple correlation/t-test") and it gives fairly good a priori reasons for why the results might be so, but I still wouldn't run around asking authors of review papers to remove their cites of my paper based on just this study!
"Papers published after having first been rejected elsewhere receive significantly more citations (Ball 2012)"
Conclusion
for best citability, the length of title should be no less than 10 words but no more than 25, article length should be.....