Tuesday, 30 August 2022

What do we really know about academic search engine optimization (ASEO)? And an announcement

 Source: http://musingsaboutlibrarianship.blogspot.com/2021/06/what-do-we-really-know-about-academic.html

Announcement

This blog has been using Feedburner to provide RSS and email subscriptions for over a decade (since 2009!). Unfortunately, Feedburner has announced that email subscriptions are going away in July. To ensure that those of you who subscribe to this blog via email continue to receive emails, I'm moving everyone to a new service, follow.it, in time for the next blog post in July (estimated release 15th July).

What this means for you.


For those of you who subscribe via email:

No action needed. You will automatically be subscribed to emails via the new follow.it service. There is no need to create a follow.it account unless you want to. If you do not wish to continue receiving emails from this blog and want to be excluded from this service, please email me at aarontay at gmail.com by 2nd July 2021.


For those of you who subscribe via RSS:

No action is needed currently. The Feedburner RSS feed will continue to work; however, I highly recommend you switch over to the new RSS feed, as it's unclear how long Feedburner will be maintained.

To do so, enter the following into your RSS feed reader: https://follow.it/musings-about-librarianship-aaron-tay

What do we really know about academic search engine optimization?

One of the things researchers have asked me about in recent years is how to maximise the visibility of their research.

Thus far, I have written two musings on this topic.

The first covers how to track your online promotions and assess what works, while the second focuses more specifically on whether posting additional content, such as visual abstracts and video abstracts, helps gain more attention on social media.

All this is well and good, but what about Academic Search Engine Optimization (ASEO)? If you are unfamiliar with the concept of Search Engine Optimization (SEO), it is usually defined as follows:

Search engine optimization is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic

SEO is by now a two-decade-old industry, in which consultants offer services to draw traffic to your website by getting it to rank high on search engine result pages (SERPs). This, of course, implies claiming to know the secrets of getting your website and pages ranked highly on top search engines such as Google.

I understand it is a highly competitive industry with high stakes (ranking #1 versus #100 on Google for a popular keyword makes a big difference to traffic). Gurus pore over comments by Googlers or run experiments to try to figure out what makes Google's search algorithms tick, and it's a constant cat-and-mouse game as SEO experts react to major Google updates that make big changes to the algorithms.

By analogy, Academic Search Engine Optimization (ASEO) applies the same idea: optimizing your academic content (typically journal articles) so that it ranks higher in search engines, typically Google and Google Scholar.

The term "Academic Search Engine Optimization" appears to have been coined in 2010, and the paper that coined it studied what affects ranking in Google Scholar.

That first paper on ASEO found that citation counts have a very strong influence on the ranking of publications in Google Scholar. This strong effect (r=0.9) has been found over and over again in follow-up papers with larger samples and more sophisticated methods, and it is generally accepted as one of the things we know for sure about Google Scholar.

In general, though, I don't think ASEO has been a hot area of research in the decade that followed (at least if you search by the term). While there has been tons of research on Google Scholar, a glance at the bibliography produced by the EC3 Research Group's Google Scholar Digest shows that most of it focuses on traditional bibliometric questions, such as how Google Scholar's index coverage compares with other scholarly indexes, or whether Google Scholar is appropriate as a ranking tool.

As you will see, there is a strand of bibliometrics that, while not focusing on ASEO per se, studies factors correlated with citation counts, and this work is sometimes used to support ASEO recommendations.

That said, a recent publication, "Increasing visibility and discoverability of scholarly publications with academic search engine optimization" (published in 2021), caught my eye. How much has research on ASEO improved since the term was coined in 2010?

What follows is a short summary of the paper and a discussion of the sources of evidence for such ASEO tips. I discuss what it really means to say ASEO "works", the difficulty of actually measuring this, and whether we can really use correlational observational studies to make confident recommendations for ASEO.

Ultimately, the question is: what can we really be sure of about ASEO today?

A summary of ASEO tips in 2021

The article, "Increasing visibility and discoverability of scholarly publications with academic search engine optimization", is positioned as a practical piece advising researchers. It contains no original research of its own but instead draws on other published research, and there's a lot to like about it.

Firstly, it nicely divides the advice into sections like

  • Title optimization
  • Keyword optimization
  • Abstract optimization
It even has a section on ASEO for books (which is excellent, since a lot of ASEO advice acts as if journal articles are the only thing that matters).

The authors also provide three nice infographics, available under CC-BY, that summarise some (but not all) of the tips.





But are we sure these ASEO tips really work?

I'm not an expert on ASEO by any means, but I'm somewhat familiar with the literature around Google Scholar and, due to my past work, have some understanding of search engines. Most of the advice in the article looks reasonable to me and is worth a try.

The problem is, are we really sure these ASEO tips work? Can they backfire? How do we know?

One objection that immediately leaps to mind is that a lot of the advice given here is based on research that is many years old. A seasoned SEO expert would tell you that any specific optimization tips that are a few years old are pretty much useless, since Google changes so quickly.

One answer is that if one confines oneself to Google Scholar, its algorithm is unlikely to have changed as much as Google's, given that ASEO hasn't attracted a lot of attention and that Google Scholar is mostly a side project within Google. Sure, there have been papers showing that one can spam Google Scholar with fake papers to increase citation counts (or see a more recent paper), but I think Google Scholar is on record as not being overly worried, as this doesn't seem to have caught on. (Side note: do predatory journals count as spamming?)

That said, I have encountered companies that approached me with offers to enhance discovery of Institutional Repository contents, basically by converting PDFs to XML. In theory that might help, but who knows? Still, this is ASEO more akin to traditional SEO done at the site or repository level (e.g. speeding up pages, providing proper sitemaps), which we will not discuss here.

Still, I have doubts that we really know whether ASEO actually works.

But let's define what "works" means. It could mean:

(a) If I followed the ASEO tips, my paper would rank higher or appear more often in Google Scholar etc. than if I didn't

or

(b) If I followed the ASEO tips, my paper would get more visibility, leading to more attention, downloads and ultimately citations.

Clearly (b) is much harder to provide evidence for: even if your paper achieves a higher ranking on Google Scholar and is seen, the causal pathway from being seen to being acted on and cited is longer. Similarly, traditional SEO might help your webpage appear more often and get more eyeballs, yet visitors might still not buy from your website (or take whatever action you want them to).

But let's say we stick with (a) and just focus on visibility in Google Scholar. How sure are we that the ASEO tips given work?

The main issue here is that even if ASEO works, it is often hard to be sure that it does. With traditional SEO, you take a baseline ranking of your website at a certain point in time, intervene and measure the change. With a lot of ASEO tips, such as how to write titles and abstracts, you just have one shot at it.

Still, what reasons are we given to believe ASEO works?

Looking at the paper, I see two common types of reasons given:

a) Reasons based on understanding how the academic search engine generally works (first principles)

b) Reasons based on empirical studies - correlational, observational studies


Reasons based on understanding how the academic search engine generally works

"Images and graphs in publications should also be optimized for findability. It is important to note that text in images can only be recognized by search engines if the image is saved in a vector graphics format such as .svg and .eps"

The article gives the above advice for handling images in publications and goes on to point out that text in common image formats like .bmp, .jpg and .png is not machine-readable, suggesting that the file caption be added as metadata to such images.

This, I believe, is based on very early studies showing that Google/Google Scholar could not index text in such images. Assuming this remains unchanged, the advice seems worth following.

Here's another:

From an ASEO perspective there should be more emphasis on the title and less information hidden in subtitles. Many databases do not even display subtitles in the list of results, which indicates that these articles will also be poorly ranked. Furthermore, it is difficult for users to identify the subject of an article, especially if the title is very long and therefore cut when using a mobile device. 

In this case, as the paper states, the recommendation is based on the way academic search engines display titles, particularly on mobile. The authors make the very sensible suggestion that creative titles that don't make the topic clear might not be a good idea. Even combining a creative but unclear main title with a more descriptive subtitle can backfire, since the subtitle often isn't displayed or is cut off after the first few characters, and the reader might just assume the paper isn't relevant.

Another guideline given for titles is to avoid suspended hyphens, special characters, diacritical signs and formulae (roughly, anything beyond basic text that may not be displayed or indexed consistently), because they reduce findability.

The issue with suspended hyphens is that we as human readers can make the semantic connection but search engines often cannot. Humans will readily combine the parts ‘pre-’ and ‘natal’ to pre-natal or prenatal, while search engines cannot always combine them. 

Again, this rings true. As a librarian, I can tell you that whenever I search for a publication with a long title containing a hyphen or colon and can't find it, I try again with a short fragment without the hyphen or colon, and often the publication appears!

So findability is definitely an issue. 
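To make the suspended-hyphen problem concrete, here is a toy Python sketch. The `tokenize` and `matches` functions are hypothetical and deliberately naive (lowercase the text, split on anything that isn't a letter, digit or hyphen, then require every query word to appear). I'm assuming nothing about how Google Scholar actually tokenizes text; this only illustrates why a title like "Pre- and post-natal care" might not be found by a search for "prenatal".

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately naive tokenizer (NOT how Google Scholar works):
    # lowercase the text, then split on anything that is not a word character or hyphen.
    return [tok for tok in re.split(r"[^\w-]+", text.lower()) if tok]


def matches(query: str, title: str) -> bool:
    # A title "matches" only if every query token appears among the title's tokens.
    title_tokens = set(tokenize(title))
    return all(tok in title_tokens for tok in tokenize(query))


title = "Pre- and post-natal care in rural clinics"
print(tokenize(title))                    # ['pre-', 'and', 'post-natal', 'care', 'in', 'rural', 'clinics']
print(matches("prenatal care", title))    # False: "prenatal" never appears as a token
print(matches("pre-natal care", title))   # False: the title only contains the fragment "pre-"
print(matches("post-natal care", title))  # True
```

Real engines may handle some of these cases better (for example by splitting or joining hyphenated words), so treat this as an illustration of the failure mode rather than a measurement of how often it happens.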

"Mathematical formulae can create another set of problems: the representation in basic text code. It is vital to limit characters to basic text code like UTF-8. In that case, if special characters like the Φ symbol cannot be displayed properly the problem lies with the end user’s device. ...In order to find the book, you will need to use the Φ symbol. If the platform that hosts the book cannot display a special character like the Φ symbol, the search functionality is impaired."

Of course, all these issues make it hard for a publication to be properly indexed. And when citation indexes try to link references to source publications (typically done automatically), such problems can cause real citations to be missed (treated as a separate citation variant), leading to lost citations!
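Here is a rough Python sketch of why normalization matters for reference linking. The `normalize_title` function is a hypothetical example of the kind of cleaning a matcher might apply (Unicode NFKD decomposition to strip diacritics, case-folding, joining hyphenated words, turning punctuation into spaces); I don't know what any particular citation index actually does. Without some such step, an exact string comparison treats the two versions of the same (made-up) title below as different works.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    # Hypothetical normalization for matching references to records.
    # Decompose accented letters, e.g. "ô" -> "o" + combining circumflex.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, join hyphenated parts ("pre-natal" -> "prenatal"),
    # and turn any remaining punctuation (colons, curly apostrophes, ...) into spaces.
    folded = stripped.casefold().replace("-", "")
    cleaned = re.sub(r"[^\w\s]", " ", folded)
    # Collapse runs of whitespace.
    return " ".join(cleaned.split())


cited_as = "Pre-natal care in Côte d'Ivoire: a review"
indexed_as = "Prenatal Care in Cote d\u2019Ivoire - A Review"

print(cited_as == indexed_as)                                    # False: exact match misses the citation
print(normalize_title(cited_as) == normalize_title(indexed_as))  # True
print(normalize_title(cited_as))                                 # prenatal care in cote d ivoire a review
```

The catch is that aggressive normalization can also merge titles that really are different, so any matcher has to balance missed links against false ones.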

There are many more such examples; sometimes the authors don't even cite peer-reviewed papers as evidence, because the advice is based on common-sense reasoning.

I think such advice is probably harmless, though I'm not sure we can be confident it actually helps.


Reasons based on empirical studies - correlational, observational studies

So are there actually empirical studies that give you some assurance these tips help? This is where things get iffy. 

There are indeed quite a lot of papers studying what affects citation rates, and unfortunately the vast majority of them are correlational studies.

For example, there are quite a lot of studies on the effect of different characteristics of titles on citation counts, and they are inevitably correlational.

To take just one example, "The effect of characteristics of title on citation rates of articles" is a correlational study covering different aspects of titles, such as:

  • use of a hyphen or a colon separating different ideas within a sentence
  • articles with different words in the keywords (at least two different keywords)
  • length of title etc
Beyond titles, other studies have looked at the effects of article length, reference list length, authorship and more.

So can we just use these findings to provide support for ASEO tips? For example this working paper "extracts 33 different ways for increasing the citations possibilities." based on such studies.

The issue, of course, is that these are only correlational studies. Should we base ASEO tips on them? By doing so, are we making causal claims based on studies that do not warrant such claims?

While evidence of correlation does not necessarily mean there are no causal links (for example, the evidence that Google Scholar weights relevance by citation count is so strong that few people doubt it is a causal factor), most correlational studies are not on such strong footing, and they sometimes, or even often, contradict each other.




Even for ASEO tips that seem to make sense, like avoiding colons in titles (see above), results have been mixed, with some studies finding no effect and others finding one. The same goes for title length, with some studies suggesting longer titles get more citations and others the opposite.

No doubt confounding factors are at play plus the usual replicability issues, but I think one should be careful about giving confident recommendations for ASEO based on just these studies. 
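To illustrate how a confounder can manufacture a correlation, here is a toy simulation in Python. Everything in it is an assumption made up for illustration: two hypothetical fields, where field A writes longer titles and also has higher citation norms, while citations depend only on the field and not at all on title length. Pooling both fields still produces a strong title-length/citation correlation that disappears within each field.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
n_papers = 5000

# Hypothetical setup: each paper belongs to field A or field B.
in_field_a = [rng.random() < 0.5 for _ in range(n_papers)]
# Field A habitually writes longer titles...
title_words = [rng.gauss(14 if a else 9, 2) for a in in_field_a]
# ...and also has higher citation norms. Citations depend ONLY on the field,
# never on title length.
citations = [rng.gauss(30 if a else 10, 5) for a in in_field_a]

overall = pearson(title_words, citations)


def within_field(flag: bool) -> float:
    # Correlation computed only among papers from one field.
    xs = [t for t, a in zip(title_words, in_field_a) if a == flag]
    ys = [c for c, a in zip(citations, in_field_a) if a == flag]
    return pearson(xs, ys)


print(f"pooled r  = {overall:.2f}")             # roughly 0.7
print(f"field A r = {within_field(True):.2f}")  # close to 0
print(f"field B r = {within_field(False):.2f}") # close to 0
```

Real studies try to control for factors like field, journal and year, but no study can control for everything, which may be one reason their results conflict.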

In fact, the working paper I mentioned is careful to describe these as ways of "increasing the citations possibilities", which is cautious phrasing. Still, when I look through the 33 strategies, some seem on far stronger footing than others.

For example among the 33 strategies there are ones like

"Write a review paper - Reviews are more likely to be cited than original research papers"
which I think is about as solid a finding as you get in this arena, given that citation metrics now normalize by document type (article, review).

Then there are ones like
"Deposit paper in Open Access repository" and "Start blogging"

which look harmless and worth a try: the more places your paper can be found or mentioned, the better its chance of being visible, and it probably won't hurt.

The working paper also suggests, as a separate strategy, making papers Open Access, based on a belief in the Open Access Citation Advantage. Unfortunately, this is a disputed finding and, for obvious reasons, of all the "X correlates with citation count" studies, this is the one finding that has received the most scrutiny. How much scrutiny? Enough studies to fill a bibliography and a systematic review, with some findings suggesting that in some fields hybrid journals have the biggest Open Access Citation Advantage, though controlling for preprint citations diminishes it. Still, I wouldn't run out to pay a huge APC to a hybrid journal based on this!

The fact is, there are a lot of (to my eyes) semi-random findings in such correlational studies. For example, I recently read a study with a fairly counter-intuitive result: on balance, papers that get cited in reviews tend to end up worse off in citation counts than those that were not. To my eyes this looks like a very sophisticated study (it goes beyond the usual "download papers from Scopus and do a simple correlation/t-test"), and it gives fairly good a priori reasons for why the results might be so, but I still wouldn't run around asking review authors to remove their citations of my paper based on just this study!

An example of an ASEO tip that seems odd to me is the following:
"Papers published after having first been rejected elsewhere receive significantly more citations (Ball 2012)"
I haven't read Ball 2012, but this sounds really odd to me, and I'm not sure what authors are supposed to do with this finding.


Conclusion

My view is that ASEO is still in a nascent state and one must be careful about overpromising. I've seen invited speakers confidently give suggestions, like the title should be X to Y words long or reference lists so-and-so long, based on such isolated studies.

While I understand researchers' desire for clear-cut guidance, I think we need to avoid giving misleading advice, particularly when such advice isn't easy to follow or natural to do.

For instance, telling researchers to use ORCID and other PIDs when possible, deposit in repositories when possible and, to some extent, avoid hyphens in titles probably won't hurt.

But savvy researchers often want more and may even push us for less obvious advice. Still, I would be really iffy about giving definitive advice like:
for best citability, the length of title should be no less than 10 words but no more than 25, article length should be.....

This is not to say that ASEO studies along the lines of RCTs (randomised controlled trials) cannot be done to give us more confidence in making causal claims, the way SEO practitioners run experiments, but thus far I have not seen much academic research along those lines.
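As a sketch of why randomization helps, here is a toy Python simulation comparing an observational comparison with a randomized one. The setup is entirely hypothetical: each paper has an unobserved "quality" that drives citations, and in the observational case stronger authors are more likely to adopt some ASEO tip themselves. The tip does nothing unless `true_effect` is set, yet the observational difference in means looks like a big effect, while random assignment recovers something close to the true value.

```python
import random
from statistics import mean


def estimated_effect(n: int, randomize: bool, true_effect: float = 0.0, seed: int = 1) -> float:
    """Difference in mean citations between papers that used a (hypothetical) tip and those that did not."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)  # latent paper quality, never observed directly
        if randomize:
            # RCT: a coin flip decides who uses the tip, independent of quality.
            uses_tip = rng.random() < 0.5
        else:
            # Observational: stronger authors are more likely to adopt the tip themselves.
            uses_tip = quality + rng.gauss(0, 1) > 0
        cites = 20 + 8 * quality + (true_effect if uses_tip else 0.0) + rng.gauss(0, 5)
        (treated if uses_tip else control).append(cites)
    return mean(treated) - mean(control)


# The hypothetical tip does nothing (true_effect = 0)...
print(f"observational estimate: {estimated_effect(10_000, randomize=False):.1f}")  # about 9
print(f"randomized estimate:    {estimated_effect(10_000, randomize=True):.1f}")   # about 0
# ...and when it does help a little, randomization recovers roughly that amount.
print(f"randomized, true effect 2: {estimated_effect(10_000, randomize=True, true_effect=2.0):.1f}")
```

The numbers (quality scale, 8 citations per unit of quality, the self-selection rule) are invented; the point is only that random assignment breaks the link between who uses a tip and how good their papers already are.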

That said, there might be some private underground group of experts (some publishers? some private consultants?) who have studied ASEO deeply and are sharing such knowledge only with paying customers. After all, if hard-earned ASEO techniques that work become generally known, everyone adopting them will negate their effectiveness.

I also suspect that if ASEO becomes a real thing (the way SEO is), it would be a net harm overall, with only ASEO specialists benefiting, as everyone spends a lot of effort just to stay in place (like the Red Queen effect, where you have to run just to stay in the same spot).

Fortunately, I suspect that despite the intense competition in academia, there is currently an instinctive distaste for such techniques, which are seen as "gaming" and go against the general belief that good research will (or should) naturally rise to the top.

Perhaps, just as there is "White hat SEO" and "Black hat SEO", ASEO techniques might be classified similarly.

However, while this might be so now, who knows what the future will bring...


