Source: https://musingsaboutlibrarianship.blogspot.com/2023/09/list-of-academic-search-engines-that.html
List of academic search engines that use Large Language models for generative answers and some factors to consider when using them
This is a non-comprehensive list of academic search engines that use generative AI (almost always Large Language Models) to generate direct answers on top of a list of relevant results, typically using Retrieval Augmented Generation (RAG) techniques. We expect a lot more!
This technique involves grounding the generated answer by using a retriever to find text chunks or sentences (also known as context) that may answer the question.
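To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-generate loop. This is my own illustration and not how any of the tools below are actually implemented; the embedding model is just one common choice, and generate() stands in for whatever LLM completion call you have available.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG) over paper abstracts.
# Not the implementation of any tool listed below; the model name and the
# generate() helper are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

abstracts = [
    "Open access articles received more citations in our sample ...",
    "We find no citation advantage after controlling for self-selection ...",
    "A survey of librarians on discovery layer usage ...",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model
doc_embeddings = encoder.encode(abstracts, convert_to_tensor=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k abstracts most similar to the question (semantic search)."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_embeddings, top_k=k)[0]
    return [abstracts[hit["corpus_id"]] for hit in hits]

def answer(question: str) -> str:
    """Ground the generated answer in the retrieved contexts."""
    contexts = retrieve(question)
    prompt = (
        "Answer the question using ONLY the numbered contexts below and cite them.\n"
        + "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
        + f"\n\nQuestion: {question}"
    )
    return generate(prompt)  # generate() = any LLM completion call (assumption)
```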
Besides generating direct answers with citations, it seems to me this new class of search engine often (but not always):
a) Uses semantic search (as opposed to lexical search)
b) Uses the ability of Large Language Models to extract information from papers, such as "method", "limitations" and "region", and displays it in a literature review matrix format (a rough sketch of this follows below)
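As a rough illustration of (b), extraction into a matrix can be done by prompting an LLM to return structured fields for each paper. The prompt wording, field names and model below are my own assumptions for illustration, not any vendor's actual implementation.

```python
# Sketch of LLM-based extraction for a literature review matrix.
# Fields, prompt wording and model name are assumptions, not any vendor's setup.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
FIELDS = ["method", "limitations", "region"]

def extract_row(abstract: str) -> dict:
    """Ask the model to pull the matrix columns out of one abstract as JSON."""
    prompt = (
        f"Extract the following fields from this abstract as a JSON object "
        f"with keys {FIELDS}. Use null if a field is not stated.\n\n{abstract}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # A real system would validate the output; models do not always return clean JSON.
    return json.loads(response.choices[0].message.content)
```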
For more, see my recording - The possible impact of AI on search and discovery (July 2023)
The table below is updated as of 28 Sept 2023
| Name | Sources | LLM used | Upload your own PDF? | Produces literature review matrix? | Other features |
| --- | --- | --- | --- | --- | --- |
| Elicit.com / old.elicit.org | Semantic Scholar | OpenAI GPT models & other open source LLMs | Yes | Yes | |
| Consensus | Semantic Scholar | GPT-4 for summaries | No | No, has Consensus meter | |
| scite.ai assistant | Open scholarly metadata and citation statements from selected partners | "We use a variety of language models depending on situation." GPT-3.5 (generally), GPT-4 (enterprise clients), Claude Instant (fallback) | No | No | |
| scispace | Unknown | Unknown | Yes | Yes | |
| Zeta Alpha (R&D in AI) | Mostly computer science content only | OpenAI GPT models | No | NA | |
| Core-GPT / technical paper (unreleased?) | CORE | GPT-4 | No | No | |
| Scopus.ai (closed beta) | Scopus index | ? | No | No | |
| Dimensions AI assistant (closed beta) | Dimensions index | Dimensions' General Sci-BERT and OpenAI's ChatGPT | No | NA | |
Technical aspects to consider
- What is the source used for the search engine?
A lot of these tools currently use Semantic Scholar, OpenAlex, arXiv etc., which are basically open scholarly metadata and open access full-text sources. Open scholarly metadata is quite comprehensive; however, using open access full text only may lead to unknown biases.
Scite.ai probably has the biggest advantage here, given it also has some paywalled full text (technically citation statements only) from publisher partners.
That said, you cannot assume that just because the source includes full-text it is being used for extraction.
For example, Dimensions and Elicit, which do have access to full text, do not appear to be currently using it for direct answers. For technical or perhaps legal reasons, their direct answers are extracted only from abstracts. This is unlike Scite assistant, which does cite text beyond abstracts.
Elicit does seem to use the available full text (open access) to generate the literature review matrix.
- Are there ways for users to check/verify the accuracy of the generated direct answer, or of the extracted information in the literature review matrix?
RAG-type systems ensure that the citations made are always "real" citations found in their search index; however, there is no guarantee that the generated statement is supported by the citation.
In my view, a basic feature such systems should have is one that makes it easy to check the accuracy of the answers generated.
When a sentence is followed by a citation, typically the whole paper isn't being cited. The system grounds its answer on a sentence or two from the paper. The best systems like Elicit or scite assistant make it easy to see which extracted sentences/contexts were used to support the answer. This can be done via mouseover (scite assistant) or with highlights (Elicit).
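One crude way such a display could be wired up (purely my own sketch, and only a proxy, since similarity does not prove support) is to attach to each generated sentence the retrieved context sentence it is most similar to, so the UI has something to show on mouseover or highlight:

```python
# Sketch: pair each generated sentence with its most similar retrieved context
# sentence so a UI can surface the supporting evidence on mouseover/highlight.
# Cosine similarity is only a proxy for support, not proof of it.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def attach_evidence(answer_sentences: list[str], contexts: list[str]) -> list[tuple[str, str]]:
    """Return (answer sentence, closest context sentence) pairs."""
    ans_emb = encoder.encode(answer_sentences, convert_to_tensor=True)
    ctx_emb = encoder.encode(contexts, convert_to_tensor=True)
    sims = util.cos_sim(ans_emb, ctx_emb)  # answer-by-context similarity matrix
    best = sims.argmax(dim=1)              # index of the closest context per sentence
    return [(s, contexts[int(i)]) for s, i in zip(answer_sentences, best)]
```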
- How accurate are the generated direct answers and/or extracted information in the literature review matrix in general?
Features that allow users to check and verify answers are great, but even better is if the system can provide some scores that give users a sense of how generally reliable the results are over a large number of examples.
One way to measure such citation accuracy is via citation precision and recall scores. However, such scores only measure whether the citations given support the generated statements; they do not measure whether the generated statements actually answer the question!
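Roughly speaking (this is my own simplified formulation; published definitions vary), citation precision is the fraction of citations given that actually support their statements, while citation recall is the fraction of generated statements that are supported by at least one citation. A toy calculation:

```python
# Toy citation precision/recall calculation (my own simplified formulation;
# published definitions vary). "supported" marks whether a human judged the
# cited context to actually support the generated statement.
statements = [
    {"text": "OA papers are cited more.", "citations": [{"supported": True}, {"supported": False}]},
    {"text": "The advantage shrinks after controls.", "citations": [{"supported": True}]},
    {"text": "Most studies are from Europe.", "citations": []},  # uncited claim
]

all_citations = [c for s in statements for c in s["citations"]]
citation_precision = sum(c["supported"] for c in all_citations) / len(all_citations)
citation_recall = sum(
    any(c["supported"] for c in s["citations"]) for s in statements
) / len(statements)

print(f"precision={citation_precision:.2f}, recall={citation_recall:.2f}")  # 0.67, 0.67
```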
A more complete solution is based on the ragas framework, which measures four aspects of the generated answer.
The first two relate to the generation part of the pipeline:
- Faithfulness - measures how consistent the generated answer is with the retrieved contexts. This is done by checking whether the claims in the generated answer can be deduced from the context.
- Answer relevancy - measures whether the generated answer tries to address the question. This does not actually check whether the answer is factually correct (which is checked by faithfulness); there might be a tradeoff between the first two.
The second two relate to the retrieval part of the pipeline, i.e. they measure how good the retrieval is:
- Context precision - looks at whether the retriever is able to consistently find contexts that are relevant to the answer, such that most of the citations retrieved are relevant.
- Context recall - the converse of context precision: is the system able to retrieve most of the contexts that might answer the question?
The final score could be a harmonic mean of all four scores.
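If each of the four metrics is scored between 0 and 1, combining them might look like the snippet below. This is just a sketch of the idea with made-up numbers, not the ragas library's actual API.

```python
# Combine four RAG evaluation scores (each in [0, 1]) into one number.
# The individual scores here are made up for illustration.
from statistics import harmonic_mean

scores = {
    "faithfulness": 0.90,
    "answer_relevancy": 0.80,
    "context_precision": 0.70,
    "context_recall": 0.60,
}

# The harmonic mean punishes a low score on any single aspect more heavily
# than an arithmetic mean would.
overall = harmonic_mean(list(scores.values()))
print(round(overall, 3))  # ~0.733
```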
It would be good if systems could generate these stats for users to have a sense of the reliability of these systems, though as of the time of writing none of the academic search systems have released such evaluations.
- How generative AI features are integrated into the search, and how that affects how you should search
We are still in the very early days of search + generative AI. It's unclear how such features will be integrated into search.
There are also dozens of ways to do RAG/generative AI + search, either at inference time or even at the pretraining stage.
- How does the query get converted to match the retrieved contexts? Some examples:
- It could just do a simple type of keyword matching
- It could prompt the language model to come up with a search strategy, which is then used (see the sketch after this list)
- It could convert the query into an embedding and match it against pre-indexed embeddings of documents/text
- How do you combine the retrieved contexts with the LLM (Large Language Model)?
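As an illustration of the second option above, a system might ask the LLM to rewrite the natural-language question as a keyword search strategy before retrieval. The prompt wording and model name below are entirely my own guess at the idea, not any vendor's implementation.

```python
# Sketch: ask an LLM to turn a natural-language question into a boolean
# keyword search strategy before retrieval. Prompt and model are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def question_to_search_strategy(question: str) -> str:
    """Return a keyword/boolean search string for an academic search engine."""
    prompt = (
        "Rewrite this research question as a boolean keyword search string "
        "suitable for an academic search engine. Return only the search string.\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# e.g. "Is there an open access citation advantage?" might come back as
# ("open access" OR OA) AND citation* AND (advantage OR impact)
```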
How it is implemented can lead to different optimal ways of searching.
For example, say you are looking for papers on whether there is an open access citation advantage. Should you search like...
1. Keyword Style - Open Access citation advantage
2. Natural Language style - Is there an Open Access citation advantage?
3. Prompt engineering style - You are a top researcher in the subject of scholarly communication. Write a 500 word essay on the evidence around the Open Access citation advantage with references
Not all methods will work equally well (or at all) for these systems, even those based on RAG. For example, Elicit works for 1 & 2 but not 3, while scite assistant works for all of them, even #3.
- Other additional features
As shown in the table above, another nice feature is the ability to upload PDFs for extraction to supplement the limitations of the tool's index, which is clearly highly desirable.
Scite assistant currently provides dozens of options to control how the generation of answers works, which is also an interesting direction. For example, you can specify that the citations must come from a certain topic, journal, or even an individual set of papers you specify.
- Other Non-technical factors
The usual non-technical factors when choosing systems to use apply, of course. These include user privacy (is the system training on your queries?), sustainability of the system (what's their business model?), etc.
A (non-comprehensive) list of general web search engines that use LLMs to generate answers
- Bing Chat
- Perplexity.ai
- You.com
Side note: some systems are chatbots that may decide to search when necessary, as opposed to Elicit and Scispace, which are search engines that always search.
A (non-comprehensive) list of ChatGPT plugins that search academic papers - requires ChatGPT Plus (the default is Bing Chat)
- Scholar.ai
- Consensus search
- Research by Vector
- Scholarly
- Litmaps
- Paperpile
- Science
- NextPaper.ai
- txyz
- Scholarly Graph Link
- Bibliography Crossref
- MixerBox Scholar
- Scholar assist
- Scholarly insight
Note that a lot of these just cover arXiv or, at best, open access papers or metadata.