What’s a citation good for, anyway?

We put a great deal of value on the citation in the academic and library
worlds: we count them and collect them, teach people how to use and make
them, use them to track down objects and judge quality with them. But
we rarely actually articulate the many ways in which citations are used,
or what makes one citation valuable and another poor quality, and how
to tell.

We also rarely ask these questions on Wikipedia, although we spend a great
deal of time — hundreds of thousands of person-hours — creating and
verifying citations. We hang notability of entire topics on citations;
we emphasize them as crucial to our success as a useful reference
source. But we rarely interrogate them. It’s not at all clear that the
citations are heavily used — most people read the first few paragraphs
and stop, after all — and while there’s much guidance on the kinds of
things we’re hoping to cite, there’s little on the ideal citation model
itself. Citing style on Wikipedia has haphazardly followed a few
different academic models and, at least on the English Wikipedia, these
have merged into a folk style all its own. Are the ways we cite sources
sufficient for what we are trying to do with them? This is a crucial
question that, to my mind, has not been answered.

So let’s start from first principles and articulate what a citation is
used for, outside of Wikipedia: in an academic paper, in a book, in a
court case or patent — anywhere they might show up.
1. First and foremost, a citation acts as an identifier of a unique work.
A citation tells you that a certain thing — often a written thing, but
also perhaps a visual work, or computer software, a statue or movie or
something else in a fixed medium — is meant.
Of course, this usually means (to follow the FRBR model) the work, not the
instantiation of the work. When I cite a journal article, or
Huckleberry Finn, I mean the general idea of that article, or the book
written by Mark Twain, not a specific copy of the journal or the book.
This isn’t always true though. Anyone who writes about rare books, or
art, or webpages, often means to cite a particular unique thing — that
copy of that painting in the Louvre, that particular post on
Reddit — and extra information must be given in those cases, such as
where that unique thing is located.
More commonly, the specific instantiation (the exact book I have in my
hands) might not matter, but the version of the work (the specific
edition, translation, or printing) might matter a great deal: a page
citation in the 2nd printing of the Indian paperback edition of a
textbook will have an entirely different page number from the original
UK hardback edition, despite citing the same material. One translation
of Ovid is not the same as another. This is accounted for in citation
styles that deal with literary works, but is treated haphazardly elsewhere.
With this in mind, uniquely identifying a work can be easier said than done: the
library and software communities are still trying to figure out how to
appropriately cite computer software, or video games, such that the
right version is identified. We also often confuse what does need a
unique identifier (a work that only has one location, such as a single
webpage) with a work that doesn’t, but happens to have several locations
where it might be found, including the one cited (the copy of
Huckleberry Finn on Project Gutenberg).
2. A citation’s next job, nearly as important (and of particular interest
to reference librarians), is to enable a person to locate the specific
work that is meant.
Now that I know what it is, can I get my hands on it to read it? Where does
a copy of this work exist? To answer these questions, it’s not enough
to just give the author and title of a journal article — if you do, you
are putting the burden of searching for extra information in a
bibliographic database on the reader. Instead, we give the source title
of the journal, and the volume and page — a crude but effective locator
system for finding an article in a long run of print journals. Nowadays,
unique online IDs serve that purpose, saving everyone’s time by taking
you directly to the online location of the article. But just having the
ID is not enough in a citation: if there’s a typo in the number, as
there may well be, you need some fall-back information to locate the
article; and if there’s no online ID, you usually need the volume, page,
etc. even for finding the online copy. Often, it’s only the combination
of the author, title and journal that uniquely identifies an article,
and makes it possible to find; more information is better.
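
To make the fall-back idea concrete, here is a minimal Python sketch. Almost everything in it is my own illustration rather than any real system: the function name `lookup_strategies`, the dictionary keys, and the sample record with its DOI are all made up, and the pattern only checks that a DOI has a plausible shape. The one real thing is the `https://doi.org/` resolver prefix.

```python
import re
from urllib.parse import quote

# Shape check only (an assumption, not an official grammar):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def lookup_strategies(citation):
    """Return ways to find the cited work, most direct first."""
    strategies = []
    doi = citation.get("doi", "")
    if DOI_SHAPE.match(doi):
        strategies.append("https://doi.org/" + quote(doi, safe="/"))
    # The descriptive fields are always kept as a search string.
    words = [str(value) for key, value in citation.items() if key != "doi" and value]
    strategies.append(" ".join(words))
    return strategies


# Hypothetical record; none of these values refer to a real article.
record = {
    "doi": "10.1234/example.5678",
    "author": "Author, A.",
    "title": "An Example Title",
    "journal": "Journal of Examples",
    "volume": 12,
    "page": 34,
}
mistyped = dict(record, doi="10.1234example.5678")  # the slash was dropped

print(lookup_strategies(record))
print(lookup_strategies(mistyped))
```

A DOI that is well-formed but wrong would pass the shape check and only fail when someone tries to resolve it, which is why the descriptive fields are returned every time rather than only when the ID looks broken.
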
Report numbers, journal volumes, encyclopedia titles: these are all
used for location. Generally, these pieces of data can only physically
locate the source if they are first mediated through a (sometimes
arcane) search system — I use a library catalog to translate a journal
title into possible holdings locations, for instance. To aid in all
this, the precise form of the citation punctuation — the fact that we
put issue numbers in parentheses and pages after that — does serve as a
shorthand to these locator fields, and our academic training supports
knowing which fields to search on in what search systems (book titles in
catalogs, article titles in bibliographic databases, etc). So without
these metadata fields, we’re lost.
But even this function, historically, is highly discipline-dependent in citation styles. See this citation:
Aguirre, J. E., Ginsburg, A. G., Dunham, M. K., et al. 2011, ApJS, 192, 4
This is formatted in the recommended style of the American Astronomical
Society, a major publisher in the field of astronomy (and is in fact
copied from their author instructions). How meaningful is this citation
to a historian, a biologist, or a non-academic? As a science librarian, I
can tell you off the top of my head that the article was published in the
Astrophysical Journal Supplement Series, volume 192, page 4, and that
furthermore it’s probably online (because it was published in 2011) but
also that it is not included with the journal ApJ (because it’s a
supplement). Everyone else who wasn’t trained as a physicist or a
physics librarian, however, is left to search around for the wretched
thing without an article title or a unique identifier to help them. How
meaningful or useful is such a citation format outside of the
astrophysical community? Tracking journal abbreviations, the vagaries of
report numbers, and odd metadata formatting is the bread and butter of
reference librarians, but is also necessary for a truly useful citation
system in Wikipedia that draws on the academic literature, which is,
without exception, inconsistent and field-specific.
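
What I do in my head with that astronomy citation can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the pattern `AAS_REFERENCE`, the function `expand` and the two-entry title table are hypothetical, and the pattern handles only this one author, year, journal, volume, page shape, not the full range of references in the AAS instructions.

```python
import re

# Tiny illustrative table; a usable one would need far more entries.
JOURNAL_TITLES = {
    "ApJ": "The Astrophysical Journal",
    "ApJS": "The Astrophysical Journal Supplement Series",
}

# Authors, then a four-digit year, then journal, volume and page.
AAS_REFERENCE = re.compile(
    r"^(?P<authors>.+?)\s+(?P<year>\d{4}),\s+(?P<journal>[^,]+),"
    r"\s+(?P<volume>\d+),\s+(?P<page>\d+)$"
)


def expand(reference):
    """Split a reference into named fields and spell out the journal title."""
    match = AAS_REFERENCE.match(reference.strip())
    if match is None:
        raise ValueError("reference is not in the expected shape")
    fields = match.groupdict()
    # Unknown abbreviations are passed through unchanged.
    fields["journal_title"] = JOURNAL_TITLES.get(fields["journal"], fields["journal"])
    return fields


fields = expand("Aguirre, J. E., Ginsburg, A. G., Dunham, M. K., et al. 2011, ApJS, 192, 4")
for name, value in fields.items():
    print(name, "=", value)
```

The code is the easy part. The title table is where the real work lies, and it is exactly the field-specific knowledge that a general reader doesn't have.
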
Another citation example:
Bennett, P., “Engine Oils and Engine Durability,” SAE Technical Paper 690767, 1969, doi:10.4271/690767.
What is it? Neither fish nor fowl, this is not an article or a book but an
SAE technical paper, a kind of report published by the Society of
Automotive Engineers and given a unique number. First these reports
were published in print in big books with indexes, now they are online;
the report number makes perfect sense — if you already know what it is.
There is a DOI in this citation to the online (paywalled) version, but
that’s a recent addition — for older citations, you’d just have to know
to look the paper up by that number in a certain series book. (Plus, for
actual findability, many academic engineering libraries have these
older SAE papers in print but not online.) So identifying all parts of
this citation is crucial for writing a correct citation. The lesson is
that restrictive citation systems that don’t leave flexibility and room
for things like unexpected and unique report numbers will not always
produce citations that are effective identifiers or locators.
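
One way to leave that room, sketched in Python under my own assumptions (the `Citation` class, its field names and the scheme labels are hypothetical, not any existing Wikipedia or library schema), is to keep a few common fields and put identifiers in an open-ended mapping rather than one fixed slot per scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    authors: List[str]
    title: str
    year: int
    # Open-ended on purpose: a new scheme needs a new label, not a new field.
    identifiers: Dict[str, str] = field(default_factory=dict)

    def lookup_keys(self) -> List[str]:
        """List every identifier as 'scheme: value', in the order recorded."""
        return [f"{scheme}: {value}" for scheme, value in self.identifiers.items()]


# The SAE paper cited above, with both of its identifiers kept.
paper = Citation(
    authors=["Bennett, P."],
    title="Engine Oils and Engine Durability",
    year=1969,
    identifiers={"SAE Technical Paper": "690767", "doi": "10.4271/690767"},
)

print(paper.lookup_keys())
```

An older citation to the same series would simply omit the `doi` entry and still carry the report number.
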
3. Identifying a thing in the world: related to the above two functions,
but slightly different in execution, a citation might be to a thing that
exists in the world but is not a human-created work: a chemical, say,
or a star, or a species.
There are complex, conflicting,
overlapping and occasionally proprietary identifying schemes for all of
these types of things (and everything else in the world that humans have
studied as well). Confusingly, these systems, like article citations,
sometimes but not always conflate location and identification functions.
ID systems for astronomical objects use international IDs and names,
which can be checked against reference material to help you find the object
in the sky, whereas zoological identifiers use a Latin name and the
reference to the author/year of the first paper identifying the species,
which acts as a kind of unique identifying system and a handy pointer
to the literature in one.
Vanessa (Vanessa) Fabricius, 1807
— an example of a species reference, from the Wikipedia article about zoology author citation style
Chemical IDs, such as the proprietary but universally used CAS numbers, however,
simply serve to disambiguate that particular chemical, while giving you
no information about where the chemical might actually be procured from
or found. This is all highly field-specific, of course.
4. Credit where credit’s due: don’t plagiarize!
We tell our students this, drilling this lesson into them practically
from grade school on. Give credit to people whose ideas you are using!
We use citations as a mechanism of acknowledgement — letting the world
know who we are building on. How effective this is, of course, depends
entirely on how clear and precise the citation is and what the text
itself that holds the citation says. Citing an entire book, when what
you’re quoting is a sentence on a particular page, doesn’t do much good
for acknowledgment; neither does citing a whole paper when what you used
is a specific figure, which was perhaps itself reused from another
source. Here, granularity and specificity, to the extent possible, are key.
5. Getting credit: here is where academia comes to the fore. We hang
entire careers on writing papers and citing them in academia, and having
a highly-cited paper is a mark of prestige.
We build elaborate
systems for counting who cites who just for this purpose (though almost
all of them leave things out and only work for some fields). We want to
know if others have used our work, and we want to trace where and why.
Human pride in our own work powers much of scholarship. As an academic
community, at least in the American system, citations and citation
counting are so important that it is not yet clear how to get academic
credit (i.e., tenure review) for things that don’t follow a traditional
citation and review model, like blog posts: we only know how to deal
with work that falls into particular molds. The burgeoning field of
altmetrics seeks to change this by counting who cites who for all sorts
of online work, but it will be slow going. The issue of getting credit
in citations is so important that entire disciplinary standards and
ethical matters hinge on whose name is listed first in a list of
authors, with it being understood that the first author — or in some
fields, confusingly, the last author — is the most important and
deserves the “most” credit, unless of course there’s another standard or
it’s otherwise specified.
6. Educating others about the field and acknowledging our own roots: we
cite to let others know we know the field, and to signpost it for them.
If I write a paper about the history of encyclopedias, the chances of me
citing Diderot are extremely high, whether or not I focus in on the
Enlightenment. We like to acknowledge the preeminent works in the field,
the first papers, the groundbreaking papers, the central pillars of
thought in the community. This is especially true in things like
textbooks and encyclopedia articles where the bibliography is meant to
educate and guide readers. This can be a distorting factor, of course,
in how much a given central paper is cited versus any other paper,
whether it’s truly used or not. Of course, another factor that limits
the usefulness of such an educational bibliography is how much
information is included: one of the biographical dictionaries I use a
lot (The Dictionary of Scientific Biography) has extensive and very good
bibliographies of comprehensive works about
the subject, but doesn’t note the language those works are written in,
which isn’t very useful if you track something down and it turns out to
be in a language you don’t speak.
7. Padding our resumes:
not unrelated to the above two factors, authors are people too, and the
urge to cite ourselves, whether rightly (we are making mention of our
previous work because we are building on it) or wrongly (we are
unnecessarily making mention of our previous work), is strong. I once had
a graduate student tell me he never used article bibliographies,
because they were all just self-cites anyway. That may be a little
extreme, but it’s not entirely untrue.
8. Judging the quality of a work:
when we academics review a paper, we also look at the citations: is
there a comprehensive survey of past work, have the central things in
the field been cited, have the things cited been published by reputable
publishers and in an appropriate time period? This review gets more
stringent the more comprehensive and important the work — people’s
dissertations do hang on having a good literature review. All of this, of
course, is highly subjective, and is subject to all of the factors above.
9. Legal precedent: this is a special way in which citations function in
the law and in some documents such as patents, where citations to past
cases (or inventions) establish what precedent has been invoked.
Despite (or perhaps because of?) having such a weighty and important purpose,
legal citation is probably the most arcane and impenetrable citation
style there is, almost useless to people without training in it. Law is
also the only field I know of where there are actual classes — entire
academic classes — on how to cite things, which may mean that the
lawyers have gone a bit overboard in making it difficult.

Identification, findability, acknowledgement, getting credit, showing off one’s
knowledge and establishing precedent: that’s not bad for a one- or
two-line code. But what else do citations additionally do in Wikipedia?

1. Establish notability:
we judge notability, at least in the English Wikipedia, based on a
rather elaborate and not particularly scientific combination of the
number of citations and their “quality”, by which we mean their
pertinence to the topic — does the citation cover it in any depth — and
the quality of the publisher of that citation.
The quality piece is tricky, because it is so very dependent on the subject
being described. If we’re talking about astronomy, it’s not so hard,
perhaps; I trust the reviewing standards of the American Astronomical
Society, and the other publishers in the field, and I trust that an
article published in one of their journals is likely truthful and about
something new. (Though of course we also have the arXiv, where most
astronomy papers are published today, and poor-quality journals that
also don’t review or review badly). But what if our field is general
news, or celebrity biography? Who’s to say that a particular newspaper
or gossip site is or isn’t reputable, factual or neutral? At any rate,
epistemological concerns aside, a citation should have enough
information — date and sourcing and language and extent of coverage of
the topic — to let us know if it really is useful for establishing
notability or not. One line in the New York Times
does not a full biography make, but can it be used for establishing
notability? We can’t answer that question unless we first know the depth
of our citations.
2. Establish quality:
as per above, once the topic is written about, we often use the
citations to judge the overall quality of the entry, and its approximate
trustworthiness, for better or worse.
3. Provenance of facts:
unlike most other types of technical or academic writing, Wikipedia is
perhaps unique in requiring that everything come from an outside source.
We struggle with this: does a footnote at the end of a paragraph mean
that the whole paragraph came from that source, or just the last
sentence? How can we be sure the nuance of that sentence is actually
backed up by whatever the source says? Is the source itself trustworthy,
and according to who (see above)? Do we have any special guarantee the
author looked at the source — can we access it ourselves, and how?
(Arguably, a citation to a rare manuscript or out of print book that’s
only held in one country’s libraries is as useful, to most readers, as
no citation at all).

Where does this leave us?

Citations must be flexible, to deal with the wide variety of identifier schemes
and odd citation structures that exist in the world. At the same time,
they should respect historical and long-ingrained formatting that
enables them to be human-readable, parsable, and useful for their
locator duties. They should, ideally, indicate the relationship between
the source and the new text — this is a job that has never been
historically possible, but may be with new online identifying and
annotation systems (I think about the now nearly 20-year-old
experimental wiki system PurpleNumbers from Doug Engelbart, which, as a
proof of concept, is still one of the best line and paragraph
identifiers I’ve seen).
Citations that are dependent on points in time or unique and possibly ephemeral
instances (a dynamic news webpage) should indicate that. Citations
should have semantic data sufficient to allow both giving and getting
credit. And they should be transferable between different systems:
journal abbreviations should map to journal titles, and back again. Law
citations should be expandable for the rest of us. And unique IDs should
map to the content without losing the information that is encoded in
the rest of the citation, however slight that is.

The citation is a small and underappreciated miracle of scholarship: an
imprecise encoding device that nearly everyone gets slightly wrong (I’ve
never seen a paper yet, published or not, that didn’t have some sort of
formatting issues or typos in the citations) — and yet, despite all
this, for those trained in their ways they are instantly recognizable
and serve a multitude of purposes. Citations deserve better than we give
them: they deserve to shine.

Written for the first WikiCite hackathon, May 2016

