What would science look like if it were invented today?
By Daniel Mietchen
The Internet represents an opportunity to change this system,
one which has created a 300-year-old, collective long-term memory, into
something new and more efficient, perhaps adding in a current,
collective short-term working memory at the same time. With new online
tools, scientists could begin to share techniques, data and ideas
online to the benefit of all parties, and the public at large. (Robert J. Simpson, paraphrasing Michael Nielsen)
Sure, it is hard to imagine you reading this blog post in a world
which had not yet engaged in science, but the question "What would email
look like if it were invented today?" was recently addressed during the presentation of the Wave protocol,
and it may be worthwhile to entertain similar ideas about reinventing science:
how would a system that creates and structures knowledge have to be
designed so that these two complex processes can
effectively feed on and adapt to each other, making use of the most
appropriate technologies at hand? Both processes are highly
interrelated, but to facilitate the discussion, we will first consider
them separately (in this and the next issue of the Euroscientist), and then provide a synthesis (to which you can contribute).
Part I: What would knowledge creation look like if it were invented today?
The basic components of research
Let us start by considering scientific knowledge creation, or
research for short. Within the framework of existing knowledge, this
requires, as a first step, the identification (and perhaps further
characterization) of a gap to be bridged or closed, although some
methodologists prefer to construct, or even have to construct, their bridges before
choosing a suitable place to install them.
Once such a gap has been identified (we will leave a detailed
consideration of this process for later), three basic components are
necessary to close it, usually following each other as stages of a
research project:
Planning: an idea of how to bridge or close the gap
Realization: the means to put the idea into practice
Verification: independent assessment of the realization.
A fourth component is crucial to the process — appropriate
communication during and across the three basic stages as well as
beyond individual research projects. Traditionally, this was (and still
is) accomplished separately for each of them:
Grant proposals after an idea had been prepared for realization,
Conference and journal papers once the realization had progressed, and
Further papers (by independent investigators) once replication had been attempted (e.g. as a control experiment in a follow-up study).
The decoupling of this fourth component from the other three, however, is simply a trait our research landscape has inherited from the era of paper-based scientific communication. It is by no means a technical necessity today, when basically any kind of information can be shared instantly
(with few exceptions, e.g. patient data) within and beyond the
scientific community. For our purposes, we will thus reframe the
concept of putting ideas or results on paper as putting them on a wiki,
a blog, a dedicated online repository or successors of these (e.g. as
blips or wavelets within the proposed Wave protocol). In any case, they end up in a shared
research environment, from where they can be syndicated and aggregated
in various forms and embedded in other digital environments.
Hello to public research environments online
In this kind of framework (best known as Open Science, henceforth called a public research environment
to emphasize that the concept is applicable across disciplines and that
communication in and with the public sets it apart from science as we
know it), individual contributions (or comments on them) can be automatically assigned a unique identifier (henceforth contribution ID; this may be a revision number with time stamp in wikis or databases, a DOI for journal articles or an ISBN
for books), linked to their originator (henceforth contributor ID;
usually the user name) as well as to other relevant information (e.g. funding sources),
and aggregated in various forms. In a paper-based system, the contributor
ID is mainly composed of an author's surname plus some representation
(variable across journals) of the given names, such that a single
contributor ID may be shared by different individuals whose names are
identical or similar, while some individuals (especially those with
multiple initials, with non-English characters, or who changed their
name after marriage) may have more than one contributor ID. On online
platforms, the contributor ID is generally unique within but not across
individual platforms, although a number of solutions for the unique identification of contributors have been implemented (e.g. OpenID), including some specifically targeted at scientists (e.g. Researcher ID).
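To make the identification problem concrete, here is a minimal Python sketch, assuming a paper-style contributor ID built from the surname plus the initials of the given names (real journals vary in how they do this). The function names, the `person-1`-style identifiers standing in for a platform-independent ID, and the author records are all hypothetical.

```python
from collections import defaultdict

def paper_style_id(surname: str, given_names: str) -> str:
    """Approximate a paper-based contributor ID: surname plus initials (an assumption)."""
    initials = "".join(part[0] for part in given_names.replace("-", " ").split())
    return f"{surname} {initials}"

def people_per_paper_id(records):
    """Map each paper-style ID to the set of distinct people hiding behind it."""
    groups = defaultdict(set)
    for person, surname, given_names in records:
        groups[paper_style_id(surname, given_names)].add(person)
    return dict(groups)

def paper_ids_per_person(records):
    """Map each person to the set of paper-style IDs they appear under."""
    groups = defaultdict(set)
    for person, surname, given_names in records:
        groups[person].add(paper_style_id(surname, given_names))
    return dict(groups)

# Hypothetical author records: (unique person ID, surname as printed, given names as printed).
records = [
    ("person-1", "Smith", "John"),
    ("person-2", "Smith", "Jane"),      # a different person who also becomes "Smith J"
    ("person-3", "Meier", "Anna"),
    ("person-3", "Schulz", "Anna"),     # the same person after a name change on marriage
    ("person-4", "Lee", "Mary Ann"),
    ("person-4", "Lee", "Mary"),        # the same person; this journal dropped an initial
]

if __name__ == "__main__":
    print(sorted(people_per_paper_id(records)["Smith J"]))    # ['person-1', 'person-2']
    print(sorted(paper_ids_per_person(records)["person-3"]))  # ['Meier A', 'Schulz A']
    print(sorted(paper_ids_per_person(records)["person-4"]))  # ['Lee M', 'Lee MA']
```

Grouping in both directions exposes the two failure modes described above: one paper-style ID hiding several people, and one person scattered across several IDs. A contributor ID that is unique across platforms would remove both.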
Each contribution ID can not only be linked to its contributor but
also tagged (similar to the keywords currently accompanying manuscripts
or grant proposals) and have its quality assessed (or rated, for short) by individual contributors (perhaps as a function of the overlap between the tags describing their personal expertise and those of the contribution under consideration) according to a pre-defined set of evaluation criteria
(e.g. appropriateness to the current stage of a given project,
reliability of the information supplied, or presentation with enough
context to be understood by specialists and/or the public). Some
journals already allow such ratings and further comments. However, none of them currently aggregates ratings or comments by contributor, although technical standards for such purposes are operational (e.g. hReview). Despite possible herding effects and other sources of error, the feasibility in principle (though not the effectiveness)
of generating and aggregating such user-defined metrics has been
demonstrated on multiple online platforms, especially in non-scholarly
environments (tagging: Flickr; rating: eBay) but also in some scholarly ones (tagging: CiteULike).
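As a rough illustration of rating as a function of expertise overlap, the Python sketch below weights each rater's scores by the Jaccard similarity between the rater's expertise tags and the contribution's tags. Jaccard similarity is only one possible overlap measure and is an assumption here, as are the 1 to 5 score scale, the criterion names (paraphrased from the examples above) and all class and function names. Ratings are also grouped by the contributor who gave them, which is the kind of per-contributor aggregation the text notes is still missing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical criterion names, paraphrasing the example criteria in the text.
CRITERIA = ("stage_appropriateness", "reliability", "context")

@dataclass
class Contribution:
    contribution_id: str   # e.g. a wiki revision number or a DOI
    contributor_id: str    # who made the contribution
    tags: set[str]

@dataclass
class Rating:
    rater_id: str
    contribution_id: str
    scores: dict[str, int]  # one score per criterion, assumed to be on a 1-5 scale

def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap in [0, 1]; defined as 0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def weighted_scores(contributions, ratings, expertise):
    """Expertise-weighted mean score per contribution and criterion."""
    by_id = {c.contribution_id: c for c in contributions}
    totals = defaultdict(lambda: defaultdict(float))
    weight_sums = defaultdict(float)
    for r in ratings:
        w = jaccard(expertise.get(r.rater_id, set()), by_id[r.contribution_id].tags)
        if w == 0:
            continue  # in this sketch, raters without overlapping expertise carry no weight
        weight_sums[r.contribution_id] += w
        for criterion in CRITERIA:
            totals[r.contribution_id][criterion] += w * r.scores[criterion]
    return {
        cid: {criterion: total / weight_sums[cid] for criterion, total in per_criterion.items()}
        for cid, per_criterion in totals.items()
    }

def ratings_by_rater(ratings):
    """Aggregate ratings by the contributor who gave them."""
    grouped = defaultdict(list)
    for r in ratings:
        grouped[r.rater_id].append(r.contribution_id)
    return dict(grouped)

# Hypothetical example data.
contributions = [Contribution("wiki-rev-1001", "alice", {"neuroscience", "mri"})]
expertise = {"bob": {"neuroscience", "mri"}, "carol": {"mri", "physics"}, "dave": {"ecology"}}
ratings = [
    Rating("bob", "wiki-rev-1001", {"stage_appropriateness": 4, "reliability": 5, "context": 3}),
    Rating("carol", "wiki-rev-1001", {"stage_appropriateness": 2, "reliability": 3, "context": 5}),
    Rating("dave", "wiki-rev-1001", {"stage_appropriateness": 1, "reliability": 1, "context": 1}),
]

if __name__ == "__main__":
    scores = weighted_scores(contributions, ratings, expertise)
    print(round(scores["wiki-rev-1001"]["reliability"], 2))  # 4.5
    print(ratings_by_rater(ratings)["carol"])                 # ['wiki-rev-1001']
```

Dave's rating is still listed under his name, but it does not affect the score because his expertise tags do not overlap with the contribution's; whether such ratings should count for nothing or for some small floor weight is a design choice left open here.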
No working implementation currently exists that would address the lack of incentives
for scientists to engage in collaborative research assessment of this
sort. However, since both publishers and funding agencies have managed to
coerce scientists and their institutions into all sorts of behaviour
during research assessment exercises in the past and present, they
should have no problem providing incentives to participate in this one,
which has the added benefit of being both transparent and beneficial
to the scientific community as a whole (it is worth noting in this respect
that the current system offers very few incentives
to deliver timely, fair and detailed peer reviews of grant proposals
or manuscripts). One way to do this would be to require that every
reference cited be rated
by the citing researchers (some journals already single out a few
references in this manner as being "of outstanding interest" or
similar, but aggregating such ratings of individual references in a global database like Open Library
would be more helpful); another would be to factor both the quality
and the quantity of a specific researcher's ratings (both active, i.e. given, and
passive, i.e. received) into the determination of the variable portion of her research funding,
perhaps with some sort of normalization by the usage frequency of the
tags involved (to balance between large and small fields of inquiry,
and to avoid exaggerated claims). The remaining obstacles to a wider
adoption of such transparent reputation schemes based on a public
research environment with unique contribution and contributor ID
schemes are thus not of a technical nature, and we shall assume these
features to be available for the system we are about to design.
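One way to read "normalization by the usage frequency of the tags" is inverse-frequency weighting: a rating on a contribution carrying rarely used tags counts for more than one carrying tags that appear everywhere. The Python sketch below assumes exactly that, averaging (rather than summing) the per-tag weights so that piling extra tags onto a contribution cannot inflate a rating's weight. The scoring formula, the 1 to 5 scale and all names are hypothetical, and measuring the quality of the ratings a researcher gives is left out for simplicity.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class RatingEvent:
    rater_id: str            # who gave the rating (active side)
    ratee_id: str            # whose contribution was rated (passive side)
    score: float             # assumed to be on a 1-5 scale
    tags: frozenset[str]     # tags of the rated contribution

def make_weight(events):
    """Return a function weighting each event by the mean inverse frequency of its tags."""
    freq = Counter(tag for e in events for tag in e.tags)

    def weight(e: RatingEvent) -> float:
        if not e.tags:
            return 0.0
        return sum(1 / freq[t] for t in e.tags) / len(e.tags)

    return weight

def researcher_summary(researcher_id, events):
    """Tag-normalized quantity and quality of received ratings, and quantity of given ones."""
    weight = make_weight(events)
    received = [(weight(e), e.score) for e in events if e.ratee_id == researcher_id]
    passive_quantity = sum(w for w, _ in received)
    passive_quality = (
        sum(w * s for w, s in received) / passive_quantity if passive_quantity else None
    )
    active_quantity = sum(weight(e) for e in events if e.rater_id == researcher_id)
    return {
        "passive_quality": passive_quality,
        "passive_quantity": passive_quantity,
        "active_quantity": active_quantity,
    }

# Hypothetical rating events: a busy field ("genomics") and a small one ("paleobotany").
events = [
    RatingEvent("bob", "alice", 4, frozenset({"genomics"})),
    RatingEvent("carol", "alice", 5, frozenset({"genomics"})),
    RatingEvent("dave", "alice", 3, frozenset({"genomics"})),
    RatingEvent("alice", "erin", 4, frozenset({"paleobotany"})),
    RatingEvent("bob", "erin", 5, frozenset({"paleobotany", "genomics"})),
]

if __name__ == "__main__":
    print(researcher_summary("alice", events))
    # {'passive_quality': 4.0, 'passive_quantity': 0.75, 'active_quantity': 0.5}
    print(researcher_summary("erin", events)["passive_quantity"])  # 0.875
```

In this toy data, Erin receives fewer ratings than Alice, but in a rarely used tag, so her tag-normalized passive quantity (0.875) exceeds Alice's (0.75). This is the kind of balancing between large and small fields the text has in mind; how such numbers would actually feed into funding decisions is left open.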
So far, we have only covered technical aspects of redesigning a research system emancipated from the paper medium but, as Michael Nielsen put it,
"[T]here is a second and more radical way of thinking about how the
Internet can change science, and that is through a change to the
process and scale of creative collaboration itself, enabled by social
software such as wikis, online forums and their descendants." In a
similar vein, Timothy Gowers started the Polymath project with a blog post discussing the following idea:
It seems to me that, at least in theory, a different
model could work: different, that is, from the usual model of people
working in isolation or collaborating with one or two others. Suppose
one had a forum (in the non-technical sense, but quite possibly in the
technical sense as well) for the online discussion of a particular
problem. The idea would be that anybody who had anything whatsoever to
say about the problem could chip in. And the ethos of the forum — in
whatever form it took — would be that comments would mostly be kept
short. In other words, what you would not tend to do, at least if you
wanted to keep within the spirit of things, is spend a month thinking
hard about the problem and then come back and write ten pages about it.
Rather, you would contribute ideas even if they were undeveloped and/or
likely to be wrong.
This short form of communication is taken to an extreme in the
exchange of text messages over mobile phones and web platforms,
particularly Twitter or the social aggregator FriendFeed, and even though scientists clearly form a minority on such platforms, they have begun to incorporate them into their research.
Quick poll: did you check any references in this post so far? How
did you do that? And how do you usually do it when you read a paper?
Sadly, even though most scientific journals now publish their content
on the Internet, most of the formatting is still performed with
paper as the target: only rarely are hyperlinks incorporated, even in the
online versions. Online environments, on the other hand, are built
around hyperlinks and allow the embedding of basically any kind of media, for example the Science Commons video below that highlights the value of sharing scientific information.