Tuesday, March 28, 2017

Novel processes and metrics for a scientific evaluation: preliminary reflections

Reflections on Michaël Bon, Michael Taylor, Gary S. McDowell. “Novel processes and metrics for a scientific evaluation rooted in the principles of science - Version 1”. SJS (26 Jan. 2017)

Following are my initial reflections on what I would describe as a ground-breaking effort toward articulating a radical transformation of scholarly communication. I regard this transformation as much needed and highly timely: the current system is optimized for the technology of the 17th century (printing press and postal system) and is far from taking full advantage of the potential of the internet.

The basic idea described by the authors is to replace the existing practices of evaluation of scholarly work with a more collaborative and open system they call the Self-Journals of Science.


The title Self-journals of science: I recommend coming up with a new name. The name is likely to give the impression of vanity publishing, even though this is not what the authors are suggesting, which appears to be more along the lines of a new form of collaborative organization of peer review.  

Section 1, "Introduction: the inherent shortcomings of an asymmetric evaluation system", appears to attempt to describe how scientific communication works, its purpose, and a critique of it, with citations, in just a few pages. This is sufficient to tell the reader where the authors are coming from, but too broad in scope to have much depth or accuracy. I am not sure that it makes sense to spend a lot of time further developing this section. For example, the second paragraph refers to scientific recognition as artificially turned into a resource of predetermined scarcity. I am pretty sure that further research could easily yield evidence to back up this statement - e.g. Garfield's idea of the "core journals" to eliminate the journals one needn't bother buying or reading, and the apparently de facto assumption that a good journal is one with a high rejection rate.

On page 3, first paragraph, 4 citations are given for one statement. A quick glance at the reference list suggests that this may be stretching what the authors of the cited works have said. For example, at face value it seems unlikely that reference 4, with a title of "Double-blind review favours increased representation of female authors", actually supports the authors' assertion that "Since peer-trials necessarily involve a degree of confidentiality and secrecy..many errors, biases and conflicts of interest may arise without the possibility of correction". It seems that the authors of the cited article are making exactly the opposite argument, arguing that semi-open review results in bias. If I were doing a thorough review, I would look up at least a few of the cited works, and if the arguments cited were not justified in those works I would hand the work of reading the works cited and citing appropriately back to the authors.

The arguments presented are provocative and appropriate for initiating an important scholarly discussion. Like any provocative work, the arguments may be relatively strong for the task of initiating needed discussion but somewhat weak due to lack of counter-argument. For example, the point of Section 1.4 is that "scientific conservatism is placing a brake on the pace of change". Whether anything is placing a brake on the pace of change in 2017 is, I believe, arguable. The authors also do not address the potential benefits of scientific conservatism here, although arguments made elsewhere, e.g. "The validity of an article is established by a process of open and objective debate by the whole community", are arguments for scientific conservatism (or so I argue). For example, one needs to understand this tendency of science to fully appreciate the current consensus on climate change.

Section 2 defines scientific value as validity and importance

There are some interesting ideas here; however, the authors conflate methodological soundness with validity. A research study can reflect the very best practices in today's methodology and present logical conclusions based on existing knowledge while still being incorrect or invalid (lacking external validity) for such reasons as limitations on our collective knowledge. A logical argument based on a premise incorrectly perceived to be true can lead to logical but incorrect conclusions.

The authors state that "the validity of an article is established by a process of open and objective debate by the whole community". This is one instance of what I see as overstatement of both current and potential future practice. Only in a very small scholarly community would it be possible for every member of the community to read every article, never mind have an open and objective debate about each article. I think the authors have a valid point here, but direct this at the wrong level. This kind of debate occurs with the big picture paradigmatic issues such as climate change, not at the level of the individual article.

Perceived importance of an article is given along with validity as the other measure for evaluation of an article. This argument needs work and critique. I agree with the authors (and Kuhn) about the tendency towards scientific conservatism, and I think we should be aware of bias in any new system, especially one based on open review. People are likely to perceive articles as more important if they advance work that falls within an existing paradigm, or a new one that is gaining traction, than if they present truly pioneering work. With open review, I expect that authors with existing high status are more likely to be perceived to be writing important work while new, unknown, female authors or those from minority groups are more likely to have their work perceived as unimportant.

I do not wish to dismiss the idea of importance; rather, I would like to suggest that it needs quite a bit of work to define appropriately. For example, if I understand correctly, replication of previous experiments is perceived as a lesser contribution than original work. This is a disincentive to replication that seems likely to increase the likelihood of perpetuating error. Assuming this is correct, and we wish to change the situation, what is needed is something like a consensus that replication should be more highly valued. Otherwise, if we rely on perceived importance, this work is likely to continue to be undervalued.

Section 2.2 Assessing validity by open peer review

This section presents some very specific suggestions for a review system. One comment that I have is that this approach reflects a particular style. The idea of embedded reviews likely appeals more to some people than to others. Journals often provide reviewers with a variety of options depending on their preferred style: writing a review like this one, or going through the article and tracking changes. The + / - vote system for reviews strikes me as a tool very likely to reflect the personal popularity of reviewers and/or particular viewpoints rather than adding to the quality of reviews. There are advantages and disadvantages to authors being able to select the reviews that they feel are of most value to them. The disadvantage is that authors with a blind spot or conscious bias are free to ignore reviews that a really good editor would force them to address before a work could be published.

Section 3 Benefits of this evaluation system

Here the authors argue that this evaluation system can be transformed into metrics for the purpose of evaluation: for validity, the number of scholars engaged in peer review and the fraction who consider the article up to scientific standards; for importance, the number of authors who have curated the article in their self-journal. Like the authors, I think we need to move away from publishing in high impact factor journals as a surrogate for quality. However, I argue against metrics-based evaluation, period. This is a topic that I will be writing about more over the coming months. For now, suffice it to say that quickly moving to new metrics-based evaluation systems appears to me likely to create worse problems than such a move is meant to solve. For example, if we assume that scientific conservatism is a thing and is a problem, isn't a system where people are evaluated based on the number of people who review one's work and find it up to standards likely to increase conservatism?
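To make concrete what these proposed metrics would count, here is a minimal sketch in Python. It is my own illustration of the three numbers described above, not the authors' implementation: the names `Article`, `validity_metrics` and `importance_metric` are hypothetical, and I assume each reviewer casts a single yes/no judgement on whether the article is up to scientific standards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Article:
    # Hypothetical record: reviewer name -> True if that reviewer judges
    # the article to be up to scientific standards (assumed yes/no vote).
    votes: Dict[str, bool] = field(default_factory=dict)
    # Hypothetical record: names of scholars who curated the article
    # in their self-journal.
    curators: Set[str] = field(default_factory=set)


def validity_metrics(article: Article) -> Tuple[int, Optional[float]]:
    """Return (number of reviewers, fraction judging the article valid).

    The fraction is None when nobody has reviewed the article yet.
    """
    n_reviewers = len(article.votes)
    if n_reviewers == 0:
        return 0, None
    n_positive = sum(1 for ok in article.votes.values() if ok)
    return n_reviewers, n_positive / n_reviewers


def importance_metric(article: Article) -> int:
    """Return the number of distinct scholars who curated the article."""
    return len(article.curators)


if __name__ == "__main__":
    example = Article(
        votes={"r1": True, "r2": True, "r3": False, "r4": True},
        curators={"c1", "c2"},
    )
    print(validity_metrics(example))
    print(importance_metric(example))
```

In this sketch every number is a simple head count: how many people engaged with the article and how many of them agreed.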

Some strengths of the article:
  • recognizing the need for change and hopefully kick-starting an important discussion
  • starting with the idea that we scholars can lead the transformation ourselves
  • focus on collaboration rather than competition
To think about from my perspective:
  • researcher time: realism is needed. An article that is reviewed by two or three people who are qualified to judge soundness of method, logic of arguments and clarity of writing should be enough. It isn't a good use of the time of researchers to have a whole lot of people looking at whether a particular statistical test was appropriate or not.
  • this is work for scholarly communities, not individuals. The conclusion speaks to the experience of arXiv. arXiv is a service shared by a large community and supported by a partnership of libraries that has staff and hosting support.
  • the Self-Journals of Science uses the CC-BY license as a default. Many OA advocates regard this license as the best option for OA, however I argue that this is a major strategic error for the OA movement. My arguments on the overlap between open licensing and open access are complex and can be found in my series Creative Commons and Open Access Critique. To me this is a central issue that the OA movement must deal with, and so I raise it here and continue to avoid participating in services that require me to use this license for my work.
Key take-aways I hope people will get out of this review:
  • forget metrics - don't come up with a replacement for impact factor, let's get out of metrics-based evaluation altogether
  • look for good models, like arXiv, because communities are complicated. What works?
  • let's talk - some of us may want immediate radical transformation of scholarly communication, but doing this well is going to take some time, to figure out the issues, come up with potential solutions, let people try stuff and see what works and what doesn't, and research too
  • be realistic about time and style - researchers have limited time, and people have different preferred styles. New approaches need to take this into account.
For more on this topic, watch for my keynote at the What do rankings tell us about higher education? roundtable at UBC this May.