I thought readers would appreciate this sobering evaluation of the peer review process, which (like so many other things) isn’t quite the model process some think. This interesting article is from the Financial Post, FP Comment section, Friday, June 22, 2007.
Lessons of figure skating
Peer review is a crucial part of science funding, but scientists could learn from the skating world that more than two opinions are needed for a good judgment
Scientific peer reviewers are the best specialists that editors can find to read the manuscripts they receive. Peer reviewers usually serve as unpaid, hardworking experts. In essence, journal peer reviewers stand on guard for society as a whole, to ensure that only scientifically credible articles get published.
But long before any journal peer review, the research needs to be financed, so a different kind of peer review takes place. To apply to publicly supported granting agencies, researchers need to describe why an idea needs investigation and how they would conduct their experiments. Instead of a worldwide pool of experts, funding agencies usually must rely on committees, or groups of scientists from various fields. Those who serve as peer reviewers for funding agencies are also volunteers, giving their time to the often thankless task of reading many applications for funding. The goal of their peer review is to provide a score that agencies will use to rank who gets funding. In theory, peer review of grant applications ensures that limited research dollars support the best science.
The peer review of research-grant applications is a huge problem for all concerned. Not counting the thinking and the groundwork, a typical medical researcher spends an absolute minimum of a month of full-time work writing a grant application. After that, according to the Canadian Institutes of Health Research (CIHR), our federal government’s largest medical and health funding agency, the odds of funding success are about one in six. If the grant application aims to fund brand-new research, the odds of success are even worse. Peer review should ensure that good research stands a better chance of success than a roll of the dice.
Almost five years ago, Warren Thorngate, a statistician from Carleton University, examined the statistics about the peer-review process at CIHR. His “Thorngate report” paints a very sad picture. He shows that the scoring of scientific grant applications is no different from other situations in which humans need to “score” something, whether it is judging figure skaters, or choosing a paint colour. Any two people can agree or disagree, just by the luck of the draw of which two people are selected to judge. I have long wondered how many people who apply for research funding have read the Thorngate report with care.
Basically, every grant application is given to two members of a committee to read, each assigning a score out of five. That score is a judgment based on quality guidelines about the possible importance of the research to the health of Canadians, the quality of the experiment proposed, etc. The result, according to Thorngate, is that “perceived differences in the quality of the applications accounted for less than 25% of the variance of internal reviewers’ ratings. Individual differences among the internal reviewers seemed to account for the rest.”
This means that 75% of the difference between your score and everybody else’s is just plain randomness. Everyone who has applied repeatedly for research grant money knows this, and it applies to just about any peer-review system, not just CIHR. For example, I sent exactly the same grant proposal to two funding agencies at the same time. Agency A scored the proposal so badly it was not even worth discussing. Agency B scored the proposal as the best of the 20 it considered. When I complained to Agency A about this discrepancy, it replied that its low score simply reflected a difference of opinion.
For researchers who need support, the random gamble of the way applications are scored and ranked is a huge problem. Researchers now accept that peer review is junk science, because it is not science at all. Applicants for grants know that, despite the sincere efforts of peer reviewers (all of whom have also been applicants), the opinion-based judgments of peer review end up functioning like a lottery. And just like any lottery, the only way to be sure of winning is to keep on buying a ticket.
For funding agencies, the randomness of peer review has created an ever-growing problem. As applicants keep recycling grant applications into the lottery, the number of applications climbs and the success rate drops. New research ideas entering the pool are quickly watered down into a sea of applications. The burden of dealing with applications that need reviewing increases. With that, the mental capacity of peer reviewers becomes ever more strained. It becomes difficult for them to do justice to every application, they are overworked, and they become quick to toss proposals out of competition.
To outsiders, peer review is a mysterious scientific system that serves as our ultimate way to determine research quality. Warren Thorngate tells us with evidence that the quality of judgment in peer review is no more reliable than for any other kind of judgment call.
In the field of figure skating, performances are scored and averaged from several judges, with the highest and lowest scores tossed out. The scoring system for figure skating is more scientific, because ranking for a given performance is designed to be reproducible. Figure skating has minimized the lottery effect. The problem for those of us who apply for medical research funding is that, with only two reviewers to score applications, the scoring system that compares each applicant with the competition is just too noisy. I am by no means criticizing peer review of research grants because there really is no better alternative. But we need to make the system less of a crapshoot for applicants.
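The figure-skating procedure described above — collect scores from several judges, discard the highest and lowest, and average the rest — can be sketched in a few lines. This is a minimal illustration, not the official skating formula; the judge scores below are invented:

```python
def trimmed_mean(scores):
    """Average a list of judges' scores after discarding
    the single highest and single lowest score."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both ends")
    trimmed = sorted(scores)[1:-1]  # drop lowest and highest
    return sum(trimmed) / len(trimmed)

# Five hypothetical judges; one is a harsh outlier.
judge_scores = [4.2, 4.5, 3.9, 4.4, 2.1]
print(trimmed_mean(judge_scores))  # outlier of 2.1 is discarded
```

With the trim, the outlier judge cannot single-handedly sink the performance: the trimmed average is about 4.17, where a plain average of all five scores would be dragged down to 3.82. That robustness to any one judge is exactly what a two-reviewer grant panel lacks.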
In science, the usual way to make things less random is to average more inputs. This means averaging scores from more than the usual two peer reviewers who sit on committees. However, according to Thorngate, even though our CIHR sends proposals to outside experts for peer review, their opinions “matter little in the adjudication process” and “the usefulness of external reviews remains a mystery.” In other words, to a statistician, it looks like the extra peer reviews available are wasted, because there is no evidence that they count toward the ranking for funding. This is not good science.
Counting the input from a greater number of judges in the average score works for figure skating. Those responsible for designing the way research grant applications are ranked need to borrow a page from the world of sports and make the system as reliable as it would be if an audience were watching.
Reinhold Vieth is Professor, Department of Nutritional Sciences and Department of Laboratory Medicine and Pathobiology at the University of Toronto, and Director, Bone and Mineral Laboratory Pathology and Laboratory Medicine at Mount Sinai Hospital in Toronto.