Ensuring Enough Reviews
To keep balancing reviewer effort against the quality of the results, Innovation Central now has a second way to show review progress. Sooner or later, everyone using H2H asks how many reviews they really need: the fewest reviews that still give an accurate, reliable ranking. It's an excellent question. No theory offers direct guidance, perhaps because perfectly fair ranking is known to be impossible (Arrow's impossibility theorem, the subject of the 1972 Nobel Prize in Economics), so we've done a great deal of simulation to understand the issues and give the best advice we can.
There are two sources of wobble in the ranking of any given idea. The first is statistical wobble from having too few reviews, which is easily remedied by collecting more. The second is logical inconsistency, a fact of life with human voting: a reviewer may say that idea A is better than B, B is better than C, and yet C is better than A. That's inconsistent, but it happens all the time, especially between reviewers. Simulation shows that these two effects aren't readily distinguished, so we can't prescribe a single magic number of reviews. Instead, we use sampling statistics to estimate the standard deviation of rank for every idea and translate it into plain-English advice.
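To make the resampling idea concrete, here is a minimal Python sketch. The scoring model is a stand-in assumption (an idea's score is its win rate scaled to 0-100) and the helper names are hypothetical; H2H's actual scoring and bootstrap details aren't public.

```python
import random
from collections import defaultdict

def scores(reviews):
    """Hypothetical scoring: each idea's share of wins, scaled to 0-100.

    `reviews` is a list of (winner, loser) idea-ID pairs, one per
    head-to-head review. H2H's real scoring model may differ.
    """
    wins, seen = defaultdict(int), defaultdict(int)
    for winner, loser in reviews:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {idea: 100.0 * wins[idea] / seen[idea] for idea in seen}

def rank_score_sd(reviews, n_boot=1000, seed=0):
    """Bootstrap the reviews and report each idea's score standard deviation."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(n_boot):
        # Resample the reviews with replacement and rescore.
        resample = [rng.choice(reviews) for _ in reviews]
        for idea, s in scores(resample).items():
            samples[idea].append(s)
    def sd(xs):
        mean = sum(xs) / len(xs)
        return (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return {idea: sd(xs) for idea, xs in samples.items()}
```

A wide standard deviation for an idea signals that its score is still dominated by statistical wobble and more reviews would help; a standard deviation that stops shrinking as reviews accumulate points to logical inconsistency instead.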
In plain English, two ideas whose ranking scores are about 11 units or less apart really aren't significantly better or worse than one another, while two ideas 25 units apart (i.e. upper quartile vs. second quartile) are definitely different.
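If those thresholds are derived from the estimated standard deviations, a simple two-sigma rule would behave this way; the z = 2.0 cutoff and the helper below are assumptions for illustration, not H2H's published rule.

```python
def clearly_different(score_a, sd_a, score_b, sd_b, z=2.0):
    """True when two scores differ by more than z combined standard deviations.

    Assumes independent score estimates, so their uncertainties add in
    quadrature. The z = 2.0 cutoff is an illustrative choice.
    """
    return abs(score_a - score_b) > z * (sd_a ** 2 + sd_b ** 2) ** 0.5
```

For instance, if each idea's score standard deviation were about 4 units, the combined two-sigma cutoff would be 2 × √(4² + 4²) ≈ 11.3, which lines up with the 11-unit guidance above.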
The “wobble” in idea ranks is shown graphically with a display of ranks computed from random subsets of the total reviews. If ideas' ranks stay about the same even when 20% of the reviews are discarded, the overall scoring is quite stable and unlikely to change much with more reviews. That's the situation shown below: the color bands are generally thin and horizontal.
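Here is a sketch of the computation such a display implies, reusing the imports and the hypothetical scores() helper from the first sketch: re-rank the ideas many times on random 80% subsets of the reviews and record how far each idea's rank wanders. A narrow (best, worst) band is the textual analogue of a thin, horizontal color band.

```python
def rank_bands(reviews, keep=0.8, n_trials=200, seed=0):
    """Re-rank ideas on random review subsets; return each idea's
    (best, worst) rank across trials. Narrow bands mean a stable ranking."""
    rng = random.Random(seed)
    bands = defaultdict(list)
    k = int(len(reviews) * keep)  # keep 80% of reviews, discard the rest
    for _ in range(n_trials):
        subset = rng.sample(reviews, k)
        ordered = sorted(scores(subset).items(), key=lambda kv: -kv[1])
        for rank, (idea, _) in enumerate(ordered, start=1):
            bands[idea].append(rank)
    return {idea: (min(ranks), max(ranks)) for idea, ranks in bands.items()}
```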