Measuring recall to within +/- 5% to demonstrate that a production is defensible can require reviewing a substantial number of randomly selected documents. For a case of modest size, the review needed to measure recall can exceed the review needed to actually find the responsive documents with predictive coding. This article describes a new method that requires much less document review to demonstrate that adequate recall has been achieved. It is a brief overview of a more detailed paper I’ll be presenting at the DESI VII Workshop on June 12th.

The proportion of a population having some property can be estimated to within +/- 5% by measuring the proportion in a random sample of 400 documents (you’ll also see the number 385 used, but 400 makes the examples easier to follow). To measure recall we need to know what proportion…
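As a rough sketch of where the 385 and 400 figures come from, the snippet below applies the standard normal-approximation formula for estimating a proportion. It assumes a 95% confidence level (z = 1.96) and the worst-case proportion of 0.5; neither assumption is stated in the text above, and the function names are purely illustrative.

```python
import math

Z_95 = 1.96  # assumption: two-sided 95% confidence level


def sample_size(margin, p=0.5, z=Z_95):
    """Smallest sample size giving the requested margin of error.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2,
    with p = 0.5 as the worst case (largest variance).
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion estimated from n random documents."""
    return z * math.sqrt(p * (1 - p) / n)


print(sample_size(0.05))               # 385
print(round(margin_of_error(400), 4))  # 0.049
```

Under these assumptions, 385 is the minimum sample size for +/- 5%, and rounding up to 400 gives a margin of error slightly under 5%.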
