Understandably, vendors of predictive coding software want to show off numbers indicating that their software works well. It is important for users of such software to avoid drawing wrong conclusions from performance numbers.
Consider the two precision-recall curves below (if you need to brush up on the meaning of precision and recall, see my earlier article):
The one on the left is incredibly good, with 97% precision at 90% recall. The one on the right is not nearly as impressive, with 17% precision at 70% recall, though you could still find 70% of the relevant documents with no additional training by reviewing only the highest-rated 4.7% of the document population (excluding the documents reviewed for training and testing).
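To make the relationship between these numbers concrete, here is a minimal sketch of how precision at a target recall, and the fraction of the population you'd have to review to reach it, can be computed from predicted relevance scores. The function name and the toy data are illustrative assumptions, not code from the original post:

```python
def precision_at_recall(scores, labels, target_recall):
    """Hypothetical helper: rank documents from highest to lowest
    predicted score, walk down the ranking, and report (precision,
    fraction of population reviewed) at the first point where the
    target recall is reached."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_relevant = sum(labels)
    found = 0
    for reviewed, (_, is_relevant) in enumerate(ranked, start=1):
        found += is_relevant
        if found / total_relevant >= target_recall:
            return found / reviewed, reviewed / len(ranked)
    # Target recall never reached; report stats for the full population.
    return found / len(ranked), 1.0

# Toy example (made-up scores and relevance labels):
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
precision, fraction_reviewed = precision_at_recall(scores, labels, 2 / 3)
```

In the right-hand curve's terms, a result like 17% precision at 70% recall while reviewing only 4.7% of the population corresponds to a ranking in which relevant documents, though diluted, are still concentrated near the top.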
Why are the two curves so different? They come from the same algorithm applied to the same document population with the same features (words) analyzed and the exact same random sample…