Fair Comparison of Predictive Coding Performance

Reblogged from the Clustify Blog (eDiscovery, document clustering, predictive coding, information retrieval, and software development)

Understandably, vendors of predictive coding software want to publish numbers showing that their software works well. It is important for users of such software to avoid drawing the wrong conclusions from those performance numbers.

Consider the two precision-recall curves below (if you need to brush up on the meaning of precision and recall, see my earlier article):

[Figure: precision-recall curves for two different tasks]
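As a quick refresher, here is a minimal sketch (mine, not from the original post) of how precision and recall are computed, using hypothetical review counts:

```python
# Hypothetical counts, for illustration only.
true_positives = 70   # relevant documents retrieved
false_positives = 30  # non-relevant documents retrieved
false_negatives = 30  # relevant documents missed

# Precision: what fraction of the documents you retrieved are relevant?
precision = true_positives / (true_positives + false_positives)  # 0.70

# Recall: what fraction of all relevant documents did you retrieve?
recall = true_positives / (true_positives + false_negatives)     # 0.70

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```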

The one on the left is incredibly good, with 97% precision at 90% recall. The one on the right is not nearly as impressive, with 17% precision at 70% recall, though you could still find 70% of the relevant documents, with no additional training, by reviewing only the highest-rated 4.7% of the document population (excluding the documents reviewed for training and testing).
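To make the "review only the highest-rated 4.7%" arithmetic concrete, here is a hedged sketch (the function name and synthetic data are my own, not the post's) that walks a ranked list of documents and reports the smallest share of the population you would need to review to reach a target recall, plus the precision at that cutoff:

```python
import random

def review_depth_for_recall(labels_by_rank, target_recall):
    """labels_by_rank: 1/0 relevance labels sorted by descending model score."""
    total_relevant = sum(labels_by_rank)
    if total_relevant == 0:
        return 0.0, 0.0
    found = 0
    for reviewed, label in enumerate(labels_by_rank, start=1):
        found += label
        # Stop at the shallowest cutoff that achieves the target recall.
        if found / total_relevant >= target_recall:
            return reviewed / len(labels_by_rank), found / reviewed
    return 1.0, total_relevant / len(labels_by_rank)

# Synthetic ranking: 10,000 docs, relevant ones concentrated near the top.
random.seed(0)
labels = [1 if random.random() < max(0.0, 0.8 - rank / 2_000) else 0
          for rank in range(10_000)]

depth, precision = review_depth_for_recall(labels, target_recall=0.70)
print(f"review top {depth:.1%} of docs for 70% recall ({precision:.0%} precision)")
```

Applied to the two rankings behind the curves above, the same target recall would yield very different review depths and precisions, which is exactly the contrast the post is drawing.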

Why are the two curves so different?  They come from the same algorithm applied to the same document population with the same features (words) analyzed and the exact same random sample…

View original post (636 more words)

