The most critical component of the predictive coding exercise is training the system. The whole point of this component is to separate relevant content from non-relevant content. The point is most definitely not to separate the responsive documents from the non-responsive documents. These are two very different standards. Separating responsive documents from non-responsive documents usually requires not only identifying non-relevant content, but also dissecting relevant content to determine whether it meets responsiveness requirements. That second step, applied during training, is all too often where the process goes wrong.
One of the main benefits of predictive coding software is its ability to accurately identify and eliminate non-relevant documents from the review universe. With that in mind, system trainers need to remember not to dismiss relevant content simply because it does not meet responsiveness requirements. I know that has been said thousands of times, but it needs to be said again and again.
Responsiveness in your case may hinge upon whether a particular widget that was manufactured in Seattle was red. If a system trainer then dismisses an e-mail as non-relevant because it discusses blue widgets made in Seattle, that trainer is confusing the system and hindering the process. To truly get the most out of the process, you must include the blue Seattle widgets discussion as relevant, and likely also include discussion of widgets of other colors manufactured in other cities. Discussion of the manufacture of widgets is relevant content. Whether the widgets were made in Seattle and whether they were red will determine whether the document is responsive.
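For readers who think in code, the widget example can be sketched as two separate decisions. This is only an illustration of the labeling distinction, not how any particular predictive coding product works: real systems learn from document text, and the `Document` fields and the `is_relevant` and `is_responsive` helpers below are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical, hand-filled fields; a real tool works from the text itself.
    text: str
    product: str
    color: str
    city: str


def is_relevant(doc: Document) -> bool:
    """Training label: does the document discuss widget manufacturing at all?"""
    return doc.product == "widget"


def is_responsive(doc: Document) -> bool:
    """Review-stage decision: relevant AND meets the case-specific criteria."""
    return is_relevant(doc) and doc.color == "red" and doc.city == "Seattle"


docs = [
    Document("Red widget run in Seattle", "widget", "red", "Seattle"),
    Document("Blue widgets made in Seattle", "widget", "blue", "Seattle"),
    Document("Green widgets from Portland", "widget", "green", "Portland"),
    Document("Office holiday party", "none", "", ""),
]

# Labels the trainer should feed to the system: relevance only.
training_labels = [is_relevant(d) for d in docs]

# Responsiveness is decided separately and never used as a training label.
responsive_flags = [is_responsive(d) for d in docs]
```

In this sketch the blue Seattle e-mail and the green Portland e-mail are both labeled relevant for training even though neither is responsive. Only the holiday party message is taught to the system as non-relevant.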