Latest Grossman and Cormack Study Proves Efficacy of Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part Two

e-Discovery Team ®

This is a continuation of my earlier blog with the same title: Latest Grossman and Cormack Study Proves Efficacy of Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part One. 

Latest Grossman Cormack Study

The information scientist behind this study is Gordon V. Cormack, Professor, University of Waterloo. He has a long history as a search expert outside of legal search, including special expertise in spam searches. The lawyer who worked with Gordon on this study is Maura R. Grossman, Of Counsel, Wachtell, Lipton, Rosen & Katz. In addition to her J.D., she has a PhD in psychology, and has been a tireless advocate for effective legal search for many years. Their work is well known to everyone in the field.

The primary purpose of their latest study was not to test the effectiveness of training based on random samples. That was a…



Predictive Coding Confusion

Clustify Blog - eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development

This article looks at a few common misconceptions and mistakes related to predictive coding and confidence intervals.

Confidence intervals vs. training set size:  You can estimate the percentage of documents in a population having some property (e.g., is the document responsive, or does it contain the word “pizza”) by taking a random sample of the documents and measuring the percentage having that property.  The confidence interval tells you how much uncertainty there is due to your measurement being made on a sample instead of the full population.  If you sample 400 documents, the 95% confidence interval is +/- 5%, meaning that 95% of the time the range from -5% to +5% around your estimate will contain the actual value for the full population.  For example, if you sample 400 documents and find that 64 are relevant (16%), there is a 95% chance that the range 11% to 21% will…
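The arithmetic behind the 400-document example can be sketched in a few lines of Python. This is only an illustration, assuming the usual normal approximation for a sampled proportion and the worst-case proportion of 50%, which is where the familiar "+/- 5% at 400 documents" rule of thumb comes from. The function names and the `worst_case` parameter are made up for this sketch and are not taken from the original article.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Normal-approximation margin of error for a sampled proportion.

    proportion=0.5 is the worst case (widest interval) and is an
    assumption of this sketch, not something stated in the article.
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


def confidence_interval(hits, sample_size, worst_case=True, z=Z_95):
    """Return (low, high) bounds around the observed proportion."""
    estimate = hits / sample_size
    # Use 50% for the conservative rule of thumb, or the observed
    # proportion for a tighter interval.
    p = 0.5 if worst_case else estimate
    moe = margin_of_error(sample_size, p, z)
    return max(0.0, estimate - moe), min(1.0, estimate + moe)


if __name__ == "__main__":
    print(f"margin at n=400: {margin_of_error(400):.1%}")
    low, high = confidence_interval(64, 400)
    print(f"worst-case interval: {low:.1%} to {high:.1%}")
    low, high = confidence_interval(64, 400, worst_case=False)
    print(f"interval using observed 16%: {low:.1%} to {high:.1%}")
```

With the worst-case assumption the margin for 400 documents comes out at about 4.9%, which rounds to the +/- 5% quoted above. Plugging in the observed 16% instead gives a somewhat narrower interval, so the rule of thumb is the conservative choice.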


Latest Grossman and Cormack Study Proves Efficacy of Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part One

e-Discovery Team ®

There is a well-known joke found in most cultures of the world about a fool looking for something. This anecdote has been told for thousands of years because it illustrates a basic trait of human psychology, now commonly named after the joke itself: the Streetlight Effect. This is a type of observational bias in which people search only where it is easiest to look. This human frailty, when pointed out in the right way, can be funny. One of the oldest known forms of pedagogic humor illustrating the Streetlight Effect comes from the famous stories of Nasrudin, aka Nasreddin, an archetypal wise fool from 13th-century Sufi traditions. Here is one version of this joke attributed to Nasreddin:

One late evening Nasreddin found himself walking home. It was only a very short way and upon arrival he can be seen to be upset about something. Alas, just then a…
