Ever since the March 2, 2015, Rio Tinto opinion and order, there has been a lot of buzz in eDiscovery around the phrase “Continuous Active Learning” (CAL). Judge Peck briefly mentioned CAL while summarizing the available case law on seed-set sharing and transparency. For the sake of clarity, the term “seed set” in this post refers to the initial group of training documents used to kick off a Technology Assisted Review (TAR) project, and the review sets that follow are called training sets. The point of Judge Peck’s mention of CAL, as I understood it, was to alert readers that disputes over seed-set selection and disclosure may become much less necessary as TAR tools and protocols continue to evolve.
Judge Peck pointed to recent research and a law review article by Maura Grossman and Gordon Cormack to support that notion. Those works made two important points about seed-set documents. First, they asserted that the selection and coding of seed-set documents is less likely to determine the ultimate success of TAR projects that employ a true CAL protocol. The general theory is that the influence of misclassified seed documents is fleeting, since the classifier used to identify each successive training set is recreated after every round, rather than simply revised or refitted. Second, they argued that seed-set transparency is no guaranteed path to TAR project completeness, since neither the producing nor the receiving party truly understands the breadth of the concepts or information types in a collection.
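To make the "recreated after each round" idea concrete, here is a minimal Python sketch of a CAL-style loop. It is a toy illustration only, not the Grossman and Cormack implementation or any vendor's tool: the function names (`train`, `score`, `cal_review`), the simple term-count scoring, the batch size, and the stopping rule are all assumptions made for this example.

```python
from collections import Counter


def train(labeled):
    """Build a brand-new term-weight model from every coded document so far."""
    relevant, other = Counter(), Counter()
    for text, is_relevant in labeled:
        terms = set(text.lower().split())
        (relevant if is_relevant else other).update(terms)
    # Weight = (relevant docs containing the term) - (non-relevant docs containing it).
    return {t: relevant[t] - other[t] for t in relevant.keys() | other.keys()}


def score(model, text):
    """Sum the weights of the distinct terms in a document."""
    return sum(model.get(t, 0) for t in set(text.lower().split()))


def cal_review(collection, seed_set, reviewer, batch_size=2):
    """Toy CAL loop: retrain from scratch, review the top-ranked batch, repeat.

    Assumed stopping rule: stop when a batch contains no relevant documents
    or when nothing is left to review.
    """
    labeled = list(seed_set)
    seen = {text for text, _ in labeled}
    unreviewed = [doc for doc in collection if doc not in seen]
    while unreviewed:
        model = train(labeled)  # recreated each round, not incrementally updated
        unreviewed.sort(key=lambda doc: score(model, doc), reverse=True)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        coded = [(doc, reviewer(doc)) for doc in batch]
        labeled.extend(coded)
        if not any(is_relevant for _, is_relevant in coded):
            break
    return labeled
```

Because every round rebuilds the model from all coding decisions made so far, the seed documents become a shrinking fraction of the training data as the review proceeds, which is the intuition behind the claim that a few misclassified seeds have only a fleeting influence.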
The fact that Judge Peck cited the work of Grossman and Cormack as the basis for his statement is important, because the definition of CAL asserted in those publications differs from the one that the makers of many TAR tools would offer, even those that claim to be CAL-capable.
Read more at the Altep blog: What is Continuous Active Learning (CAL), Really? – Part One