TAR Course Updated to Add Video on Step Seven and the All Important “Stop Decision”

e-Discovery Team ®

We added to the TAR Course again this weekend with a video introducing Class Fourteen on Step Seven, ZEN Quality Assurance Tests. ZEN stands for Zero Error Numerics, with the double entendre intended, but this video does not go into the math, nor into the concentration and reviewer-focus aspects. Instead, Ralph’s video provides an introduction to the main purpose of Step Seven from a workflow perspective: to test and validate the decision to stop the Training Cycle, Steps 4, 5 and 6.

The Training Cycle shown in the diagram continues until the expert in charge of the training decides to stop, which is the decision to complete the first-pass document review. The stop decision is both a legal and a statistical one, requiring a holistic approach that includes metrics, sampling and overall project assessment. You decide to stop the review after weighing a multitude of considerations, including whether the software has attained a highly stratified distribution of documents. See …

View original post 1,074 more words
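One sampling check commonly used to validate a stop decision is an elusion test: draw a random sample from the documents coded non-relevant and see how many relevant documents slipped through. The sketch below is a generic illustration of that idea, not the ZEN method itself; the `elusion_estimate` helper and the simulated discard pile are hypothetical constructions of my own.

```python
import math
import random

def elusion_estimate(discard_pile: list[bool], sample_size: int, seed: int = 7):
    """Estimate the fraction of relevant docs eluding review, with a
    rough 95% margin of error, from a random sample of the discard pile."""
    random.seed(seed)
    sample = random.sample(discard_pile, sample_size)
    p = sum(sample) / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Simulated discard pile: 2% of the "non-relevant" docs are actually relevant.
pile = [True] * 200 + [False] * 9800
p, margin = elusion_estimate(pile, 400)
print(f"elusion estimate: {p:.1%} +/- {margin:.1%}")
```

With a true elusion rate of 2%, a 400-document sample will typically estimate it to within a point or two; a low estimate with a tight margin is evidence, though never proof, that stopping is reasonable.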

FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better

TAR eDiscovery orders and opinions have made some pretty big splashes in the last five years, and the recent FCA US LLC v. Cummings, Inc., order, despite being brief, was no exception. The court took up the question of whether keyword search culling of a data set prior to the application of Technology Assisted Review (i.e., TAR or Predictive Coding) is the preferred method. The answer, in the court’s opinion, was simple but powerful: it is not.

Some have described this decision as a “nightmare.” Others have less vividly decried it as likely to impede much needed progress in the use of advanced analytics. While I understand the causes for concern, I find it hard to disagree with the court’s decision based on my understanding of the relevant judicial precedent and the gravity of the flaws associated with keyword search culling.

Personally, I don’t believe that TAR judicial history to date, apart from the circumstance- and proportionality-based rulings in In re Biomet (Apr. 18, 2013) and Bridgestone (July 22, 2014), supports another outcome….

See the full post at the Altep Blog: FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better


Substantial Reduction in Review Effort Required to Demonstrate Adequate Recall

Clustify Blog - eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development

Measuring the recall achieved to within +/- 5% to demonstrate that a production is defensible can require reviewing a substantial number of random documents.  For a case of modest size, the amount of review required to measure recall can be larger than the amount of review required to actually find the responsive documents with predictive coding.  This article describes a new method requiring much less document review to demonstrate that adequate recall has been achieved.  This is a brief overview of a more detailed paper I’ll be presenting at the DESI VII Workshop on June 12th.

The proportion of a population having some property can be estimated to within +/- 5% by measuring the proportion on a random sample of 400 documents (you’ll also see the number 385 being used, but using 400 will make it easier to follow the examples).  To measure recall we need to know what proportion…

View original post 769 more words
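The arithmetic behind the 400-document figure is the standard margin-of-error formula for a sample proportion. A minimal sketch, my own illustration assuming simple random sampling and a 95% confidence level:

```python
import math

def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion.

    proportion=0.5 is the worst case, which is why it yields the
    familiar +/- 5% for samples of roughly 385-400 documents.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Worst-case margins for the sample sizes mentioned above:
print(round(margin_of_error(400), 3))  # 0.049, i.e. roughly +/- 5%
print(round(margin_of_error(385), 3))  # 0.05
```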

My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding


Dynamo Holdings Limited Partnership v. Commissioner

In an order dated July 13, 2016, the U.S. Tax Court once again strongly supported the use of Predictive Coding. The case had already featured some notable opinions and orders on the topic. This recent order is a fun read for analytics nerds and newcomers alike, as the Court did a great job of laying out the associated facts and addressing the typical arguments for and against use of the technology. Here are a few items that caught my attention as I read it.

Opposing Party as SME

As I noted in a recent blog post, the use of an SME to train a Predictive Coding algorithm can be an expensive proposition. The responding party here took an interesting approach by allowing the requesting party to serve as their SME. There are certainly cases where parties meet to discuss seed documents, but there are also a fair number in which the parties do not feel the need to disclose use of the technology, never mind the documents used to train it.

I can understand why parties are uncomfortable submitting to their opponent’s discretion. However, Dynamo’s aggressive training approach clearly helped their cause. In fact, the Commissioner’s training of the algorithm was the only process-related factor the Court mentioned when concluding that “[t]here is no question” Dynamo met its discovery obligations. It is an interesting strategy to consider if Predictive Coding is accepted by the court and parties, as it may save you some training costs and bolster the defensibility of your process.

Read more at the Altep blog: My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding

To SME or Not to SME (in TAR)… That is the Question


This article assumes that Technology Assisted Review is being deployed in a production review setting, where the user seeks to identify potentially relevant documents from among a larger corpus and to subject those documents to full manual review. The use of TAR as an investigative or fact-finding tool is a more financially flexible proposition, and the efficiency of that approach should be evaluated via separate standards.

There has been some debate in the past few years about the proper role of the Subject Matter Expert (SME) in technology assisted review (TAR) – a discussion which has understandably resulted in plenty of disagreement. There was a time when most blog posts and white papers swore that SME training was the only path to success, but that position looks to have softened some.

I have always been a bit skeptical of the necessity of SME training, especially when that SME takes the form of a law firm partner or senior associate. While a more compelling argument can be made for client insiders as SME trainers, I am not convinced that SME training is necessary in either case. There are two main factors that drive my opinion here – consistency and cost. Each of these factors must be carefully considered before embarking on a TAR project.

Proper planning is required to achieve consistency. However, consistency that comes at a price greater than the anticipated review savings is practically meaningless. The margin of cost efficiency between linear manual review and technology assisted review is thinner than you might think if you don’t carefully consider and control training costs.
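That thin margin can be made concrete with a back-of-envelope comparison. Every rate and volume below is a hypothetical assumption chosen for illustration; `linear_cost` and `tar_cost` are toy helpers of my own, not a real pricing model:

```python
def linear_cost(docs: int, docs_per_hour: float, reviewer_rate: float) -> float:
    """Cost of reviewing every document manually."""
    return docs / docs_per_hour * reviewer_rate

def tar_cost(docs: int, training_docs: int, sme_rate: float,
             sme_docs_per_hour: float, review_fraction: float,
             docs_per_hour: float, reviewer_rate: float) -> float:
    """SME training cost plus manual review of the TAR-selected fraction."""
    training = training_docs / sme_docs_per_hour * sme_rate
    review = docs * review_fraction / docs_per_hour * reviewer_rate
    return training + review

corpus = 100_000
print(round(linear_cost(corpus, 50, 60)))                    # 120000
print(round(tar_cost(corpus, 5_000, 500, 40, 0.2, 50, 60)))  # 86500
```

At these assumed rates, a partner-priced SME training 5,000 seed documents costs $62,500 on its own, consuming roughly two thirds of the savings TAR would otherwise deliver over linear review.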

Read more at the Altep blog: To SME or Not to SME (in TAR)… That is the Question

Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review


If you are reading this blog, you have probably heard the story many times by now: document review is the most expensive part of eDiscovery. Like many, I find myself asking the same question again and again. How can we do it better? One obvious answer is by defensibly reviewing less. The not-so-obvious part of that answer lies in the available methods for doing so.

The growing absurdity of document review volumes and their associated costs has been an obsession of mine for a while now. I have spent a lot of time in the past few years studying methods and technologies capable of better controlling those volumes and costs. Global de-duplication, e-mail threading, textual near-duplicate identification, and more complex varieties of technology assisted review (TAR), such as Equivio Relevance and Relativity Assisted Review, have all proved useful in my quest to help clients review and spend less.
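Of the techniques listed, global de-duplication is the simplest to sketch: hash each document’s content and review only one member of each hash family. A minimal illustration with hypothetical data (production tools typically hash normalized metadata fields as well as raw content):

```python
import hashlib

def dedupe_by_hash(docs: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document IDs by SHA-256 of their raw content, so each
    duplicate family needs only one first-pass review decision."""
    families: dict[str, list[str]] = {}
    for doc_id, content in docs.items():
        digest = hashlib.sha256(content).hexdigest()
        families.setdefault(digest, []).append(doc_id)
    return families

docs = {"DOC-001": b"report v1", "DOC-002": b"report v1", "DOC-003": b"memo"}
families = dedupe_by_hash(docs)
print(len(families))  # 2 unique documents out of 3
```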

Read more at the Altep blog: Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review

Reviewing Native Excel Files, Part 1: Detecting Inconsistent Formulas

Looking forward to seeing all of these posts…

Excel Esquire

This is the first in a series of posts about reviewing native Excel files produced by parties in litigation.  We’ve finally reached a tipping point in litigation where the production of native Excel files (rather than inscrutable thousand-page printouts) is the rule rather than the exception.  Discovery stipulations now routinely contain a provision that calls for Excel files to be produced natively (does yours?), and Magistrate Judge Facciola famously observed that tiffing out electronic documents such as spreadsheets is “madness” (Covad Commc’ns. Co. v. Revonet, Inc., 2009 WL 2595257 (D.D.C. Aug. 25, 2009)).  The question for practicing lawyers today is how to review those files, and how to exploit the wealth of information they often contain.

Today we look at Excel’s built-in feature that flags inconsistent formulas, and see how that feature can call attention to potentially critical information lurking beneath the surface.

Suppose your client is a plaintiff in…

View original post 352 more words
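The inconsistent-formula check Excel performs can be approximated in a few lines: translate a reference formula’s relative cell references across the row and flag cells that deviate. The sketch below is a deliberately simplified stand-in for Excel’s own audit feature; it handles only single-letter, relative A1-style references, and the helper names are my own:

```python
import re

COLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift_cols(formula: str, offset: int) -> str:
    """Shift every single-letter relative column reference by `offset`."""
    def repl(m: re.Match) -> str:
        return COLS[COLS.index(m.group(1)) + offset] + m.group(2)
    return re.sub(r"\b([A-Z])(\d+)\b", repl, formula)

def inconsistent_cells(row_formulas: list[str]) -> list[int]:
    """Indices of cells whose formula breaks the first cell's pattern."""
    flagged = []
    for i, formula in enumerate(row_formulas[1:], start=1):
        if formula != shift_cols(row_formulas[0], i):
            flagged.append(i)
    return flagged

# Totals row where the third cell was hard-edited and no longer
# sums its own column (it should be =SUM(D2:D9)):
row = ["=SUM(B2:B9)", "=SUM(C2:C9)", "=SUM(C2:C9)"]
print(inconsistent_cells(row))  # [2]
```

A flag like this is exactly the kind of detail that vanishes in a TIFF printout but survives, and can be hunted programmatically, in a native production.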