TAR Course Updated to Add Video on Step Seven and the All Important “Stop Decision”

e-Discovery Team ®

We added to the TAR Course again this weekend with a video introducing Class Fourteen on Step Seven, ZEN Quality Assurance Tests. ZEN stands for Zero Error Numerics, with the double entendre intentional, but this video does not go into the math, nor into the concentration and reviewer focus the name alludes to. Instead, Ralph's video provides an introduction to the main purpose of Step Seven from a workflow perspective: to test and validate the decision to stop the Training Cycle, steps 4-5-6.

The Training Cycle shown in the diagram continues until the expert in charge of the training decides to stop. This is the decision to complete the first-pass document review. The stop decision is a legal and statistical decision requiring a holistic approach, including metrics, sampling and overall project assessment. You decide to stop the review after weighing a multitude of considerations, including whether the software has attained a highly stratified distribution of documents. See…

View original post 1,074 more words
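
The course spells out its own ZEN tests, so take this only as a rough illustration of the kind of sample-based check a stop decision rests on: draw a random sample from the documents the system would not produce, have a human review it, and estimate what stopping now would leave behind. A minimal Python sketch, with function names and the 400-document sample size assumed by me rather than prescribed by the course:

```python
import random

def elusion_check(discard_pile, review_is_relevant, sample_size=400):
    """Draw a random sample from the documents ranked below the review cutoff
    and estimate how many relevant documents a stop decision would leave behind.

    discard_pile: list of document ids the system would not produce
    review_is_relevant: human review call, returns True if the document is relevant
    """
    if not discard_pile:
        return 0.0, 0
    sample = random.sample(discard_pile, min(sample_size, len(discard_pile)))
    hits = sum(1 for doc_id in sample if review_is_relevant(doc_id))
    elusion_rate = hits / len(sample)
    return elusion_rate, round(elusion_rate * len(discard_pile))
```

If the estimated leave-behind is more than the case can tolerate, the stop decision fails the test and the Training Cycle of steps 4-5-6 continues.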

FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better

TAR eDiscovery orders and opinions have made some pretty big splashes in the last five years, and the recent FCA US LLC v. Cummings, Inc., order, despite being brief, was no exception. The court took up the question of whether keyword search culling of a data set prior to the application of Technology Assisted Review (i.e., TAR or Predictive Coding) is the preferred method. The answer, in the court’s opinion, was simple but powerful: it is not.

Some have described this decision as a “nightmare.” Others have less vividly decried it as likely to impede much needed progress in the use of advanced analytics. While I understand the causes for concern, I find it hard to disagree with the court’s decision based on my understanding of the relevant judicial precedent and the gravity of the flaws associated with keyword search culling.

Personally, I don’t believe that TAR judicial history to date, apart from the circumstance- and proportionality-based rulings in In re Biomet (Apr. 18, 2013) and Bridgestone (July 22, 2014), supports another outcome….

See the full post at the Altep Blog: FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better

 

Substantial Reduction in Review Effort Required to Demonstrate Adequate Recall

Clustify Blog - eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

Measuring the recall achieved to within +/- 5% to demonstrate that a production is defensible can require reviewing a substantial number of random documents.  For a case of modest size, the amount of review required to measure recall can be larger than the amount of review required to actually find the responsive documents with predictive coding.  This article describes a new method requiring much less document review to demonstrate that adequate recall has been achieved.  This is a brief overview of a more detailed paper I’ll be presenting at the DESI VII Workshop on June 12th.

The proportion of a population having some property can be estimated to within +/- 5% by measuring the proportion on a random sample of 400 documents (you’ll also see the number 385 being used, but using 400 will make it easier to follow the examples).  To measure recall we need to know what proportion…

View original post 769 more words
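
Where do the 385 and 400 figures come from? They are the standard normal-approximation sample sizes for estimating a proportion to within +/- 5% at 95% confidence. A quick sketch of that arithmetic, offered as my own illustration rather than the method in the DESI VII paper:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# The worst case is p = 0.5, which is where the commonly quoted 385 comes from;
# a round 400 does slightly better.
print(round(margin_of_error(0.5, 385), 4))  # 0.0499
print(round(margin_of_error(0.5, 400), 4))  # 0.049
```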

My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding


Dynamo Holdings Limited Partnership v. Commissioner

In an order dated July 13, 2016, the U.S. Tax Court once again strongly supported the use of Predictive Coding. The case had already featured some notable opinions and orders on the topic. This recent order is a fun read for analytics nerds and newcomers alike, as the Court did a great job of laying out the associated facts and addressing the typical arguments for and against use of the technology. Here are a few items that caught my attention as I read it.

Opposing Party as SME

As I noted in a recent blog post, the use of an SME to train a Predictive Coding algorithm can be an expensive proposition. The responding party here took an interesting approach by allowing the requesting party to serve as their SME. There are certainly cases where parties meet to discuss seed documents, but there are also a fair number in which the parties do not feel the need to disclose use of the technology, never mind the documents used to train it.

I can understand why parties are uncomfortable submitting to their opponent’s discretion. However, Dynamo’s aggressive training approach clearly helped their cause. In fact, the Commissioner’s training of the algorithm was the only process-related factor the Court mentioned when concluding that “[t]here is no question” Dynamo met its discovery obligations. It is an interesting strategy to consider if Predictive Coding is accepted by the court and parties, as it may save you some training costs and bolster the defensibility of your process.

Read more at the Altep blog: My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding

To SME or Not to SME (in TAR)… That is the Question


This article assumes that Technology Assisted Review is being deployed in a production review setting, where the user seeks to identify potentially relevant documents from among a larger corpus and subject them to full manual review. The use of TAR as an investigative or fact-finding tool is a more financially flexible proposition, and the efficiency of that approach should be evaluated by separate standards.

There has been some debate in the past few years about the proper role of the Subject Matter Expert (SME) in technology assisted review (TAR), a discussion that has understandably produced plenty of disagreement. There was a time when most blog posts and white papers swore that SME training was the only path to success, but that position seems to have softened somewhat.

I have always been a bit skeptical of the necessity of SME training, especially when that SME takes the form of a law firm partner or senior associate. While a more compelling argument can be made for client insiders as SME trainers, I am not convinced that SME training is necessary in either case. There are two main factors that drive my opinion here – consistency and cost. Each of these factors must be carefully considered before embarking on a TAR project.

Proper planning is required to achieve consistency. However, consistency that comes at a price greater than the anticipated review savings is practically meaningless. The margin of cost efficiency between linear manual review and technology assisted review is thinner than you might think if you don't carefully consider and control training costs.
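
To make that concrete, here is a back-of-the-envelope comparison in Python. Every number in it is hypothetical and chosen only to show how premium-rate SME training can swallow, or even erase, the savings TAR promises:

```python
def review_cost(doc_count, docs_per_hour, rate_per_hour):
    """Hours of review multiplied by the hourly rate."""
    return doc_count / docs_per_hour * rate_per_hour

corpus = 100_000
linear_review = review_cost(corpus, docs_per_hour=50, rate_per_hour=60)   # $120,000

# TAR: a senior-attorney SME reviews 5,000 training documents slowly at a
# premium rate, then a contract team reviews the 20,000 promoted documents.
sme_training = review_cost(5_000, docs_per_hour=25, rate_per_hour=600)    # $120,000
promoted_set = review_cost(20_000, docs_per_hour=50, rate_per_hour=60)    #  $24,000

print(linear_review, sme_training + promoted_set)  # 120000.0 vs 144000.0
```

Hand the same training set to the contract team instead and that $120,000 training line shrinks to a few thousand dollars, which is the cost side of the consistency-versus-cost tradeoff.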

Read more at the Altep blog: To SME or Not to SME (in TAR)… That is the Question

Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review


If you are reading this blog, you have probably heard the story many times by now. Document review is the most expensive part of eDiscovery. Like many, I find myself asking the same question again and again: how can we do it better? One obvious answer is by defensibly reviewing less. The not-so-obvious part of that answer lies in the methods available for doing so.

The growing absurdity of document review volumes and their associated costs has been an obsession of mine for a while now. I have spent a lot of time in the past few years studying methods and technologies capable of better controlling those volumes and costs. Global de-duplication, e-mail threading, textual near-duplicate identification, and more complex varieties of technology assisted review (TAR), such as Equivio Relevance and Relativity Assisted Review, have all proved useful in my quest to help clients review and spend less.
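
The products named above keep their internals to themselves, but the core idea behind textual near-duplicate identification is simple enough to sketch: shingle each document's text and compare the overlap. A minimal, brute-force illustration, where the 0.8 threshold and five-word shingles are my own assumptions rather than any vendor's defaults:

```python
def shingles(text, k=5):
    """Set of k-word shingles from a document's extracted text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def near_duplicate_pairs(docs, threshold=0.8):
    """Brute-force pairs of documents whose shingle sets overlap heavily.

    docs: {doc_id: text}. Real tools add smarter indexing; this shows the idea.
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    ids = sorted(sets)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if jaccard(sets[a], sets[b]) >= threshold]
```

Commercial tools avoid comparing every pair, but the near-duplicate groupings reviewers see come down to a similarity score of roughly this kind.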

Read more at the Altep blog: Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review

Reviewing Native Excel Files, Part 1: Detecting Inconsistent Formulas

Looking forward to seeing all of these posts…

Excel Esquire

This is the first in a series of posts about reviewing native Excel files produced by parties in litigation.  We’ve finally reached a tipping point in litigation where the production of native Excel files (rather than inscrutable thousand-page printouts) is the rule rather than the exception.  Discovery stipulations now routinely contain a provision that calls for Excel files to be produced natively (does yours?), and Magistrate Judge Facciola famously observed that tiffing out electronic documents such as spreadsheets is “madness” (Covad Commc’ns. Co. v. Revonet, Inc., 2009 WL 2595257 (D.D.C. Aug. 25, 2009)).  The question for practicing lawyers today is how to review those files, and how to exploit the wealth of information they often contain.

Today we look at Excel’s built-in feature that flags inconsistent formulas, and see how that feature can call attention to potentially critical information lurking beneath the surface.

Suppose your client is a plaintiff in…

View original post 352 more words
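
The post is about Excel's own inconsistent-formula flag, but if you want to sweep a produced native file for the same pattern outside Excel, a rough Python sketch with openpyxl follows. Stripping row numbers is a crude stand-in for Excel's actual rule, and the file, sheet and range in the final comment are hypothetical:

```python
import re
from collections import Counter
from openpyxl import load_workbook

def formula_shape(cell):
    """Strip row numbers so =C2*D2 and =C3*D3 normalize to the same shape."""
    if cell.data_type != "f":            # a constant, not a formula
        return "HARD-CODED VALUE"
    return re.sub(r"(?<=[A-Z])\d+", "", str(cell.value))

def flag_inconsistent(path, sheet, column, first_row, last_row):
    """Flag cells in a column whose formula shape differs from the majority."""
    ws = load_workbook(path)[sheet]
    cells = [ws.cell(row=r, column=column) for r in range(first_row, last_row + 1)]
    shapes = [formula_shape(c) for c in cells]
    majority, _ = Counter(shapes).most_common(1)[0]
    return [(c.coordinate, s) for c, s in zip(cells, shapes) if s != majority]

# e.g. flag_inconsistent("damages_model.xlsx", "Summary", column=4, first_row=2, last_row=50)
```

A cell that comes back as "HARD-CODED VALUE" in a column of formulas is exactly the kind of detail lurking beneath the surface: someone typed over the math.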

Can You Really Compete in TREC Retroactively?

Another great post from Bill Dimm.

Clustify Blog - eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

I recently encountered a marketing piece where a vendor claimed that their tests showed their predictive coding software demonstrated favorable performance compared to the software tested in the 2009 TREC Legal Track for Topic 207 (finding Enron emails about fantasy football).  I spent some time puzzling about how they could possibly have measured their performance when they didn’t actually participate in TREC 2009.

One might question how meaningful it is to compare to performance results from 2009 since the TREC participants have probably improved their software over the past six years.  Still, how could you do the comparison if you wanted to?  The stumbling block is that TREC did not produce a yes/no relevance determination for all of the Enron emails.  Rather, they did stratified sampling and estimated recall and prevalence for the participating teams by producing relevance determinations for just a few thousand emails.

Stratified sampling means that the…

View original post 1,680 more words
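
Bill's post explains stratified sampling properly; the only piece worth sketching here is the scaling-up step, since that is what a recall estimate is built from. The strata below are made up for illustration and are not TREC's actual numbers:

```python
def estimate_relevant(strata):
    """Scale each stratum's sample relevance rate back up to the stratum size.

    strata: list of dicts with 'size', 'sampled', and 'relevant_in_sample' keys.
    """
    return sum(s["size"] * s["relevant_in_sample"] / s["sampled"] for s in strata)

# Made-up strata purely for illustration
strata = [
    {"size": 200_000, "sampled": 500,   "relevant_in_sample": 2},      # low-ranked
    {"size": 20_000,  "sampled": 1_000, "relevant_in_sample": 150},    # mid-ranked
    {"size": 5_000,   "sampled": 1_500, "relevant_in_sample": 1_200},  # top-ranked
]
total_relevant = estimate_relevant(strata)  # 800 + 3,000 + 4,000 = 7,800
recall = 4_000 / total_relevant             # if 4,000 relevant docs were produced
print(total_relevant, round(recall, 2))     # 7800.0 0.51
```

The sketch assumes you have relevance calls for every sampled document; the stumbling block Bill describes is that a vendor who never participated mostly does not.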

The Kind of Stuff I Think About Late At Night…

The most critical component of the predictive coding exercise is training of the system.  The whole point of this component is to separate relevant content from non-relevant content.  The point is most definitely not to separate the responsive documents from the non-responsive documents.  These are two very different standards.  Separating responsive documents from non-responsive documents usually requires not only identification of non-relevant content, but also dissecting relevant content to meet responsiveness requirements.  The latter is all too often where the training process goes wrong.

One of the chief benefits of predictive coding software is the ability to accurately identify and eliminate non-relevant documents from the review universe. With that in mind, system trainers need to remember not to dismiss relevant content merely because it does not meet responsiveness requirements. I know that has been said thousands of times, but it needs to be said again and again.

Responsiveness in your case may hinge on whether a particular widget manufactured in Seattle was red. If a system trainer dismisses an e-mail as non-relevant because it discusses blue widgets made in Seattle, they are confusing the system and hindering the process. To truly get the most out of the process you must mark the blue Seattle widget discussion as relevant, and likely also include discussions of widgets of other colors manufactured in other cities. Discussion of the manufacture of widgets is relevant content; whether the widgets were made in Seattle and whether they were red will determine whether the document is responsive.
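
Put in toy code, the distinction looks like this. The field names are hypothetical; the point is only that the label fed to the classifier is the broad one, and the narrow call is made later by a reviewer:

```python
def training_label(doc):
    """Label used to train the classifier: broad topical relevance."""
    # A blue-widget e-mail still teaches the model what widget-manufacturing
    # discussion looks like, even though it will never be produced.
    return doc["discusses_widget_manufacturing"]

def responsive(doc):
    """Responsiveness call applied at review time, not at training time."""
    return doc["widget_color"] == "red" and doc["made_in"] == "Seattle"
```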

The Path to E-Mail Production IV, Revisited

Ball in your Court

This is the ninth in a series revisiting Ball in Your Court columns and posts from the primordial past of e-discovery, updating and critiquing in places, and hopefully restarting a few conversations. As always, your comments are gratefully solicited.

The Path to Production: Are We There Yet?

(Part IV of IV)

[Originally published in Law Technology News, January 2006]

The e-mail’s assembled and accessible.  You could begin review immediately, but unless your client has money to burn, there’s more to do before diving in: de-duplication. When Marge e-mails Homer, Bart and Lisa, Homer’s “Reply to All” goes in both Homer’s Sent Items and Inbox folders, and in Marge’s, Bart’s and Lisa’s Inboxes.  Reviewing Homer’s response five times is wasteful and sets the stage for conflicting relevance and privilege decisions.

Duplication problems compound when e-mail is restored from backup tape.  Each tape is a snapshot of e-mail at a moment…

View original post 776 more words
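
Craig's column goes on to explain how de-duplication is actually done; as a generic illustration of the field-hashing approach many processing tools take, here is a small sketch. The field list and the normalization are my own simplification, not the column's prescription:

```python
import hashlib

def email_fingerprint(msg):
    """Hash normalized header fields and body text so the copies of one message
    (sender's Sent Items, each recipient's Inbox, restored backup tapes)
    collapse to a single review item.

    msg: dict with 'from', 'to', 'cc', 'subject', 'sent', 'body' keys.
    """
    def norm(value):
        return " ".join(str(value).lower().split())

    fields = ("from", "to", "cc", "subject", "sent", "body")
    return hashlib.md5("|".join(norm(msg.get(f, "")) for f in fields)
                       .encode("utf-8")).hexdigest()

# Hash the same fields across every mailbox and backup tape in the collection
# and the five copies of Homer's reply collapse to one document for review.
```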