My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding


Dynamo Holdings Limited Partnership v. Commissioner

In an order dated July 13, 2016, the U.S. Tax Court once again strongly supported the use of Predictive Coding. The case had already featured some notable opinions and orders on the topic. This recent order is a fun read for analytics nerds and newcomers alike, as the Court did a great job of laying out the associated facts and addressing the typical arguments for and against use of the technology. Here are a few items that caught my attention as I read it.

Opposing Party as SME

As I noted in a recent blog post, the use of an SME to train a Predictive Coding algorithm can be an expensive proposition. The responding party here took an interesting approach by allowing the requesting party to serve as its SME. There are certainly cases where parties meet to discuss seed documents, but there are also a fair number in which the parties do not feel the need to disclose their use of the technology, never mind the documents used to train it.

I can understand why parties are uncomfortable submitting to their opponent’s discretion. However, Dynamo’s aggressive training approach clearly helped its cause. In fact, the Commissioner’s training of the algorithm was the only process-related factor the Court mentioned when concluding that “[t]here is no question” Dynamo met its discovery obligations. It is a strategy worth considering when Predictive Coding is accepted by the court and the parties, as it may save you some training costs and bolster the defensibility of your process.

Read more at the Altep blog: My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding

To SME or Not to SME (in TAR)… That is the Question


This article assumes that Technology Assisted Review is being deployed in a production review setting, where the user seeks to identify potentially relevant documents within a larger corpus and to subject those documents to full manual review. The use of TAR as an investigative or fact-finding tool is a more financially flexible proposition, and the efficiency of that approach should be evaluated by separate standards.

There has been debate in the past few years about the proper role of the Subject Matter Expert (SME) in technology assisted review (TAR), and the discussion has understandably produced plenty of disagreement. There was a time when most blog posts and white papers swore that SME training was the only path to success, but that position appears to have softened somewhat.

I have always been a bit skeptical of the necessity of SME training, especially when that SME takes the form of a law firm partner or senior associate. While a more compelling argument can be made for client insiders as SME trainers, I am not convinced that SME training is necessary in either case. There are two main factors that drive my opinion here – consistency and cost. Each of these factors must be carefully considered before embarking on a TAR project.

Proper planning is required to achieve consistency. But consistency that comes at a price greater than the anticipated review savings is practically meaningless. The cost-efficiency margin between linear manual review and technology assisted review is thinner than you might think if training costs are not carefully considered and controlled.
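To make that margin concrete, here is a toy cost comparison. Every number below (document counts, review rates, SME and reviewer billing rates, richness) is hypothetical and chosen only to illustrate how SME training costs eat into TAR savings; none of it comes from an actual matter.

```python
# Toy cost comparison: linear manual review vs. TAR with SME training.
# All figures are hypothetical and for illustration only.

def linear_review_cost(n_docs, docs_per_hour=50, reviewer_rate=60):
    """Cost of putting every document through contract review."""
    return n_docs / docs_per_hour * reviewer_rate

def tar_review_cost(n_docs, responsive_rate=0.15, training_docs=3000,
                    sme_docs_per_hour=40, sme_rate=500,
                    docs_per_hour=50, reviewer_rate=60):
    """SME training cost plus manual review of the predicted-responsive set."""
    training = training_docs / sme_docs_per_hour * sme_rate
    review = n_docs * responsive_rate / docs_per_hour * reviewer_rate
    return training + review

n = 500_000
print(f"Linear: ${linear_review_cost(n):,.0f}")  # → Linear: $600,000
print(f"TAR:    ${tar_review_cost(n):,.0f}")     # → TAR:    $127,500
```

Raise the SME’s rate, widen the training set, or increase the responsive rate, and the gap between the two figures narrows quickly, which is exactly the point about controlling training costs.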

Read more at the Altep blog: To SME or Not to SME (in TAR)… That is the Question

Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review


If you are reading this blog, you have probably heard the story many times by now: document review is the most expensive part of eDiscovery. Like many, I find myself asking the same question again and again. How can we do it better? One obvious answer is by defensibly reviewing less. The not-so-obvious part of that answer is choosing among the available methods for doing so.

The growing absurdity of document review volumes and associated costs has been an obsession of mine for a while now. I have spent a lot of time in the past few years studying methods and technologies capable of better controlling those volumes and costs. Global de-duplication, e-mail threading, textual near-duplicate identification, and more complex varieties of technology assisted review (TAR), such as Equivio Relevance and Relativity Assisted Review, have all proved useful in my quest to help clients review less and spend less.

Read more at the Altep blog: Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review

Reviewing Native Excel Files, Part 1: Detecting Inconsistent Formulas

Looking forward to seeing all of these posts…

Excel Esquire

This is the first in a series of posts about reviewing native Excel files produced by parties in litigation.  We’ve finally reached a tipping point in litigation where the production of native Excel files (rather than inscrutable thousand-page printouts) is the rule rather than the exception.  Discovery stipulations now routinely contain a provision that calls for Excel files to be produced natively (does yours?), and Magistrate Judge Facciola famously observed that tiffing out electronic documents such as spreadsheets is “madness” (Covad Commc’ns. Co. v. Revonet, Inc., 2009 WL 2595257 (D.D.C. Aug. 25, 2009)).  The question for practicing lawyers today is how to review those files, and how to exploit the wealth of information they often contain.

Today we look at Excel’s built-in feature that flags inconsistent formulas, and see how that feature can call attention to potentially critical information lurking beneath the surface.

Suppose your client is a plaintiff in…

View original post 352 more words
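The idea behind the inconsistency flag described above can be sketched in a few lines of code: shift each formula’s cell references back by its column offset and see whether every formula in the row normalizes to the same template. Excel’s real check is built in and far more robust; this pure-Python sketch, with made-up formulas, only illustrates the concept.

```python
import re

COL = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift_refs(formula, offset):
    """Shift single-letter column references in a formula left by `offset`."""
    def repl(m):
        return COL[COL.index(m.group(1)) - offset] + m.group(2)
    return re.sub(r"([A-Z])(\d+)", repl, formula)

def inconsistent_columns(formulas):
    """Return indexes whose formula doesn't match the row's dominant template."""
    templates = [shift_refs(f, i) for i, f in enumerate(formulas)]
    baseline = max(set(templates), key=templates.count)  # most common pattern
    return [i for i, t in enumerate(templates) if t != baseline]

# A row of products where the last column was quietly given a 10% haircut.
row = ["=B2*C2", "=C2*D2", "=D2*E2", "=E2*F2*0.9"]
print(inconsistent_columns(row))  # → [3]
```

A flagged cell like that fourth one is precisely the kind of “critical information lurking beneath the surface” that a thousand-page TIFF printout would never reveal.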

Can You Really Compete in TREC Retroactively?

Another great post from Bill Dimm.

Clustify Blog - eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

I recently encountered a marketing piece where a vendor claimed that their tests showed their predictive coding software demonstrated favorable performance compared to the software tested in the 2009 TREC Legal Track for Topic 207 (finding Enron emails about fantasy football).  I spent some time puzzling about how they could possibly have measured their performance when they didn’t actually participate in TREC 2009.

One might question how meaningful it is to compare to performance results from 2009 since the TREC participants have probably improved their software over the past six years.  Still, how could you do the comparison if you wanted to?  The stumbling block is that TREC did not produce a yes/no relevance determination for all of the Enron emails.  Rather, they did stratified sampling and estimated recall and prevalence for the participating teams by producing relevance determinations for just a few thousand emails.

Stratified sampling means that the…

View original post 1,680 more words
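The stratified-sampling estimate the excerpt describes can be sketched numerically: each stratum has a population size, a judged sample, and a count of sampled documents found relevant, and the estimate scales each sample proportion up to its stratum’s population. The strata, counts, and retrieval flags below are made up for illustration.

```python
# Hedged sketch of a TREC-style stratified estimate of prevalence and recall.
# All stratum sizes, sample sizes, and relevance counts are hypothetical.

def estimate_relevant(strata):
    """Sum of N_h * (relevant_h / sampled_h) across strata."""
    return sum(s["size"] * s["rel"] / s["sampled"] for s in strata)

# Strata grouped so that this team's retrieved set is a union of whole strata.
strata = [
    {"size": 20_000, "sampled": 400, "rel": 120, "retrieved": True},
    {"size": 80_000, "sampled": 400, "rel": 40,  "retrieved": False},
]

total_rel = estimate_relevant(strata)
retrieved_rel = estimate_relevant([s for s in strata if s["retrieved"]])
prevalence = total_rel / sum(s["size"] for s in strata)
recall = retrieved_rel / total_rel
print(f"prevalence ≈ {prevalence:.3f}, recall ≈ {recall:.3f}")
```

This also shows the stumbling block the post identifies: the arithmetic only works when the retrieved set lines up with the judged strata. A vendor who did not participate has a retrieved set that cuts across strata full of unjudged documents, so recall cannot simply be read off the 2009 judgments.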

The Kind of Stuff I Think About Late At Night…

The most critical component of the predictive coding exercise is training of the system.  The whole point of this component is to separate relevant content from non-relevant content.  The point is most definitely not to separate the responsive documents from the non-responsive documents.  These are two very different standards.  Separating responsive documents from non-responsive documents usually requires not only identification of non-relevant content, but also dissecting relevant content to meet responsiveness requirements.  The latter is all too often where the training process goes wrong.

One of the chief benefits of predictive coding software is its ability to accurately identify and eliminate non-relevant documents from the review universe. With that in mind, system trainers need to remember not to dismiss relevant content merely because it does not meet responsiveness requirements. I know that has been said thousands of times, but it bears saying again and again.

Responsiveness in your case may hinge on whether a particular widget manufactured in Seattle was red. If a system trainer dismisses an e-mail as non-relevant because it discusses blue widgets made in Seattle, they are confusing the system and hindering the process. To truly get the most out of the process, you must code the blue Seattle widget discussion as relevant, and likely also include discussions of widgets of other colors manufactured in other cities. Discussion of the manufacture of widgets is relevant content; whether the widgets were made in Seattle and whether they were red will determine whether a given document is responsive.
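The widget example can be reduced to a toy sketch that keeps the two standards separate: the training label captures the broad relevant concept (widget manufacturing), while the narrower responsiveness criteria (red widgets made in Seattle) are applied afterward, never baked into the training labels. The documents and keyword rules below are entirely hypothetical stand-ins for real coding decisions.

```python
# Toy illustration of training on relevance, then filtering for responsiveness.
# Documents and keyword rules are hypothetical.

docs = [
    "Report on red widgets manufactured in Seattle",
    "Blue widgets made in Seattle shipped late",
    "Green widget production line opened in Portland",
    "Quarterly cafeteria menu and parking updates",
]

def relevant_for_training(doc):
    """Training label: any widget-manufacturing discussion is relevant."""
    return "widget" in doc.lower()

def responsive(doc):
    """Responsiveness call, made after the system surfaces relevant docs."""
    d = doc.lower()
    return "widget" in d and "red" in d and "seattle" in d

relevant = [d for d in docs if relevant_for_training(d)]  # trains the system
produced = [d for d in relevant if responsive(d)]         # human judgment call
print(len(relevant), len(produced))  # → 3 1
```

Collapsing the two functions into one, so that only the red Seattle e-mail trains as relevant, is exactly the mistake described above: the blue-widget and Portland documents would then teach the system that widget discussions are noise.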

The Path to E-Mail Production IV, Revisited

Ball in your Court

This is the ninth in a series revisiting Ball in Your Court columns and posts from the primordial past of e-discovery, updating and critiquing in places and hopefully restarting a few conversations.  As always, your comments are gratefully solicited.

The Path to Production: Are We There Yet?

(Part IV of IV)

[Originally published in Law Technology News, January 2006]

The e-mail’s assembled and accessible.  You could begin review immediately, but unless your client has money to burn, there’s more to do before diving in: de-duplication. When Marge e-mails Homer, Bart and Lisa, Homer’s “Reply to All” goes in both Homer’s Sent Items and Inbox folders, and in Marge’s, Bart’s and Lisa’s Inboxes.  Reviewing Homer’s response five times is wasteful and sets the stage for conflicting relevance and privilege decisions.

Duplication problems compound when e-mail is restored from backup tape.  Each tape is a snapshot of e-mail at a moment…

View original post 776 more words
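The de-duplication step the excerpt introduces is commonly implemented by hashing normalized message fields and keeping one copy per hash, regardless of whose mailbox a copy came from. The field choices, normalization rules, and messages below are an illustrative sketch, not any particular tool’s method.

```python
import hashlib

def message_hash(msg):
    """Hash normalized sender, recipients, subject, and body."""
    canonical = "\n".join([
        msg["from"].strip().lower(),
        ",".join(sorted(r.strip().lower() for r in msg["to"])),
        msg["subject"].strip().lower(),
        msg["body"].strip(),
    ])
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

def deduplicate(messages):
    """Keep the first copy of each distinct message across all custodians."""
    seen, unique = set(), []
    for m in messages:
        h = message_hash(m)
        if h not in seen:
            seen.add(h)
            unique.append(m)
    return unique

reply = {"from": "homer@example.com",
         "to": ["marge@example.com", "bart@example.com", "lisa@example.com"],
         "subject": "Re: Dinner", "body": "On my way."}
# Five copies: Homer's Sent Items and Inbox, plus the three recipients' Inboxes.
copies = [dict(reply, custodian=c) for c in
          ["homer-sent", "homer-inbox", "marge", "bart", "lisa"]]
print(len(deduplicate(copies)))  # → 1
```

Reviewing one copy instead of five avoids both the wasted effort and the conflicting relevance and privilege calls the column warns about; which fields to normalize (and how) is the part that varies by tool and should be documented in the review protocol.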