TAR Course Updated to Add Video on Step Seven and the All Important “Stop Decision”

e-Discovery Team ®

We added to the TAR Course again this weekend with a video introducing Class Fourteen on Step Seven, ZEN Quality Assurance Tests. ZEN stands for Zero Error Numerics, with the double entendre intentional, but this video does not go into the math, concentration, or reviewer focus. Instead, Ralph's video introduces the main purpose of Step Seven from a workflow perspective: to test and validate the decision to stop the Training Cycle, Steps Four, Five, and Six.

The Training Cycle shown in the diagram continues until the expert in charge of the training decides to stop. This is the decision to complete the first-pass document review. The stop decision is both a legal and a statistical one, requiring a holistic approach that includes metrics, sampling, and overall project assessment. You decide to stop the review after weighing a multitude of considerations, including whether the software has attained a highly stratified distribution of documents. See


FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better

TAR eDiscovery orders and opinions have made some pretty big splashes in the last five years, and the recent FCA US LLC v. Cummings, Inc., order, despite being brief, was no exception. The court took up the question of whether keyword search culling of a data set prior to the application of Technology Assisted Review (i.e., TAR or Predictive Coding) is the preferred method. The answer, in the court’s opinion, was simple but powerful: it is not.

Some have described this decision as a “nightmare.” Others have less vividly decried it as likely to impede much needed progress in the use of advanced analytics. While I understand the causes for concern, I find it hard to disagree with the court’s decision based on my understanding of the relevant judicial precedent and the gravity of the flaws associated with keyword search culling.

Personally, I don't believe that TAR judicial history to date, apart from the circumstance- and proportionality-based rulings in In re Biomet (Apr. 18, 2013) and Bridgestone (July 22, 2014), supports another outcome….

See the full post at the Altep Blog: FCA US LLC v. Cummings – It’s Not Perfect, but It Does Need to be Better


Substantial Reduction in Review Effort Required to Demonstrate Adequate Recall

Clustify Blog - eDiscovery, Document Clustering, Technology-Assisted Review (Predictive Coding), Information Retrieval, and Software Development

Measuring the recall achieved to within +/- 5% to demonstrate that a production is defensible can require reviewing a substantial number of random documents.  For a case of modest size, the amount of review required to measure recall can be larger than the amount of review required to actually find the responsive documents with predictive coding.  This article describes a new method requiring much less document review to demonstrate that adequate recall has been achieved.  This is a brief overview of a more detailed paper I’ll be presenting at the DESI VII Workshop on June 12th.

The proportion of a population having some property can be estimated to within +/- 5% by measuring the proportion on a random sample of 400 documents (you’ll also see the number 385 being used, but using 400 will make it easier to follow the examples).  To measure recall we need to know what proportion…
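The arithmetic behind those sample sizes is the standard normal-approximation confidence interval for a sample proportion. A quick sketch (plain Python; not from the paper itself) shows where 385 comes from and why 400 documents give just under a ±5% margin:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a sample proportion.
    p = 0.5 is the worst case, giving the widest possible interval."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size(moe=0.05, p=0.5, z=1.96):
    """Smallest n whose worst-case margin of error is <= moe."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

print(sample_size(0.05))               # 385 -- the other number often quoted
print(round(margin_of_error(400), 4))  # 0.049, i.e. just under +/-5%
```

Using 400 instead of the minimum 385 simply rounds the sample up to a friendlier number while tightening the interval slightly.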


What is Continuous Active Learning (CAL), Really? – Part One


Ever since the March 2, 2015 Rio Tinto opinion and order, there has been a lot of buzz in eDiscovery around the phrase “Continuous Active Learning” (CAL). Judge Peck briefly mentioned CAL while summarizing the available case law around seed-set sharing and transparency. For the sake of clarity, the term seed-set in this post refers to the initial group of training documents used to kick off a Technology Assisted Review (TAR) project. We refer to the review sets that follow as training sets. The point of Judge Peck’s mention of CAL, as I understood it, was to alert readers to the possibility that seed-set selection and disclosure disputes may become much less necessary as TAR tools and protocols continue to evolve.

Judge Peck pointed to recent research and a law review article by Maura Grossman and Gordon Cormack to support that notion. Those works made two important points about seed-set documents. First, they asserted that the selection and coding of seed-set documents is less likely to define the ultimate success of TAR projects employing a true CAL protocol. The general theory there is that the influence of misclassified seed documents is fleeting, since the classifier used to identify successive training set documents is recreated after each round, rather than simply revised or refitted. Second, they argued that seed-set transparency is not the guaranteed path to TAR project completeness, since neither the producing nor receiving party has a true understanding of the breadth of the concepts / information types in a collection.
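The "recreated after each round" point can be made concrete with a minimal toy loop (pure Python; the term-counting "classifier" and all helper names here are illustrative stand-ins for whatever learner a real TAR tool uses, and `oracle` plays the human reviewer):

```python
from collections import Counter

def train_classifier(labeled):
    """Toy stand-in for a real learner: scores a document by whether each
    of its terms is more common in relevant or non-relevant training docs."""
    pos, neg = Counter(), Counter()
    for terms, relevant in labeled:
        (pos if relevant else neg).update(terms)
    return lambda terms: sum(1 if pos[t] > neg[t] else -1 for t in terms)

def cal_review(corpus, oracle, seed, batch_size=2, rounds=3):
    """Continuous Active Learning loop: after every reviewed batch the
    classifier is rebuilt from scratch on ALL labels to date, so the
    influence of a misclassified seed document fades as training sets
    accumulate, rather than being locked in by a one-time fit."""
    labeled = [(doc, oracle(doc)) for doc in seed]
    unreviewed = [doc for doc in corpus if doc not in seed]
    for _ in range(rounds):
        score = train_classifier(labeled)         # recreated, not refitted
        unreviewed.sort(key=score, reverse=True)  # likely-relevant first
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        labeled += [(doc, oracle(doc)) for doc in batch]
    return labeled
```

Documents here are just tuples of terms; the design point is only that `train_classifier` is called fresh inside the loop, which is what distinguishes this protocol from a simple one-shot seed-set fit.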

The fact that Judge Peck cited the work of Grossman and Cormack as the basis for his statement is important, because the definition of CAL asserted in those publications is different from what the makers of many TAR tools would offer – even those that claim to be CAL capable.

Read more at the Altep blog: What is Continuous Active Learning (CAL), Really? – Part One

Creative Analytics – Part 3: The Toolbox


By Sara Skeens and Josh Tolles

Welcome to part three of our Creative Analytics series. Part one provided a suggested roadmap for getting more comfortable with analytics tools and exploring more creative uses. In part two, we discussed some of the challenges common to the presentation phase of the EDRM, which require us to look for creative solutions. This brings us to part three: the solutions. In this post we will provide more detail on a few key tools and techniques that we deploy to overcome those common challenges. This final installment is intended to serve as the closing primer for our co-hosted webinar with kCura, taking place tomorrow, Wednesday, July 13th – Leveraging Analytics for Depo & Trial Prep. Please tune in, as we will put things into a more visual, workflow-based perspective.

Narrowing The Field – Making The Most of Your Time 

Deposition and trial preparation typically begins as production review ends (in some cases the two processes overlap, adding further complications). It is here that you are usually faced with making sense of two distinct data sets: the documents you produced and the productions you received. Traditional fact-finding efforts involve simply leveraging reviewer coding and supplemental keyword searches. These techniques are a great place to start, but they can be highly time-inefficient and almost always suffer in terms of completeness.

One helpful early approach is to limit your fact-finding data set to unique content as much as possible. Analyzing duplicate content is a painful drain on resources. Whether a document is a false keyword hit or a true hot document, you generally only need one good look within its four corners to assess its value. This can be a bit counterintuitive, especially if you have been working with family coding guidelines during your review efforts. However, it is best to start small when time is of the essence. Identify key individual documents as quickly as possible, and then build context around those items later.
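A first cut at that unique-content set can be as simple as global (cross-custodian, family-ignoring) deduplication on normalized text. Here is a minimal sketch using hypothetical (doc_id, text) pairs; real tools layer fuzzier near-duplicate detection on top of exact hashing:

```python
import hashlib
import re

def normalize(text):
    """Collapse whitespace and case so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unique_documents(docs):
    """Global deduplication on normalized text: keep one exemplar per
    distinct content hash, regardless of family or custodian."""
    seen, keep = set(), []
    for doc_id, text in docs:
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(doc_id)
    return keep
```

The exemplar list becomes the fact-finding set; family members and other duplicates can be pulled back in later to build context around whatever the review surfaces.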

Read more at the Altep blog: Creative Analytics – Part 3: The Toolbox

Creative #Analytics: Solving Challenges in the Presentation Phase


This post is Part 2 of a series – you can also watch a video of the related webinar, or read Part 1, on the kCura Blog.

By Joshua Tolles and Sara Skeens

In our last post, we discussed the value of looking at analytics in e-Discovery with a creative mindset, and a few steps that you can take to expand your problem solving horizons. As we noted there, analytics is most commonly thought of as a tool to be applied during the review phase of the EDRM to control data sizes; however, we’d like to change that. At Altep, we frequently use analytics to solve many more problems than just those found in the production review arena. With a firm grasp on the technology, plenty of curiosity, and a healthy passion for “building a better mouse trap,” we have found quite a few areas where analytics can help turn the eDiscovery rat race into a more methodical and scalable process.

The presentation phase of the EDRM is one such area. While the EDRM roadmap tells us that analysis occurs in conjunction with review and production, much of the real analysis work is done post-production, in the time leading up to presentation. Cases are often made or broken at deposition, and most certainly at trial. Thorough preparation and a crystal-clear understanding of the facts and available evidence are essential to success. However, you may encounter any of several potential pitfalls as you meet your discovery deadline and begin preparations.

Read more at the Altep blog: Creative Analytics – Part 2: The Presentation Phase

3 Steps to More Creative e-Discovery Analytics

By: Sara Skeens and Joshua Tolles


Flexibility and adaptability are two of the more important traits of any highly successful legal professional. Those traits are rarely more in demand than right now, when growing data volumes mean we continue to see and solve new and different discovery challenges—many of which would have seemed impossible or too difficult to resolve just a few years ago.

As the average case size continues to grow, and the definition of “unduly burdensome” continues to develop, a premium has been placed on discovery strategies that are both defensible and cost-effective. Both new and existing technologies have become the key to addressing this evolution.

Step 1: Build a strong foundation.

A strong understanding of technology provides the foundation for innovation. A deep understanding of what’s available, when combined with flexibility, creativity, and an understanding of how the tools can work together, opens up a world of problem-solving possibilities. Armed with technical know-how and a drive to think outside the box, you are not limited by when and how these tools are most commonly used. Playing off the strengths and weaknesses of each tool and its ability to solve a problem can fill in gaps and increase efficiency.

Analytics as a technology was primarily introduced in e-discovery to solve the challenge of growing data sizes and associated, often prohibitively high, review costs. Applications such as clustering, categorization, technology-assisted review (TAR), email threading, and near-duplicate detection are now implemented on a daily basis to do just that. Growing acceptance means these tools often come with templated workflows that make getting started much easier. Starting small by putting these workflows into practice can help you build enough expertise to identify new opportunities for tackling your most complex projects with customized, combined, and creative workflows ideated with your unique goals in mind.

Step 2: Explore each analytics feature on its own merit.

Did you know the same technology supports both categorization and technology-assisted review? In both use cases, the technology is trained by users’ decisions to organize documents based on content. There are, however, differences in the training and quality control methodologies that allow each of these options to be more applicable in certain situations. Categorization might be a useful exercise for QC purposes when it’s performed in conjunction with a privilege review, for instance, while a TAR project can help accelerate the earliest stages of reviewing your data.
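That shared engine can be pictured as one trained scorer consumed two ways: thresholded into hard buckets for categorization, or used to rank the queue for TAR. A toy sketch (the term-counting scorer is a stand-in for the real model, and the function names are illustrative):

```python
from collections import Counter

def train_scorer(training):
    """Toy content-based scorer trained on (terms, label) examples --
    a stand-in for the single engine behind both features."""
    pos, neg = Counter(), Counter()
    for terms, label in training:
        (pos if label else neg).update(terms)
    return lambda terms: sum(1 if pos[t] > neg[t] else -1
                             for t in terms) / max(len(terms), 1)

def categorize(score, docs, threshold=0.0):
    """Categorization use case: hard bucket assignment,
    e.g. a QC pass alongside a privilege review."""
    return {doc: score(doc) > threshold for doc in docs}

def prioritized_review_order(score, docs):
    """TAR use case: rank unreviewed documents so reviewers
    see the likely-relevant ones first."""
    return sorted(docs, key=score, reverse=True)
```

Same scores, different consumption: the categorization path cares about which side of the threshold a document falls on, while the TAR path cares only about relative order.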

These differences in approach exemplify the ability to use the same underlying technology to solve various challenges. As you begin implementing analytics in your projects, get to know each feature and how it can benefit different use cases. What makes email threading valuable? What about clustering?

Step 3: Explore analytics features in different combinations and stages.

The beauty of analytics is that it is more a method than a tool. “The method of logical analysis” is how Merriam-Webster defines analytics, and its very nature makes it ideal for flexing and adapting to new use cases.

Once you’ve gotten comfortable with each analytics tool and the benefits it can present, you’ll have the confidence to start combining features in the same workflows to see how one tool’s results can improve another’s. Maybe just one feature will do the job for the small case that just came through your door. But maybe a combination of features is required for the next big case. For example, how does email threading cut down noise in a TAR project? How might foreign language identification make your team’s approach to clustering results of international data more efficient? The possibilities are endless.

With a little education and expert guidance, you can apply analytics tools creatively, without limiting your team to a single, go-to approach that may not be up to snuff for solving your most complex e-discovery challenges.

You can even apply analytics beyond the scope of initial review. We at Altep will be releasing a series of posts on the Altep blog discussing how to use analytics in e-discovery in new and exciting ways, leading up to a thought-provoking webinar discussing its use in the presentation phase of the EDRM. There has been little focus on leveraging analytics during this phase, though it can benefit teams faced with organizing sets of produced data while courtroom deadlines loom.

By creatively leveraging analytics during deposition and trial, you can considerably cut time and costs during this phase, as well as locate key information more quickly to increase your chances for success. The techniques at your disposal are certainly a departure from traditional strategies, but they are tested and proven solutions that work.

Sara Skeens is a consultant for advanced review and analytics with Altep’s litigation consulting group. She has over 10 years of experience providing solutions and workflow guidance to case teams and enterprise clients in the areas of preservation, review, analysis, production, and presentation. She is a Relativity Certified Expert and has held positions in law firms, government, and providers working in both criminal and civil litigation, as well as investigations.

Joshua Tolles is a senior consultant for advanced review and analytics with Altep’s litigation consulting group. In this role, he provides process, solutions, and workflow guidance to case teams and enterprise clients in the areas of preservation, collections, processing, review, analysis, and production. Joshua is a licensed attorney in Washington State and the District of Columbia, and a Relativity Certified Expert.

Also available via the kCura blog: 3 Steps to More Creative e-Discovery Analytics

My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding


Dynamo Holdings Limited Partnership v. Commissioner

In an order dated July 13, 2016, the U.S. Tax Court once again strongly supported the use of Predictive Coding. The case had already featured some notable opinions and orders on the topic. This recent order is a fun read for analytics nerds and newcomers alike, as the Court did a great job of laying out the associated facts and addressing the typical arguments for and against use of the technology. Here are a few items that caught my attention as I read it.

Opposing Party as SME

As I noted in a recent blog post, the use of an SME to train a Predictive Coding algorithm can be an expensive proposition. The responding party here took an interesting approach by allowing the requesting party to serve as their SME. There are certainly cases where parties meet to discuss seed documents, but there are also a fair number in which the parties do not feel the need to disclose use of the technology, never mind the documents used to train it.

I can understand why parties are uncomfortable submitting to their opponent’s discretion. However, Dynamo’s aggressive training approach clearly helped their cause. In fact, the Commissioner’s training of the algorithm was the only process-related factor the Court mentioned when concluding that “[t]here is no question” Dynamo met its discovery obligations. It is an interesting strategy to consider if Predictive Coding is accepted by the court and parties, as it may save you some training costs and bolster the defensibility of your process.

Read more at the Altep blog: My Top Five Takeaways from The U.S. Tax Court’s Emphatic Affirmation of Predictive Coding

To SME or Not to SME (in TAR)… That is the Question


This article assumes that Technology Assisted Review is being deployed in a production review setting where the user seeks to identify potentially relevant documents from among a larger corpus, and to subject those documents to full manual review. The use of TAR as an investigative or fact finding tool is a more financially flexible proposition, and the efficiency of that approach should be evaluated via separate standards.

There has been some debate in the past few years about the proper role of the Subject Matter Expert (SME) in technology assisted review (TAR) – a discussion which has understandably resulted in plenty of disagreement. There was a time when most blog posts and white papers swore that SME training was the only path to success, but that position looks to have softened some.

I have always been a bit skeptical of the necessity of SME training, especially when that SME takes the form of a law firm partner or senior associate. While a more compelling argument can be made for client insiders as SME trainers, I am not convinced that SME training is necessary in either case. There are two main factors that drive my opinion here – consistency and cost. Each of these factors must be carefully considered before embarking on a TAR project.

Proper planning is required to achieve consistency. However, consistency that comes at a price greater than the anticipated review savings is practically meaningless. The margin of cost efficiency between linear manual review and technology assisted review is thinner than you might think if you don't carefully consider and control training costs.

Read more at the Altep blog: To SME or Not to SME (in TAR)… That is the Question

Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review


If you are reading this blog, you have probably heard the story many times by now: document review is the most expensive part of eDiscovery. Like many, I find myself asking the same question again and again. How can we do it better? One obvious answer is by defensibly reviewing less. The not-so-obvious part of that answer is the available methods for doing so.

The growing absurdity of document review volumes and associated costs has been an obsession of mine for a while now. I have spent a lot of time in the past few years studying methods and technologies capable of better controlling those volumes and costs. Global de-duplication, e-mail threading, textual near-duplicate identification, and more complex varieties of technology assisted review (TAR), such as Equivio Relevance and Relativity Assisted Review, have all proved useful in my quest to help clients review and spend less.

Read more at the Altep blog: Redefining Responsiveness Evaluation: Making the Case for True Unique Document Review