Detecting Hidden Data In Office Documents — Soapbox (Part 1 of 5)

This is the first part of a five-part entry offering practical tips for identifying hidden or difficult-to-detect data in Microsoft Office documents.  Part one is where I spend a bit of time on my soapbox talking about why this issue is important to me.  Parts two through five will discuss some quick, practical tips for identifying and reviewing difficult data that I have picked up over the years while doing and managing the work.  I am certainly not the first to write a post on this topic (see Accessing hidden metadata in Office documents for eDiscovery and There’s Hidden, and Very Hidden Data in Excel), and I certainly don’t claim to bring any groundbreaking information to the game.  I just felt like writing about something, and this topic seemed like a good place to start.  I hope you enjoy these posts and find them to be of some use.

When I managed document review projects, one thing kept me awake at night more than any other.  It wasn’t the sometimes impossible client- or partner-driven deadlines.  It wasn’t the difficulty of managing groups of attorneys, most of whom no longer had the will or morale to be concerned with improving their eDiscovery skills or knowledge after years of soulless and thankless work.  That particular problem was certainly the second most concerning issue I encountered, and it led to me becoming very passionate about how people should be managed, developed, and mentored.  But that is a topic for another time and another post in the near future.

The concern that kept me awake more than any other was the fear that someone on my team would miss something, a critical fact for the case or some indicator of privilege, because they did not know how to spot the signs of hidden content or how to root it out when they did.  As anyone who has been in the industry well knows, the work is a delicate balance of speed and precision, and you sometimes need to sacrifice one for the sake of the other depending upon the circumstances of a particular assignment.  Speed is always appreciated, and reviewers who do not keep pace will be dropped.  A blatant lack of precision in favor of speed usually becomes obvious after a short time, and those reviewers will also be dropped.  These are the nuts and bolts of grading document reviewers, and it is easy to get lost in the all-too-unreliable metrics.  But the metrics are a somewhat superficial evaluation of talent; they more or less establish acceptable minimums rather than identify top-end performers.  The numbers make it easy to put people in boxes with regard to their performance, and that makes us feel like we are managing.  There is a need to get beyond those numbers to see who really “gets it” and to leverage that understanding to inform the decisions of the rest of the team.  That is no easy task, however, when most cases are run on very difficult timelines and with a temporary project goal as the focus.

As I alluded to above, the world of document review is also a very transient one.  Project assignments are often short-term and come on short notice.  I worked in the industry for about nine years, and I worked with at least a few hundred attorneys on various projects during that time.  I know that might not sound like much, but I was on only a couple of temporary assignments before having the good fortune to land a permanent position with a great firm for seven of those nine years.  Even with a permanent position, there were always many attorneys coming and going.  The constant exposure to all of these different project groups really opened my eyes to the unbelievably wide range of computer literacy and Microsoft Office skills that the average contract attorney in the industry may have.  While the quick and necessary answer to a lack of computer literacy was often to teach the individual enough to get them functioning and ready for review, I became increasingly obsessed with the contradiction of expecting these same individuals to effectively review documents that all too often require some specialized knowledge just to see what the document is really saying.  Whether idealistic or not, I became fixated during the last few years of my term in that role upon the idea that these attorneys must be taught the skills necessary to effectively review documents.

Now, I am not saying that every document has some kind of secret hidden code embedded where only a select few know how to find it.  What I am saying is that on any given day, on any given project, a reviewer is likely to encounter Word, PowerPoint, or Excel documents where some significant amount of data is hidden or obscured from view.  The intent in “hiding” this data is not always something sinister.  In fact, it is usually quite the opposite.  Data within a spreadsheet is typically hidden from view with the very simple intention of presenting a cleaner and more attractive document that is not full of distractions.  I am also not here to entertain the argument that data processing tools relieve reviewers of this worry because almost all content can be “un-hidden” during processing.  As someone who has been there and done that, I can say with certainty that this assertion is false.  Some of the work can be done during processing, but a reviewer still must be able to at least spot signs of potentially concerning hidden data.
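
For readers who like to see a concrete case, here is a minimal sketch of one such sign: worksheets marked hidden or "very hidden" inside an .xlsx workbook.  It is only an illustration, not a review tool, and it rests on some assumptions: the file is a standard (non-strict) Office Open XML workbook, the sheet list lives in xl/workbook.xml, and the function names are hypothetical.  It says nothing about hidden rows, columns, comments, or the equivalent problems in Word and PowerPoint.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by standard (transitional) .xlsx workbooks.
# Assumption: "strict" OOXML files use a different namespace and are not handled.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def hidden_sheets_from_xml(workbook_xml):
    """Return (sheet name, state) pairs for sheets that are not visible.

    Each <sheet> element may carry a state attribute of "hidden" or
    "veryHidden"; when the attribute is absent the sheet is visible.
    """
    root = ET.fromstring(workbook_xml)
    flagged = []
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        state = sheet.get("state", "visible")
        if state != "visible":
            flagged.append((sheet.get("name"), state))
    return flagged


def hidden_sheets(path):
    """Open an .xlsx file (a zip archive) and report its non-visible sheets.

    Assumption: the workbook part is stored at the usual xl/workbook.xml path.
    """
    with zipfile.ZipFile(path) as archive:
        return hidden_sheets_from_xml(archive.read("xl/workbook.xml"))
```

Calling hidden_sheets on a workbook path returns a list such as a sheet name paired with "hidden" or "veryHidden"; an empty list only means no sheet-level hiding was found, not that the document is free of hidden content.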

It was with these thoughts rattling around my head that I set out a couple of years ago to develop a presentation to teach our team members about the dangers of hidden data in Office documents, and how to mitigate the associated risks through effective identification and review of that data.  I considered that initial presentation to be a success, even though it definitely bored some in the audience to sleep over the course of those 50 minutes.  However, that presentation, in both format and substance, was not really something that could be shared publicly for a variety of reasons.  So, I figured it was about time that I took what I consider to be the most important aspects of the presentation and distilled them down into something fit for public consumption.  That is my goal for parts two through five of this entry.  If I have not bored you to sleep yet with my rambling, please do stay tuned and I will try to get that entry in order.  This really is a critical eDiscovery topic in my mind, and I think it is issues like this one that are at the heart of many of the opinions authored in recent years expressing concern over attorney eDiscovery competency.  Some small part of me wants to think that this should be a non-issue these days, but just like the inability to effectively image Excel documents, I am afraid it lives on.  Thanks again for reading thus far.

Stay tuned…

Visualizing Data in a Predictive Coding Project

e-Discovery Team ®

This blog will share a new way to visualize data in a predictive coding project. I only include a brief description this week. Next week I will add a full description of this project. Advanced students should be able to predict the full text from the images alone. Study the text and try to figure out the details of what is going on.

Soon all good predictive coding software will include visualizations like this to help searchers to understand the data. The images can be automatically created by computer to accurately visualize exactly how the data is being analyzed and ranked. Experienced searchers can use this kind of visual information to better understand what they should do next to efficiently meet their search and review goals.

For a game try to figure out how the high and low number of relevant documents that you must find in this review project to…


Use Excel to Count the Number of Emails in Each Email Chain

Excel Esquire

Courts and litigants have long struggled with the question of how to describe email chains on a privilege log.  Should you log only the most recent email, or log every email in the chain–or something in between?  New York has recently adopted a potentially burdensome rule on this topic–one that cries out for an Excel solution.

Effective September 2, 2014, Commercial Division Rule 11-b imposes new obligations on litigants in New York Supreme who create document-by-document privilege logs, as opposed to the now-preferred “categorical privilege logs.” See here to read the rule.  Among other things, entries for email chains should now indicate “the number of e-mails within the dialogue.”  Rule 11-b (b)(3)(iii).  That means you can log only the most recent email in a given chain, but you need to also disclose how many emails are in the chain.

How, exactly, does one figure out the number of emails in every email…
