Detecting Hidden Data In Office Documents — Soapbox (Part 1 of 5)

This is the first part of a five-part entry on practical tips for identifying hidden or difficult-to-detect data in Microsoft Office documents.  Part one is where I spend a bit of time on my soapbox talking about why this issue is important to me.  Parts two through five will discuss some quick practical tips for identifying and reviewing difficult data that I have picked up over the years while doing and managing the work.  I am certainly not the first to write a post on this topic (see "Accessing hidden metadata in Office documents for eDiscovery" and "There’s Hidden, and Very Hidden Data in Excel"), and I certainly don’t claim to bring any groundbreaking information to the game.  I just felt like writing about something, and this topic seemed like a good place to start.  I hope you enjoy these posts and find them to be of some use.

When I managed document review projects, one thing kept me awake at night more than any other.  It wasn’t the sometimes impossible client- or partner-driven deadlines.  It wasn’t the difficulty of managing groups of attorneys, most of whom no longer had the will or morale to improve their eDiscovery skills or knowledge after years of soulless and thankless work.  That particular problem was certainly the second most concerning issue I encountered, and it led to my becoming very passionate about how people should be managed, developed, and mentored.  But that is a topic for another time and another post in the near future.

The concern that kept me awake more than any other was the fear that someone on my team would miss something, a critical fact for the case or some indicator of privilege, because they did not know how to spot the signs of hidden content, or how to root it out when they did.  As anyone who has been in the industry well knows, review is a delicate balance of speed and precision, and you sometimes need to sacrifice one for the other depending upon the circumstances of a particular assignment.  Speed is always appreciated, and reviewers not keeping pace will be dropped.  A blatant lack of precision in favor of speed usually becomes obvious after a short time, and those reviewers will also be dropped.  These are the nuts and bolts of grading document reviewers, and it is easy to get lost in the all-too-unreliable metrics.  But the metrics are a somewhat superficial evaluation of talent; they more or less establish acceptable minimums rather than identify top-end performers.  The numbers make it easy to put people in boxes with regard to their performance, and that makes us feel like we are managing.  There is a need to get beyond those numbers to see who really “gets it” and to leverage that understanding to inform the decisions of the rest of the team.  It is no easy task, however, when most cases are working under very difficult timelines, and with a temporary project goal as the focus.

As I alluded to above, the world of document review is also a very transient one.  Project assignments are often short-term and come on short notice.  I worked in the industry for about nine years, and I worked with at least a few hundred attorneys on various projects during that time.  I know that might not sound like much, but I was on only a couple of temporary assignments before having the good fortune to land a permanent position with a great firm for seven of those nine years.  Even with a permanent position, there were always many attorneys coming and going.  The constant exposure to all of these different project groups really opened my eyes to the unbelievably wide range of computer literacy and Microsoft Office skills among contract attorneys in the industry.  While the quick and necessary answer to a lack of computer literacy was often to teach the individual enough to get them functioning and ready for review, I became increasingly obsessed with a contradiction: we expect these same individuals to effectively review documents that all too often require some specialized knowledge just to see what the document is really saying.  Whether idealistic or not, I became fixated during the last few years of my term in that role upon the idea that these attorneys must be taught the skills necessary to effectively review documents.

Now, I am not saying that every document has some kind of secret hidden code embedded where only a select few know how to find it.  What I am saying is that on any given day, on any given project, a reviewer is likely to encounter Word, PowerPoint, or Excel documents where some significant amount of data is hidden or obscured from view.  The intent in “hiding” this data is not always something sinister.  In fact, it is usually quite the opposite.  Data within a spreadsheet is typically hidden from view with the very simple intention of presenting a cleaner and more attractive document that is not full of distractions.  I am also not here to entertain the argument that data processing tools mean reviewers don’t need to worry much about this issue since almost all content can be “un-hidden” during processing.  As someone who has been there and done that, I can say with certainty that this assertion is false.  Some of the work can be done during processing, but a reviewer still must be able to at least spot signs of potentially concerning hidden data.

It was with these thoughts rattling around my head that I set out a couple of years ago to develop a presentation to teach our team members about the dangers of hidden data in Office documents, and how to mitigate the associated risks through effective identification and review of that data.  I considered that initial presentation to be a success, even though it definitely bored some in the audience to sleep over the course of those 50 minutes.  However, that presentation, in both format and substance, was not really something that could be shared publicly for a variety of reasons.  So, I figured it is about time that I take what I consider to be the most important aspects of the presentation and distill them down into something fit for public consumption.  That is my goal for parts two through five of this entry.  If I have not bored you to sleep yet with my rambling, please do stay tuned and I will try to get those parts in order.  This really is a critical eDiscovery topic in my mind, and I think issues like this one are at the heart of many of the opinions authored in recent years expressing concern over attorney eDiscovery competency.  Some small part of me wants to think that this should be a non-issue these days, but just like the inability to effectively image Excel documents, I am afraid it lives on.  Thanks again for reading thus far.

Stay tuned…
