2 thoughts on “Accessing hidden metadata in Office documents for eDiscovery”

  1. Fine post! If we look at the image you shared of the post that has comments and changes revealed, we see that there are collaborative/editorial comments that are communications between you and a reviewer (or perhaps you making notes to yourself). In either instance, they are substantive with respect to the document and, as you note, cannot be altered once a lit hold attaches because they are part-and-parcel of the evidence, not unlike handwritten notes in the margins of a paper record. Though I also tend to refer to this content as “application metadata,” it’s really user-supplied content so, technically, it’s not metadata at all. But what we call it isn’t as important as making sure it sees the light of day in e-discovery.

    So, what do most producing parties do with this information in discovery? For the most part, they image the document to TIFF without exposing the comments and changes on the page. The result is that the comments and such are never seen or reviewed for responsiveness or privilege. They are simply discarded in the imaging process, and that “redaction” of substantive content happens without any claim of privilege or confidentiality. It isn’t logged or disclosed. The requesting party never gets this content, and the producing party quietly pretends it wasn’t there.

    “But wait,” our hypothetical opponent says, “Our practice is to create a second TIFF image with the content revealed, because we are better at this than those other sloppy folks.” Splendid! So how, then, do they restore searchability to the information in the content-revealed version; you know, the searchability they stripped away when they converted the document to TIFF? Check the text load file they supply to restore searchability, and you’ll likely discover that it’s useless junk. All those little dot leaders extending from the text to the comment or change? I bet you’ll find that they’ve inserted them between the letters in the text, so now a search for “hidden personal metadata” had better be framed as a search for “h..i.d.d..e.n….p.e..r.so…n.al….m.e..t.a.d..a..t.a,” if you have any hope of finding what you’re seeking. I kid you not.

    So, somehow you discover this (because every requesting party checks the text load file against the image text, right?), and you get the court to send the other side back to the vendor. This time they claim they’ve nailed it, because they’ve extracted the comments and changes directly from the native document and put them in the text load file. Now it’s searchable, sure, but, oh joy, the changes and comments are all bunched together with no way to determine where they’d been placed in the document. They’ve dropped to the bottom of the page like needles beneath a dry Christmas tree, and if you want to know where they go, you have to run the search and look to the image to figure out the context (though a search within a multipage document may not bring you to the right page).
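
    For contrast, here is a minimal sketch showing that the placement need not be lost when comments are pulled from the native file. It assumes a Word .docx (an Office Open XML zip package) in which comments live in word/comments.xml and are anchored in word/document.xml by w:commentRangeStart and w:commentRangeEnd markers. The function names are hypothetical and it uses only Python's standard library; it handles comments only, not tracked changes (w:ins/w:del), and it ignores text inside deletions (w:delText).

```python
import zipfile
import xml.etree.ElementTree as ET

# Sketch only: hypothetical helpers, assuming a standard .docx package layout.
# WordprocessingML namespace used by the .docx XML parts.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def comment_texts(docx: zipfile.ZipFile) -> dict[str, str]:
    """Map each comment id to its text, read from word/comments.xml."""
    try:
        root = ET.fromstring(docx.read("word/comments.xml"))
    except KeyError:
        return {}  # ZipFile.read raises KeyError: the document has no comments part
    texts = {}
    for comment in root.iter(W + "comment"):
        cid = comment.get(W + "id")
        texts[cid] = "".join(t.text or "" for t in comment.iter(W + "t"))
    return texts


def comments_in_context(source) -> list[tuple[str, str]]:
    """Return (anchored_text, comment_text) pairs in document order.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as docx:
        texts = comment_texts(docx)
        body = ET.fromstring(docx.read("word/document.xml"))
    open_ranges: dict[str, list[str]] = {}  # comment id -> text seen inside its range
    results = []
    for el in body.iter():  # depth-first, i.e. document order
        if el.tag == W + "commentRangeStart":
            open_ranges[el.get(W + "id")] = []
        elif el.tag == W + "commentRangeEnd":
            cid = el.get(W + "id")
            anchored = "".join(open_ranges.pop(cid, []))
            results.append((anchored, texts.get(cid, "")))
        elif el.tag == W + "t":
            # Visible text belongs to every comment range currently open.
            for fragments in open_ranges.values():
                fragments.append(el.text or "")
    return results
```

    Each comment comes back paired with the words it annotates, so a load file built this way could carry the context instead of dumping the comments at the bottom of the page.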

    That’s State of the Art in the TIFF production world, and one of e-discovery’s dirty little secrets. And people wonder why I insist on production of MS Office documents in their native forms!
