Tag Archives: preservation

“Act Reasonably” — Two Court-Issued Checklists Outlining Defensible, Targeted ESI Collection

Recently, two separate and prominent courts, the federal court for the Northern District of California and the Delaware Court of Chancery (the primary court of equity for Delaware-registered corporations), issued eDiscovery preservation guidelines. This is not unprecedented; other courts have issued similar written guidance, either as general guidance or as more enforceable local rules of court specifically addressing eDiscovery protocols. What I found particularly interesting, however, is that both courts provided fairly specific guidance on the scope of collection and preservation. The California court, which notes that its “guidelines are designed to establish best practices for evidence preservation in the digital age,” offers a checklist for Rule 26(f) “meet and confer” conferences with good detail on suggested ESI preservation protocols. The Delaware Court of Chancery also issued a detailed checklist, or “sample collection outline.” ESI preservation checklists are useful practice guides, and these are sanctioned by two separate influential courts.

This is important because the largest expense directly associated with eDiscovery is the cost of over-inclusive preservation and collection, which drives up both volume charges and attorney review costs. To the surprise of many, properly targeted preservation initiatives are permitted by the courts and can be enabled by adroit software that can quickly and effectively access and search these data sources throughout the enterprise.

The value of targeted preservation is recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the keywords used to identify responsive materials (citing the Manual for Complex Litigation (4th) § 40.25(2)). And in In re Genetically Modified Rice Litigation, 2007 WL 1655757 (E.D. Mo. June 5, 2007), the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.”

The checklist from the California Northern District and the guidelines issued by the Delaware court are consistent with these principles, as they call for the specification of date ranges, custodian names, and search terms for any ESI to be preserved. The Northern District checklist, for instance, provides for the identification of the specific custodians, and the job titles of custodians, whose ESI is to be preserved, as well as the specific search terms “that will be used to identify discoverable ESI and filter out ESI that is not subject to discovery.”

However, many lawyers shy away from a targeted collection strategy over misplaced defensibility concerns, opting instead for full disk imaging and other broad collection efforts that dramatically escalate litigation costs. The fear is that there may always be that one document that slips through. In my experience following eDiscovery case law over the past decade, however, the situations where litigants face exposure on the preservation front typically involve the absence of a defensible process. When courts sanction parties, it is usually because there is no reasonable legal hold procedure in place: the process is ad hoc, made up on the fly, and/or not effectively executed. I am personally unaware of a published decision where a company maintained a reasonable collection and preservation process involving targeted collection executed pursuant to standard operating procedures, yet was sanctioned because one or two relevant documents slipped through the cracks.

This is because the duty to preserve requires reasonable efforts, not infallible means, to collect potentially relevant information. As succinctly stated by the Delaware court: “Parties are not required to preserve every shred of information. Act reasonably.”

Another barrier standing in the way of defensible, targeted collection is that searching and performing early case assessment at the point of collection is not feasible in the decentralized global enterprise with traditional eDiscovery and information management tools. What is needed to address these challenges is a field-deployable search and eDiscovery solution that operates on demand in distributed and virtualized environments, at the global locations where the data resides. To meet that challenge, the eDiscovery and search solution must install rapidly and operate efficiently on site where the data is located, including in a virtual environment, without rigid hardware requirements or on-site physical access.

This groundbreaking capability is what X1 Rapid Discovery provides. Its unique ability to deploy and operate in the IaaS cloud also means the solution can install anywhere within the wide-area network, remotely and on demand. Importantly, the search index is created in close proximity to the data subject to collection. This enables even globally decentralized enterprises to perform targeted search and collection in an efficient, defensible, and highly cost-effective manner. Or, in the words of the Delaware court, the ability to act reasonably.


Filed under Case Law, Cloud Data, Enterprise eDiscovery, IaaS

No Legal Duty or Business Reason to Boil the Ocean for eDiscovery Preservation

As an addendum to my previous blog post on the unique eDiscovery and search burdens of the decentralized enterprise: one tactic I have seen some CIOs attempt in order to address this daunting challenge is constantly migrating disparate data from around the globe into a central location. Just this past week, I spoke to a CIO who was about to embark on a quixotic endeavor to centralize hundreds of terabytes of data so that it would be available for search and eDiscovery collection when needed. The CIO strongly believed he had no other choice, as traditional information management and electronic discovery tools are not architected to address large, disparate volumes of data located in hundreds of offices and work sites across the globe, each storing information locally. But boiling the ocean through data migration and centralization is extremely expensive, disruptive, and frankly unworkable.

Industry analyst Barry Murphy succinctly makes this point:

Centralization runs counter to the realities of the working world where information must be distributed globally across a variety of devices and applications.  The amount of information we create is overwhelming and the velocity with which that information moves increases daily.  To think that an organization can find one system in which to manage all its information is preposterous. At the same time, the FRCPs essentially put the burden on organizations to be accountable for all information, able to conduct eDiscovery on a moment’s notice.  As we’ve seen, the challenge is daunting.

As I wrote earlier this month, properly targeted preservation initiatives are permitted by the courts and can be enabled by software that can quickly and effectively access and search these data sources throughout the enterprise. The value of targeted preservation was recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the keywords used to identify responsive materials (citing the Manual for Complex Litigation (4th) § 40.25(2)). And in In re Genetically Modified Rice Litigation, 2007 WL 1655757 (E.D. Mo. June 5, 2007), the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.”

What is needed to address both the eDiscovery and enterprise search challenges of the decentralized enterprise is a field-deployable search and eDiscovery solution that operates on demand in distributed and virtualized environments, at the global locations where the data resides. This groundbreaking capability is what X1 Rapid Discovery provides. Its unique ability to deploy and operate in the IaaS cloud also means the solution can install anywhere within the wide-area network, remotely and on demand. This enables globally decentralized enterprises to finally address their overseas data in an efficient, expedient, defensible, and highly cost-effective manner.

But I am interested in hearing if anyone has had success with the centralization model. In my 12 years in this business, and the 8 years before that as a corporate attorney, I have yet to see an effective or even workable situation where a global enterprise has successfully centralized all of its electronically stored information, amounting to hundreds of terabytes, into a single system. If you can prove me wrong and point to such a verifiable scenario, I’ll buy you a $100 Starbucks gift certificate or a round of drinks for you and your friends at ILTA next week. If you want to take the challenge, or just meet up at ILTA next week in Washington, feel free to email me.


Filed under Cloud Data, eDiscovery & Compliance, Enterprise eDiscovery, IaaS, Preservation & Collection

Authenticating Internet Web Pages as Evidence: A New Approach

By John Patzakis and Brent Botta

In recent posts, we have addressed the evidentiary authentication of social media data (see previous entries here and here). General Internet site data available through standard web browsing, as opposed to social media data obtained through APIs or user credentials, presents slightly different but equally compelling challenges.

The Internet provides torrential amounts of evidence potentially relevant to litigation, and courts routinely face proffers of data preserved from various websites. This evidence must be authenticated in all cases, and the authentication standard is no different for website or chat room evidence than for any other. Under Federal Rule of Evidence 901(a), “The requirement of authentication … is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” United States v. Simpson, 152 F.3d 1241, 1249 (10th Cir. 1998).

Ideally, a proponent of the evidence can rely on uncontroverted direct testimony from the creator of the web page in question. In many cases, however, that option is not available. In such situations, the testimony of the viewer/collector of the Internet evidence “in combination with circumstantial indicia of authenticity (such as the dates and web addresses), would support a finding” that the website documents are what the proponent asserts. Perfect 10, Inc. v. Cybernet Ventures, Inc., 213 F. Supp. 2d 1146, 1154 (C.D. Cal. 2002) (emphasis added). See also Lorraine v. Markel American Insurance Co., 241 F.R.D. 534, 546 (D. Md. May 4, 2007) (citing Perfect 10, and referencing MD5 hash values as an additional element of potential “circumstantial indicia” for authentication of electronic evidence).

One of the many benefits of X1 Social Discovery is its ability to preserve and display all the available “circumstantial indicia,” to borrow the Perfect 10 court’s term, in order to present the best case possible for the authenticity of Internet-based evidence collected with the software. This includes collecting all available metadata and generating an MD5 checksum, or “hash value,” of the preserved data.

But HTML web pages pose unique authentication challenges, and merely generating an MD5 checksum of the entire web page, or of just the web page source file, provides limited value because web pages are constantly changing by their very fluid and dynamic nature. In fact, the same web page collected twice in immediate succession would very likely produce two different MD5 checksums. This is because web pages typically feature links to many external items that are dynamically loaded on each page view. These external links take the form of cascading style sheets (CSS), graphical images, JavaScript files, and other supporting files. This linked content may be stored on another server in the same domain, but is often located somewhere else on the Internet.

When the web browser loads a web page, it consolidates all of these items into one viewable page for the user. Since the web page source file contains only links to the files to be loaded, the MD5 checksum of the source file can remain unchanged even if the content of the linked files becomes completely different. Therefore, the content of the linked items must be considered when assessing the authenticity of the web page. X1 Social Discovery addresses these challenges by first generating an MD5 checksum log with an entry for each item that constitutes the web page, including the main page’s source. Then an MD5 representing the content of all the items contained within the web page is generated and preserved.
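The per-item checksum log plus the aggregate page hash can be sketched in a few lines of Python. This is an illustrative reconstruction of the general approach, not X1 Social Discovery’s actual implementation; the function names, the sorted-order combination scheme, and the toy capture data are all assumptions made for the example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """MD5 checksum of a byte string, as a hex digest."""
    return hashlib.md5(data).hexdigest()

def hash_page_items(items: dict) -> dict:
    """Hash each item that constitutes the page: the source HTML plus
    every linked resource (CSS, images, scripts, ...)."""
    return {name: md5_hex(content) for name, content in items.items()}

def aggregate_hash(item_hashes: dict) -> str:
    """Combine the per-item hashes, in a stable order, into a single MD5
    representing the content of all items on the page."""
    combined = "".join(item_hashes[name] for name in sorted(item_hashes))
    return md5_hex(combined.encode("ascii"))

# A toy capture: the source file links to a stylesheet and an image.
capture = {
    "index.html": b'<html><head><link href="style.css"></head><body>Hi</body></html>',
    "style.css": b"body { color: black; }",
    "logo.png": b"\x89PNG fake image bytes",
}
item_log = hash_page_items(capture)   # per-item checksum log
page_hash = aggregate_hash(item_log)  # one MD5 over all items on the page
```

Note how this captures the problem described above: if a linked stylesheet later changes, the hash of `index.html` itself is unchanged (it still contains the same link), but the aggregate hash over all items changes, flagging that the page as rendered is no longer the same.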

To further complicate web collections, entire sections of a web page are often not visible to the viewer. These hidden areas serve various purposes, including metatagging for search engine optimization. The servers that host websites can serve either static web pages or dynamically created pages that typically change each time a user visits the site, even though the visible content may appear unchanged.

To address this additional challenge, X1 Social Discovery utilizes two different MD5 fields for each item that makes up a web page. The first is the acquisition hash, computed from the information exactly as collected. The second is the content hash, which is based on the actual “BODY” of the web page and ignores the hidden metadata. With this approach, the content hash shows whether the user-viewable content has actually changed, rather than just a hidden metadata tag provided by the server. To illustrate, below is a screenshot from the metadata view of X1 Social Discovery for website capture evidence, reflecting the generation of MD5 checksums for individual objects on a single web page:

The time stamp of the capture and the URL of the web page are also documented in the case. By generating hash values for all individual objects within the web page, the examiner is better able to pinpoint any changes that may have occurred in subsequent captures. Additionally, if a specific item appearing on the web page, such as an incriminating image, is at issue, it is important to have an individual MD5 checksum of that key piece of evidence. Finally, any document file found on a captured web page, such as a PDF, PowerPoint, or Word document, will also be individually collected by X1 Social Discovery, with corresponding acquisition and content hash values generated.
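The distinction between the acquisition hash and the content hash can be illustrated with a minimal sketch. This is not X1’s implementation: the function names, the regex-based body extraction (a production tool would use a real HTML parser), and the sample captures are assumptions chosen to make the two-hash idea concrete.

```python
import hashlib
import re

def acquisition_hash(raw_html: bytes) -> str:
    """MD5 of the page bytes exactly as collected, hidden metadata included."""
    return hashlib.md5(raw_html).hexdigest()

def content_hash(raw_html: bytes) -> str:
    """MD5 of only the <body> of the page, so server-injected metadata in
    <head> does not alter the hash unless the viewable content changes.
    (A regex is used here for brevity; a real tool would parse the HTML.)"""
    match = re.search(rb"<body[^>]*>(.*?)</body>", raw_html,
                      re.DOTALL | re.IGNORECASE)
    body = match.group(1) if match else raw_html
    return hashlib.md5(body).hexdigest()

# Two captures of the "same" page: the visible body is identical, but the
# server stamped a different hidden meta tag into each response.
capture1 = b"<html><head><meta name='ts' content='1001'></head><body>Key evidence</body></html>"
capture2 = b"<html><head><meta name='ts' content='1002'></head><body>Key evidence</body></html>"
```

Run against these two captures, the acquisition hashes differ (the raw bytes changed), while the content hashes match, supporting the argument that the evidence the viewer actually saw did not change between captures.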

We believe this detailed approach to authenticating website evidence is unique and sets a new standard. The authentication process supports the equally innovative automated and integrated web collection capabilities of X1 Social Discovery, which is the only solution of its kind that can collect website evidence through either a one-off capture or full crawling (including on a scheduled basis) and make that information instantly reviewable in native file format through a federated search spanning multiple pieces of social media and website evidence in a single case. In all, X1 Social Discovery is a powerful solution for collecting both relevant content and all available “circumstantial indicia” from social media and general websites across the web.


Filed under Authentication, Best Practices, Preservation & Collection

Judge Peck: Cloud For Enterprises Not Cost-Effective Without Efficient eDiscovery Process

Hon. Andrew J. Peck
United States Magistrate Judge

Federal Court Magistrate Judge Andrew Peck of the Southern District of New York is known for several important decisions affecting the eDiscovery field, including the ongoing Monique da Silva Moore v. Publicis Group SA, et al. case, in which he issued a landmark order authorizing the use of predictive coding, otherwise known as technology-assisted review. His Da Silva Moore ruling is clearly an important development, but Judge Peck’s recent public comments on eDiscovery in the cloud are also very noteworthy.

eDiscovery attorney Patrick Burke, a friend and former colleague at Guidance Software, reports on his blog some interesting comments made during the May 22 judges’ panel session at the 2012 CEIC conference. UK eDiscovery expert Chris Dale also blogged about the session, where Judge Peck noted that data stored in the cloud is considered accessible data under the Federal Rules of Civil Procedure (see FRCP Rule 26(b)(2)(B)) and is thus treated no differently by the courts, in terms of eDiscovery preservation and production requirements, than data stored on a traditional network. This brought the following cautionary tale about the costs of not having a systematic process for eDiscovery:

Judge Peck told the story of a Chief Information Security Officer with authority over e-discovery at his multi-billion-dollar company who, when told that the company could enjoy significant savings by moving to “the cloud,” questioned whether the cloud provider could adapt its cloud storage to the organization’s e-discovery preservation requirements. The cloud provider said it could, but at such an increased cost that the company would enjoy no savings at all if it migrated to the cloud.

In previous posts on this blog, we outlined how the significant cost benefits of cloud migration can be negated when eDiscovery search and retrieval of that data is required. If an organization maintains two terabytes of documents in Amazon or other IaaS cloud deployments, how does it quickly access, search, triage, and collect that data in its existing cloud environment if a critical eDiscovery or compliance search requirement suddenly arises? This is precisely why we developed X1 Rapid Discovery, version 4. X1RD is a proven and now truly cloud-deployable eDiscovery and enterprise search solution that enables our customers to quickly identify, search, and collect distributed data wherever it resides, whether in the Infrastructure as a Service (IaaS) cloud or within the enterprise. While it is now trendy for eDiscovery software providers to re-brand their software as cloud solutions, X1RD is uniquely deployable anywhere, anytime in the IaaS cloud within minutes. X1RD also features the ability to leverage the parallel processing power of the cloud to scale up and down as needed. In fact, X1RD is the first pure eDiscovery solution (not counting hosted email archive tools) to meet the technical requirements and be accepted into the Amazon AWS ISV program.

As for the major cloud providers, those who choose to solve this eDiscovery challenge (along with effective enterprise search) with best-practices technology will not only drive significant managed services revenue but will also enjoy a substantial competitive advantage over other cloud services providers.


Filed under Best Practices, Case Law, Cloud Data, Enterprise eDiscovery, IaaS, Preservation & Collection