Tag Archives: predictive coding

Key to Improving Predictive Coding Results: Effective ECA

Predictive Coding, when correctly employed, can significantly reduce legal review costs with generally more accurate results than other traditional legal review processes. However, the benefits associated with predictive coding are often undercut by the over-collection and over-inclusion of Electronically Stored Information (ESI) into the predictive coding process. This is problematic for two reasons.

The first reason is obvious, the more data introduced into the process, the higher the cost and burden. Some practitioners believe it is necessary to over-collect and subsequently over-include ESI to allow the predictive coding process to sort everything out. Many service providers charge by volume, so there can be economic incentives that conflict with what is best for the end-client. In some cases, the significant cost savings realized through predictive coding are erased by eDiscovery costs associated with overly aggressive ESI inclusion on the front end.

The second reason why ESI over-inclusion is detrimental is less obvious, and in fact counter intuitive to many. Some discovery practitioners believe as much data as possible needs to be put through the predictive coding process in order to “better train” the machine learning algorithms. However this is contrary to what is actually true. The predictive coding process is much more effective when the initial set of data has a higher richness (also referred to as “prevalence”) ratio. In other words, the higher the rate of responsive data in the initial data set, the better. It has always been understood that document culling is very important to successful, economical document review, and that includes predictive coding.

Robert Keeling, a senior partner at Sidley Austin and the co-chair of the firm’s eDiscovery Task Force, is a widely recognized legal expert in the areas of predictive coding and technology assisted review.  At Legal Tech New York earlier this year, he presented at an Emerging Technology Session: “Predictive Coding: Deconstructing the Secret Sauce,” where he and his colleagues reported on a comprehensive study of various technical parameters that affect the outcome of a predictive coding effort.  According to Robert, the study revealed many important findings, one of them being that a data set with a relatively high richness ratio prior to being ingested into the predictive coding process was an important success factor.

To be sure, the volume of ESI is growing exponentially and will only continue to do so. The costs associated with collecting, processing, reviewing, and producing documents in litigation are the source of considerable pain for litigants. The only way to reduce that pain to its minimum is to use all tools available in all appropriate circumstances within the bounds of reasonableness and proportionality to control the volumes of data that enter the discovery pipeline, including predictive coding.

Ideally, an effective early case assessment (ECA) capability can enable counsel to set reasonable discovery limits and ultimately process, host, review and produce less ESI.  Counsel can further use ECA to gather key information, develop a litigation budget, and better manage litigation deadlines. ECA also can foster cooperation and proportionality in discovery by informing the parties early in the process about where relevant ESI is located and what ESI is significant to the case. And with such benefits also comes a much more improved predictive coding process.

X1 Distributed Discovery (X1DD) uniquely fulfills this requirement with its ability to perform pre-collection early case assessment, instead of ECA after the costly, time consuming and disruptive collection phase, thereby providing a game-changing new approach to the traditional eDiscovery model.  X1DD enables enterprises to quickly and easily search across thousands of distributed endpoints from a central location.  This allows organizations to easily perform unified complex searches across content, metadata, or both and obtain full results in minutes, enabling true pre-collection ECA with live keyword analysis and distributed processing and collection in parallel at the custodian level. To be sure, this dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs from not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process. And now we know of another key benefit of an effective ECA process: much more accurate predictive coding.

Leave a comment

Filed under ECA, eDiscovery

Social Media Discovery Hotter Than Predictive Coding?

fire isolated over black backgroundIt was a great show last week at LegalTech New York. Definitely an increase in the number and quality of attendees and it was nice to see many friends and colleagues both old and new.

Also very noticeable were the many, many vendors sporting predictive coding (aka technology assisted review) messaging in their respective booths and various forms of marketing material. In fact, one industry colleague pointed me to a recent bold prediction offered by Recommind lawyer Howard Sklar, essentially proclaiming that predictive coding will have really hit the big time when a state bar organization issues an ethics opinion stating “that failure to use predictive coding is ethically questionable, if not unethical.” Sklar goes on to predict that such an opinion will come within the next 18 months.

I don’t disagree that such a development would be a big deal. But my question is, why stop at an advisory ethics opinion? What about an actual published court opinion where a sitting appellate judge decrees, without mincing words, that legal ethics obligations require lawyers to employ predictive coding? Now that would be huge. Something, in fact, like Griffin v. State, 192 Md. App. 518, 535 (2010), which addresses another hot topic in eDiscovery:

“[I]t should now be a matter of professional competence for attorneys to take the time to investigate social networking sites.”

Now to be fair, I must point out that Griffin v. State was reversed and remanded on other grounds (419 Md. 343 (2011)), but I would argue the overall impact from an ethics and best practices standpoint is still there.

Sklar also points out three appellate level cases with written opinions that discuss the concept of predictive coding, without any definitive rulings compelling its use, but nonetheless discussing the concept. Two of the three are even retrievable on Westlaw. I think such appellate-level published decisions are important, which is why we highlight the several thousands of published court decisions in the past three years (see here and here) that have compelled the production of, admitted into evidence, or otherwise recognized the importance of social media evidence to the case at hand. New cases are being published every day, to the point where we candidly have stopped counting due to the deluge. So by the standard set by of my esteemed fellow eDiscovery lawyer Mr. Sklar, social media discovery is a very hot field indeed.

Leave a comment

Filed under Case Law, eDiscovery & Compliance, Social Media Investigations

Judge Peck: Cloud For Enterprises Not Cost-Effective Without Efficient eDiscovery Process

Hon. Andrew J. Peck
United States Magistrate Judge

Federal Court Magistrate Judge Andrew Peck of the New York Southern District is known for several important decisions affecting the eDiscovery field including the ongoing  Monique da Silva Moore v. Publicis Group SA, et al, case where he issued a landmark order authorizing the use of predictive coding, otherwise known as technology assisted review. His Da Silva Moore ruling is clearly an important development, but also very noteworthy are Judge Peck’s recent public comments on eDiscovery in the cloud.

eDiscovery attorney Patrick Burke, a friend and former colleague at Guidance Software, reports on his blog some interesting comments asserted on the May 22 Judges panel session at the 2012 CEIC conference. UK eDiscovery expert Chris Dale also blogged about the session, where Judge Peck noted that data stored in the cloud is considered accessible data under the Federal Rules of Civil Procedure (see, FRCP Rule 26(b)(2)(B)) and thus treated no differently by the courts in terms of eDiscovery preservation and production requirements as data stored within a traditional network. This brought the following cautionary tale about the costs associated with not having a systematic process for eDiscovery:

Judge Peck told the story of a Chief Information Security Officer who had authority over e-discovery within his multi-billion dollar company who, when told that the company could enjoy significant savings by moving to “the cloud”, questioned whether the cloud provider could accommodate their needs to adapt cloud storage with the organization’s e-discovery preservation requirements. The cloud provider said it could but at such an increased cost that the company would enjoy no savings at all if it migrated to the cloud.

In previous posts on this blog, we outlined how significant cost-benefits associated with cloud migration can be negated when eDiscovery search and retrieval of that data is required.  If an organization maintains two terabytes of documents in the Amazon or other IaaS cloud deployments, how do they quickly access, search, triage and collect that data in its existing cloud environment if a critical eDiscovery or compliance search requirement suddenly arises?  This is precisely the reason why we developed X1 Rapid Discovery, version 4. X1RD is a proven and now truly cloud-deployable eDiscovery and enterprise search solution enabling our customers to quickly identify, search, and collect distributed data wherever it resides in the Infrastructure as a Service (IaaS) cloud or within the enterprise. While it is now trendy for eDiscovery software providers to re-brand their software as cloud solutions, X1RD is now uniquely deployable anywhere, anytime in the IaaS cloud within minutes. X1RD also features the ability to leverage the parallel processing power of the cloud to scale up and scale down as needed. In fact, X1RD is the first pure eDiscovery solution (not including a hosted email archive tool) to meet the technical requirements and be accepted into the Amazon AWS ISV program.

As far as the major cloud providers, the ones who choose to solve this eDiscovery challenge (along with effective enterprise search) with best practices technology will not only drive significant managed services revenue but will enjoy a substantial competitive advantage over other cloud services providers.

Leave a comment

Filed under Best Practices, Case Law, Cloud Data, Enterprise eDiscovery, IaaS, Preservation & Collection