Tag Archives: ECA

Key to Improving Predictive Coding Results: Effective ECA

Predictive Coding, when correctly employed, can significantly reduce legal review costs with generally more accurate results than other traditional legal review processes. However, the benefits associated with predictive coding are often undercut by the over-collection and over-inclusion of Electronically Stored Information (ESI) into the predictive coding process. This is problematic for two reasons.

The first reason is obvious: the more data introduced into the process, the higher the cost and burden. Some practitioners believe it is necessary to over-collect and subsequently over-include ESI to allow the predictive coding process to sort everything out. Many service providers charge by volume, so there can be economic incentives that conflict with what is best for the end-client. In some cases, the significant cost savings realized through predictive coding are erased by eDiscovery costs associated with overly aggressive ESI inclusion on the front end.

The second reason why ESI over-inclusion is detrimental is less obvious, and in fact counterintuitive to many. Some discovery practitioners believe as much data as possible needs to be put through the predictive coding process in order to “better train” the machine learning algorithms. However, the opposite is true. The predictive coding process is much more effective when the initial set of data has a higher richness (also referred to as “prevalence”) ratio. In other words, the higher the rate of responsive data in the initial data set, the better. It has always been understood that document culling is very important to successful, economical document review, and that includes predictive coding.
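To make the richness idea concrete, here is a minimal arithmetic sketch. The document counts are hypothetical and chosen only for illustration; they do not come from any study or matter. The sketch shows how culling non-responsive material before review raises the richness of the set, even if the cull loses a small number of responsive documents.

```python
def richness(responsive: int, total: int) -> float:
    """Return the fraction of documents in a set that are responsive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return responsive / total


# Hypothetical over-collected set: 1,000,000 documents, 20,000 responsive.
collected_total = 1_000_000
collected_responsive = 20_000

# Hypothetical culled set: 800,000 documents removed before review,
# including 1,000 responsive documents lost in the cull.
culled_total = 200_000
culled_responsive = 19_000

before = richness(collected_responsive, collected_total)
after = richness(culled_responsive, culled_total)

print(f"Richness before culling: {before:.1%}")
print(f"Richness after culling:  {after:.1%}")
```

With these assumed numbers, the set going into predictive coding is one fifth the size and several times richer in responsive material.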

Robert Keeling, a senior partner at Sidley Austin and the co-chair of the firm’s eDiscovery Task Force, is a widely recognized legal expert in the areas of predictive coding and technology assisted review.  At Legal Tech New York earlier this year, he presented at an Emerging Technology Session: “Predictive Coding: Deconstructing the Secret Sauce,” where he and his colleagues reported on a comprehensive study of various technical parameters that affect the outcome of a predictive coding effort.  According to Robert, the study revealed many important findings, one of them being that a data set with a relatively high richness ratio prior to being ingested into the predictive coding process was an important success factor.

To be sure, the volume of ESI is growing exponentially and will only continue to do so. The costs associated with collecting, processing, reviewing, and producing documents in litigation are the source of considerable pain for litigants. The only way to reduce that pain to its minimum is to use all tools available in all appropriate circumstances within the bounds of reasonableness and proportionality to control the volumes of data that enter the discovery pipeline, including predictive coding.

Ideally, an effective early case assessment (ECA) capability can enable counsel to set reasonable discovery limits and ultimately process, host, review and produce less ESI. Counsel can further use ECA to gather key information, develop a litigation budget, and better manage litigation deadlines. ECA also can foster cooperation and proportionality in discovery by informing the parties early in the process about where relevant ESI is located and what ESI is significant to the case. With these benefits also comes a much improved predictive coding process.

X1 Distributed Discovery (X1DD) uniquely fulfills this requirement with its ability to perform pre-collection early case assessment, instead of ECA after the costly, time consuming and disruptive collection phase, thereby providing a game-changing new approach to the traditional eDiscovery model.  X1DD enables enterprises to quickly and easily search across thousands of distributed endpoints from a central location.  This allows organizations to easily perform unified complex searches across content, metadata, or both and obtain full results in minutes, enabling true pre-collection ECA with live keyword analysis and distributed processing and collection in parallel at the custodian level. To be sure, this dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs from not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process. And now we know of another key benefit of an effective ECA process: much more accurate predictive coding.


Filed under ECA, eDiscovery

Changing the Game for Rule 26(f) Meet and Confer Efforts with Pre-Collection Early Data Assessment

One of the most important provisions of the Federal Rules of Civil Procedure that impact eDiscovery is Rule 26(f), which requires the parties’ counsel to “meet and confer” in advance of the pre-trial scheduling conference on key discovery matters, including the preservation, disclosure and exchange of potentially relevant electronically stored information (ESI). With the risks and costs associated with eDiscovery, this early meeting of counsel is a critically important means to manage and control the cost of eDiscovery, and to prevent the failure to preserve relevant ESI.

A key authority on the Rule 26(f) eDiscovery topics to be addressed is the “Suggested Protocol for Discovery of Electronically Stored Information,” provided by Magistrate Judge Paul W. Grimm and his joint bar-court committee. Under Section 8 of the Model Protocol, the topics to be discussed at the Rule 26(f) conference include: “Search methodologies for retrieving or reviewing ESI such as identification of the systems to be searched;” and “the use of key word searches, with an agreement on the words or terms to be searched” and “limitations on the time frame of ESI to be searched; limitations on the fields or document types to be searched.”

However, Rule 26(f) conferences occur early on in the litigation, typically within weeks of the case’s filing. As such, attorneys representing enterprises are essentially flying blind at this pre-collection stage, without any real visibility into the potentially relevant ESI across an organization. This is especially true in regard to unstructured, distributed data, which is invariably the majority of ESI that is ultimately collected in a given matter.

Ideally, an effective early data assessment (EDA) capability can enable counsel to set reasonable discovery limits and ultimately process, host, review and produce less ESI.  Counsel can further use EDA to gather key information, develop a litigation budget, and better manage litigation deadlines. EDA also can foster cooperation and proportionality in discovery by informing the parties early in the process about where relevant ESI is located and what ESI is significant to the case.

The problem is any keyword protocols are mostly guesswork at the early stage of litigation, as under current eDiscovery practices, the costly and time consuming step of actual data collection must occur before pre-processing EDA can take place. When you hear eDiscovery practitioners talk about EDA, they are invariably speaking of a post-collection, pre-review process. But without requisite pre-collection visibility into distributed ESI, counsel typically resort to directing broad collection efforts, resulting in much greater costs, burden and delays.

What is clearly needed is the ability to perform pre-collection early data assessment, instead of EDA after the costly, time consuming and disruptive collection phase.  X1 Distributed Discovery (X1DD) offers a game-changing new approach to the traditional eDiscovery model.  X1DD enables enterprises to quickly and easily search across thousands of distributed endpoints from a central location.  This allows organizations to easily perform unified complex searches across content, metadata, or both and obtain full results in minutes, enabling true pre-collection EDA with live keyword analysis and distributed processing and collection in parallel at the custodian level. This dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs from not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process.
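The general idea of pre-collection keyword analysis can be sketched in a few lines. This is not X1DD's implementation or API; the `Doc` class, the function names, and the sample data are all hypothetical. The sketch assumes each endpoint evaluates the proposed terms, date range and file types locally and returns only a small table of hit counts, so counsel can test a keyword protocol before any documents are collected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document sitting on a custodian's endpoint."""
    custodian: str
    modified: date
    extension: str
    text: str


def local_hit_counts(docs, terms, start, end, extensions):
    """Run on one endpoint: count in-scope documents containing each term."""
    counts = {term: 0 for term in terms}
    for doc in docs:
        in_range = start <= doc.modified <= end
        allowed_type = doc.extension.lower() in extensions
        if not (in_range and allowed_type):
            continue
        body = doc.text.lower()
        for term in terms:
            if term.lower() in body:  # naive substring match, no stemming
                counts[term] += 1
    return counts


def aggregate(per_endpoint_counts):
    """Run centrally: sum the small count tables returned by each endpoint."""
    total = {}
    for counts in per_endpoint_counts:
        for term, hits in counts.items():
            total[term] = total.get(term, 0) + hits
    return total


# Hypothetical sample data for two endpoints.
endpoint_a = [
    Doc("alice", date(2015, 3, 1), "docx", "Draft merger agreement"),
    Doc("alice", date(2013, 1, 5), "docx", "Old merger notes"),  # out of date range
    Doc("alice", date(2015, 4, 2), "mp3", "merger"),             # excluded file type
]
endpoint_b = [
    Doc("bob", date(2015, 6, 9), "msg", "Pricing for the merger and the escrow account"),
]

terms = ["merger", "escrow", "indemnity"]
scope = dict(start=date(2015, 1, 1), end=date(2015, 12, 31), extensions={"docx", "msg"})

total = aggregate(
    local_hit_counts(docs, terms, **scope) for docs in (endpoint_a, endpoint_b)
)
print(total)
```

A term that returns no hits, or far too many, can then be dropped or narrowed before the Rule 26(f) conference rather than after a broad collection.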

A recent webinar featuring Duff & Phelps Managing Director and 20-year eDiscovery and computer forensics veteran Erik Laykin included a live demonstration of X1DD searching across 20 distributed endpoints in a matter of seconds. In reaction to this demonstration, Laykin commented “the ability to instantaneously search for keywords across the enterprise for a small or large group of custodians is in its own right a killer application. This particular feature gives you instantaneous answers to one of the key questions folks have been wrestling with for quite some time.”

You can now view a recording of last month’s webinar: eDiscovery Collection: Existing Challenges and a Game Changing Solution, which features an overview of the existing broken state of enterprise eDiscovery collection, culminating with a demonstration of X1 Distributed Discovery. The recorded demo will help illustrate how pre-collection EDA can greatly strengthen counsel’s approach to eDiscovery collection and meet and confer processes.


Filed under eDiscovery, Preservation & Collection

The Global De-Centralized Enterprise: An Un-Met eDiscovery Challenge

Enterprises with data situated within a multitude of segmented networks across North America and the rest of the world face unique challenges for eDiscovery and compliance-related investigation requirements. In particular, the wide area networks of large project engineering, oil & gas, and systems integration firms typically contain terabytes of geographically disparate information assets in often harsh operating environments with very limited network bandwidth. Information management and eDiscovery tools that require data centralization or run on expensive and inflexible hardware appliances cannot, by their very nature, address critical project information in places like Saudi Arabia, China, or the Alaskan North Slope.

Despite vendor marketing hype, network bandwidth constraints coupled with the requirement to migrate data to a single repository render traditional information management and eDiscovery tools ineffective to address de-centralized global enterprise data. As such, the global decentralized enterprise represents a major gap for in-house eDiscovery processes, resulting in significant expense and inefficiencies. The case of U.S. ex rel. McBride v. Halliburton Co. [1]  illustrates this pain point well. In McBride, Magistrate Judge John Facciola’s instructive opinion outlines Halliburton’s eDiscovery struggles to collect and process data from remote locations:

Since the defendants employ persons overseas, this data collection may have to be shipped to the United States, or sent by network connections with finite capacity, which may require several days just to copy and transmit the data from a single custodian . . . (Halliburton) estimates that each custodian averages 15–20 gigabytes of data, and collection can take two to ten days per custodian. The data must then be processed to be rendered searchable by the review tool being used, a process that can overwhelm the computer’s capacity and require that the data be processed by batch, as opposed to all at once. [2]

Halliburton represented to the court that they spent hundreds of thousands of dollars on eDiscovery for only a few dozen remotely located custodians. The need to force-collect the remote custodians’ entire set of data and then sort it out through the expensive eDiscovery processing phase instead of culling, filtering and searching the data at the point of collection drove up the costs.
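The figures quoted from the opinion give a rough sense of how slow remote collection can be. The sketch below converts them into an implied average throughput. It assumes, purely for illustration, that the entire two to ten days is spent moving data and that gigabytes are decimal; the opinion does not break the time down this way, and part of it may reflect copying or shipping.

```python
SECONDS_PER_DAY = 86_400
MEGABITS_PER_GIGABYTE = 8_000  # decimal units: 1 GB = 1,000 MB = 8,000 megabits


def effective_mbps(gigabytes: float, days: float) -> float:
    """Average throughput in megabits per second for moving `gigabytes` in `days`."""
    if days <= 0:
        raise ValueError("days must be positive")
    return gigabytes * MEGABITS_PER_GIGABYTE / (days * SECONDS_PER_DAY)


# Bounds taken from the quoted passage: 15-20 GB and 2-10 days per custodian.
slowest = effective_mbps(gigabytes=15, days=10)
fastest = effective_mbps(gigabytes=20, days=2)

print(f"Implied throughput: {slowest:.2f} to {fastest:.2f} Mbit/s per custodian")
```

Under those assumptions the implied rate stays below one megabit per second, which is why culling at the point of collection matters so much for remote custodians.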

Despite the burdens associated with the electronic discovery of distributed data across the four corners of the earth, such data is considered accessible under the Federal Rules of Civil Procedure and thus must be preserved and collected if relevant to a legal matter. However, the good news is that the preservation and collection efforts can and should be targeted to only potentially relevant information limited to only custodians and sources with a demonstrated potential connection to the litigation matter in question.

This is important as the biggest expense associated with eDiscovery is the cost of overly inclusive preservation and collection. Properly targeted preservation initiatives are permitted by the courts and can be enabled by adroit software that is able to quickly and effectively access and search these data sources throughout the enterprise. The value of targeted preservation is recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the key words, date ranges and other metadata to identify responsive materials. [3] And in In re Genetically Modified Rice Litigation, the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.” [4]

However, such targeted collection and ECA in place is not feasible in the decentralized global enterprise with current eDiscovery and information management tools. What is needed to address these challenges for the de-centralized enterprise is a field-deployable search and eDiscovery solution that operates in distributed and virtualized environments on-demand within these distributed global locations where the data resides. In order to meet such a challenge, the eDiscovery and search solution must immediately and rapidly install, execute and efficiently operate in a localized virtualized environment, including public or private cloud deployments, where the site data is located, without rigid hardware requirements or on-site physical access.

This is impossible if the solution is fused to hardware appliances or otherwise requires a complex on-site installation process. After installation, the solution must be able to index the documents and other data locally and serve up those documents for remote but secure access, search and review through a web browser. As the “heavy lifting” (indexing, search, and document filtering) is all performed locally, this solution can effectively operate in some of the harshest local environments with limited network bandwidth. The data is not only collected and culled within the local area network, but is also served up for full early case assessment and first pass review on site, so that only a much smaller data set of potentially relevant data is ultimately transmitted to a central location.

This groundbreaking capability is what X1 Rapid Discovery provides. Its ability to uniquely deploy and operate in the IaaS cloud also means that the solution can install anywhere within the wide-area network, remotely and on-demand. This enables globally decentralized enterprises to finally address their overseas data in an efficient, expedient, defensible and highly cost-effective manner.

If you have any thoughts or experiences with the unique eDiscovery challenges of the de-centralized global enterprise, feel free to email me. I welcome the collaboration.

___________________________________________

[1] 272 F.R.D. 235 (2011)

[2] Id. at 240.

[3] Citing the Manual for Complex Litigation (MCL) (4th) §40.25(2).

[4] 2007 WL 1655757 (June 5, 2007 E.D.Mo.)


Filed under eDiscovery & Compliance, Enterprise eDiscovery

X1 Rapid Discovery: First Enterprise eDiscovery Solution Supporting IaaS Cloud

Today I am pleased to announce our launch of  X1 Rapid Discovery, version 4. X1RD is a proven and now truly cloud-deployable eDiscovery and enterprise search solution enabling our customers to quickly identify, search, and collect distributed data wherever it resides in the Infrastructure as a Service (IaaS) cloud or within the enterprise. X1RD is a sister product to our acclaimed X1 Social Discovery, which we launched last year. Version 3 of X1 Rapid Discovery is a proven early case assessment and enterprise search application, but is now IaaS cloud deployable and features a new interface.

I know what you may be thinking — another eDiscovery CEO re-branding the company’s software as cloud. But hear me out on this. Sure, X1RD can serve as a hosted SaaS solution like many other tools (SaaS hosting has been around for over a decade), but the big news here is that X1RD is now deployable anywhere, anytime in the IaaS cloud within minutes. X1RD also features the ability to leverage the parallel processing power of the cloud to scale up and scale down as needed. In fact, X1RD is the first pure eDiscovery solution (not including a hosted email archive tool) to meet the technical requirements and be accepted into the Amazon AWS ISV program.

So what does this mean? Allow me to illustrate these ground-breaking capabilities through the following two increasingly common scenarios faced by organizations today:

Scenario 1: A F1000 company maintains 2 terabytes of data up in the Amazon EC2 or S3 (storage) cloud and suddenly must find the comparatively small amount of relevant data within those 2TB as quickly as possible to respond to a critical investigation requirement. There is no time to spend several weeks downloading the entire 2TB out of the cloud through the thin pipe or waiting for Amazon personnel to copy the entire data set to hard drives and ship it back. What is urgently needed is the ability to quickly install eDiscovery software to index, search and review that data in the very IaaS cloud environment where it exists. That way only the small data set (say 10 gigabytes) of relevant data is identified and then finally exported. That is what X1 Rapid Discovery delivers.
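A quick back-of-the-envelope calculation shows why exporting only the relevant subset matters in Scenario 1. The 10 Mbit/s sustained link speed is a hypothetical assumption for illustration; actual throughput out of a cloud environment varies widely. Data sizes use decimal units.

```python
def transfer_seconds(gigabytes: float, link_mbps: float) -> float:
    """Seconds needed to move `gigabytes` over a link sustaining `link_mbps`."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return gigabytes * 8_000 / link_mbps  # 1 GB = 8,000 megabits


ASSUMED_LINK_MBPS = 10  # hypothetical sustained throughput

full_set_days = transfer_seconds(2_000, ASSUMED_LINK_MBPS) / 86_400  # entire 2 TB
relevant_hours = transfer_seconds(10, ASSUMED_LINK_MBPS) / 3_600     # 10 GB subset

print(f"Entire 2 TB set: about {full_set_days:.1f} days")
print(f"10 GB relevant subset: about {relevant_hours:.1f} hours")
```

At the assumed rate, the full download takes weeks while the culled export finishes within a working day.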

Scenario 2: The same investigation sends the company’s eDiscovery consultant overseas to collect data at a subsidiary site. Upon the collection of the first 200 gigabytes, the attorneys insist that the data must be quickly indexed for detailed, iterative searching in order to better inform the remaining on-site collection effort. However, the collection team left the large ECA appliance they normally use at home, as it does not travel well and would not pass foreign customs. In this case, there are several options with X1RD. If an eDiscovery software solution is truly cloud-capable, then it can quickly install anywhere, including the IaaS cloud or on available hardware on-site. So the team can either locate available hardware resources with Windows OS or upload the data to a private or public IaaS cloud environment and operate a virtual eDiscovery lab with X1RD.

X1RD can just as easily be installed behind the firewall as in the cloud, but right now, all of our demos and proofs of concept are being performed in the IaaS cloud. But don’t just take our word for it. We would be happy to demonstrate this for you by remotely installing in your public or private IaaS cloud environment and collecting, indexing and searching your data. We are up for the challenge!

Register for our live webinar on May 2 to see a demo of X1 Rapid Discovery and to hear from eDiscovery expert Barry Murphy on his view of the current eDiscovery market with respect to the cloud.


Filed under Cloud Data, eDiscovery & Compliance, Enterprise eDiscovery, IaaS