Category Archives: ESI

Federal Judge: Custodian Self-Collection of ESI is Unethical and Violates Federal Rules of Civil Procedure

By John Patzakis

In E.E.O.C. v. M1 5100 Corp. (S.D. Fla. July 2, 2020), U.S. Magistrate Judge William Matthewman excoriated defense counsel for allowing unsupervised custodian self-collection of ESI, declaring that the practice “greatly troubles and concerns the court.” In this EEOC age discrimination case, two employees of the defendant corporation were permitted to identify and collect their own ESI without supervision. Despite having no knowledge of the process the client undertook to gather information (which resulted in only 22 pages of documents produced), counsel signed the responses to the RFPs in violation of FRCP Rule 26(g), which requires that the attorney have knowledge of, and supervise, the process used to collect data from the client in response to discovery requests.

This notable quote from the opinion provides a very strong legal statement against the practice of ESI custodian self-collection:

“The relevant rules and case law establish that an attorney has a duty and obligation to have knowledge of, supervise, or counsel the client’s discovery search, collection, and production. It is clear to the Court that an attorney cannot abandon his professional and ethical duties imposed by the applicable rules and case law and permit an interested party or person to ‘self-collect’ discovery without any attorney advice, supervision, or knowledge of the process utilized. There is simply no responsible way that an attorney can effectively make the representations required under Rule 26(g)(1) and yet have no involvement in, or close knowledge of, the party’s search, collection and production of discovery…Abdicating completely the discovery search, collection and production to a layperson or interested client without the client’s attorney having sufficient knowledge of the process, or without the attorney providing necessary advice and assistance, does not meet an attorney’s obligation under our discovery rules and case law. Such conduct is improper and contrary to the Federal Rules of Civil Procedure.”

In his ruling, Judge Matthewman stated that he “will not permit an inadequate discovery search, collection and production of discovery, especially ESI, by any party in this case.” He gave the defendant “one last chance to comply with its discovery search, collection and production obligations.” He also ordered “the parties to further confer on or before July 9, 2020, to try to agree on relevant ESI sources, custodians, and search terms, as well as on a proposed ESI protocol.” The Court reserved ruling on monetary and evidentiary sanctions pending the results of the Defendant’s second-chance efforts.

A Defensible Yet Streamlined Process Is Optimal

EEOC v. M1 5100 is yet another court decision disallowing custodian self-collection of ESI and underscoring the importance of a well-designed and defensible eDiscovery collection process. At the other end of the spectrum, full disk image collection is a preservation option that, while defensible, is very costly, burdensome and disruptive to operations. I have previously discussed at length on this blog the numerous challenges associated with full disk imaging.

The ideal solution is a systemized, uniform and defensible process for ESI collection, which also enables targeted and intelligent data collection in support of proportionality principles. Such a capability is only attainable with the right enterprise technology. With X1 Distributed Discovery (X1DD), parties can perform targeted search and collection of the ESI of hundreds of endpoints over the internal network without disrupting operations. The search results are returned in minutes, not weeks, and thus can be highly granular and iterative, based upon multiple keywords, date ranges, file types, or other parameters. This approach typically reduces eDiscovery collection and processing costs by at least one order of magnitude (90%), bringing much-needed feasibility to enterprise-wide eDiscovery collection. It can save organizations millions while improving compliance by maintaining metadata, generating audit logs and establishing chain of custody.

And in line with the Judge’s guidance outlined in EEOC v. M1 5100, X1DD provides a repeatable, verifiable and documented process for the requisite defensibility. For a demonstration or briefing on X1 Distributed Discovery, please contact us.

Filed under Best Practices, Case Law, collection, eDiscovery, ESI, Uncategorized

Lawson v. Spirit Aerosystems: Federal Court Blasts “Bloated” ESI Collection, Rendered TAR Ineffective

By John Patzakis

Technology Assisted Review (TAR), when correctly employed, can significantly reduce legal review costs with generally more accurate results than traditional legal review processes. However, the benefits associated with TAR are often undercut by the over-collection and over-inclusion of Electronically Stored Information (ESI) into the TAR process. These challenges played out in spades in the recent decision in Lawson v. Spirit Aerosystems, where a Kansas federal judge issued a detailed ruling outlining the parties’ eDiscovery battles, their use of TAR, and whether further TAR costs should be shifted to the Plaintiff. The ex-CEO of Spirit Aerosystems brought suit accusing Spirit of unlawfully withholding $50 million in retirement benefits over his alleged violation of a non-compete agreement.

The Lawson court outlined two ways in particular in which ESI over-collection can detrimentally impact TAR. First, the more data introduced into the process, the higher the cost and burden. Some practitioners believe it is necessary to over-collect and subsequently over-include ESI to allow the TAR process to sort everything out. Many service providers charge by volume, so there can be economic incentives that conflict with what is best for the end client. In some cases, the significant cost savings realized through TAR are erased by eDiscovery costs associated with overly aggressive ESI inclusion on the front end. Per the judge in Lawson, “the TAR set was unnecessarily voluminous because it consisted of the bloated ESI collection” due to overbroad collection parameters.

Second, the court outlined how the TAR process is much more effective when the initial set of data has a higher richness (also referred to as “prevalence”) ratio. In other words, the higher the rate of responsive data in the initial data set, the better. It has always been understood that document culling is very important to successful, economical document review, and that includes TAR. As noted by the Lawson court, “the ‘richness’ of the dataset…can also be a key driver of TAR expenses. This is because TAR is not as simple as loading the dataset and pushing a magic button to identify the relevant and responsive documents. Rather, the parties must devote the resources (usually a combination of attorneys and contract reviewers) necessary to ‘educate’ or ‘train’ the predictive algorithm, typically through an ongoing process…” According to the court’s decision, the inefficiencies in the process resulted in an estimated TAR bill of $600,000 involving the review of approximately 200 GB of data. This is far too expensive for TAR to be feasible as a standard litigation process, and the problems all started with the “bloated” ESI collection.
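To make the richness point concrete, here is a minimal sketch of the arithmetic. The function names and the document counts are hypothetical illustrations, not figures from the Lawson opinion, and the example assumes (optimistically) that culling removes only non-responsive documents.

```python
def richness(responsive_docs: int, total_docs: int) -> float:
    """Fraction of the dataset that is responsive (also called prevalence)."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return responsive_docs / total_docs


def docs_per_responsive(responsive_docs: int, total_docs: int) -> float:
    """Expected documents read per responsive document found when
    documents are drawn at random, as in an early training sample."""
    if responsive_docs <= 0:
        raise ValueError("responsive_docs must be positive")
    return total_docs / responsive_docs


# Hypothetical over-collected set: 1,000,000 documents, 10,000 responsive.
bloated = (10_000, 1_000_000)
# Hypothetical targeted set: same responsive documents, 200,000 in total.
targeted = (10_000, 200_000)

for label, (resp, total) in (("bloated", bloated), ("targeted", targeted)):
    print(label, richness(resp, total), docs_per_responsive(resp, total))
```

Under these assumed numbers, richness rises from 1% to 5%, so a reviewer reading a random sample sees a responsive document about every 20 documents instead of every 100. In practice, culling can also drop some responsive documents, which is why the search terms themselves need testing.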

To be sure, the volume of ESI is growing exponentially and will only continue to do so. The costs associated with collecting, processing, reviewing, and producing documents in litigation are the source of considerable pain for litigants, including the Plaintiff in Lawson, who will, per the court’s ruling, bear a substantial portion of the TAR bill under the cost-shifting order. The only way to minimize that pain is to use all available tools, including TAR, in all appropriate circumstances and within the bounds of reasonableness and proportionality, to control the volumes of data that enter the discovery pipeline.

Ideally, an effective and targeted collection capability can enable parties to ultimately process, host, review and produce less ESI. This capability should enable pre-collection early case assessment (ECA) to foster cooperation and proportionality in discovery by informing the parties early in the process about where relevant ESI is located and what ESI is significant to the case. With such benefits also comes a much improved TAR process. X1 Distributed Discovery (X1DD) uniquely fulfills this requirement with its ability to perform pre-collection early case assessment, instead of ECA after the costly, time-consuming and disruptive collection phase, thereby providing a game-changing new approach to the traditional eDiscovery model. X1DD enables enterprises to quickly and easily search across hundreds of distributed endpoints from a central location. This allows organizations to easily perform unified complex searches across content, metadata, or both and obtain full results in minutes, enabling true pre-collection ECA with live keyword analysis and distributed processing and collection in parallel at the custodian level. This dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs by not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process. And now we know of another key benefit of an effective collection and ECA process: much more accurate and feasible technology assisted review.

Filed under Best Practices, Case Law, Case Study, collection, ECA, eDiscovery, Enterprise eDiscovery, ESI

How to Implement an Effective eDiscovery Search Term Strategy

By Mandi Ross and John Patzakis

A key Federal Rules of Civil Procedure provision that greatly impacts eDiscovery processes is Rule 26(f), which requires the parties’ counsel to “meet and confer” in advance of the pre-trial scheduling conference on key discovery matters, including the preservation, disclosure and exchange of potentially relevant electronically stored information (ESI). With the risks and costs associated with eDiscovery, this early meeting of counsel is a critically important means to manage and control the cost of eDiscovery, and to ensure relevant ESI is preserved.

A very good authority on the Rule 26(f) eDiscovery conference is the “Suggested Protocol for Discovery of Electronically Stored Information,” provided by then-Magistrate Judge Paul W. Grimm and his joint bar-court committee. Under Section 8 of the Model Protocol, the topics to be discussed at the Rule 26(f) conference include: “Search methodologies for retrieving or reviewing ESI such as identification of the systems to be searched;” “the use of key word searches, with an agreement on the words or terms to be searched;” “limitations on the time frame of ESI to be searched;” and “limitations on the fields or document types to be searched.”

Optimizing the process of developing keyword searches, however, is no easy task, especially without the right technology and expertise. The typical approach of brainstorming a list of terms that may be relevant and running the search on a dataset to be reviewed results in a wide range of inefficiencies. Negotiations over proper usage of search terms may become onerous and contentious. Judges are often tasked with making determinations regarding the aptness of the methodology, and many are reluctant to do so. Thus, outside expertise that leverages index-in-place technology is beneficial in building an effective and comprehensive search term strategy.

The courts agree. In Victor Stanley v. Creative Pipe, U.S. District Court Judge Paul Grimm explains, “Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology.”

Building a sound search strategy is akin to constructing a building. First, lay the foundation with a clear understanding of the claims and defenses of the case and the types of documents that will support a legal strategy. Once a solid foundation is built, the structure of language, logical expressions, and metadata are blended as necessary to create the appropriate set of robust Boolean searches. These searches then target the retrieval of responsive documents, and consistently achieve a staggering 80 percent reduction in data volumes to be reviewed.

It’s quite simple. If a document does not contain the defined language, then the document is unlikely to be relevant. The best way to find the language specific to the claims and defenses is to create a linguistic narrative of the case. This not only helps construct a roadmap for a comprehensive strategy designed to reduce the volume of data, it also creates a thorough categorization system for organization and prioritization of review. The approach is straightforward, flexible, and adaptive to client objectives, whether during early case assessment, linear or technology-assisted review, or anything in between.

The narrative search approach includes the following steps:

  1. Issue Analysis: Create an unambiguous definition of each issue that characterizes the claims being made and the defenses being offered.
  2. Logical Expression Definition: Define the specific expressions that encapsulate each issue. There may be multiple expressions required to convey the full meaning of the issue.
  3. Component Identification and Expansion: Distill each logical expression into specific components. These components form the basis for the expansion effort, which is the identification of words that convey the same conceptual meaning (synonyms).
  4. Search Strategies: Determine the appropriate parameters to be used for proximity, as well as developing a strategy for searching non-standard, structured data, such as spreadsheets, non-text, or database files.
  5. Test Precision and Recall: In tandem with the case team, review small sample sets to refine the logical expression statements to improve precision and recall.
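Steps 3 through 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' tooling: the `SampleResult` structure, the `build_query` helper and the `w/N` proximity syntax are all assumptions made for the example, and real review platforms use their own query syntax.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Counts from a hand-reviewed sample set (hypothetical structure)."""
    true_positives: int   # hit by the search and judged responsive
    false_positives: int  # hit by the search but judged non-responsive
    false_negatives: int  # missed by the search but judged responsive


def precision(r: SampleResult) -> float:
    """Share of search hits that reviewers judged responsive."""
    hits = r.true_positives + r.false_positives
    return r.true_positives / hits if hits else 0.0


def recall(r: SampleResult) -> float:
    """Share of responsive documents that the search actually hit."""
    responsive = r.true_positives + r.false_negatives
    return r.true_positives / responsive if responsive else 0.0


def build_query(components: list[list[str]], proximity: int) -> str:
    """OR together each component's synonyms, then link the components
    with an illustrative 'w/N' proximity operator."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in components]
    return f" w/{proximity} ".join(groups)


# Hypothetical expansion of one logical expression into two components.
query = build_query([["terminate", "fire", "dismiss"], ["age", "older"]], 10)
print(query)

# Hypothetical review of a small sample run against that query.
sample = SampleResult(true_positives=40, false_positives=10, false_negatives=10)
print(precision(sample), recall(sample))
```

Low precision suggests the expression is too broad and is pulling in non-responsive documents; low recall suggests missing synonyms or a proximity window that is too tight. Either result feeds back into steps 2 through 4.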

Carrying out this process requires technology that can apply it in real time. The ability to index data in place is a game changer, as it gives legal teams early insight into the data and allows search terms to be sampled, tested and validated instantly, without first requiring data collection. This is in contrast to the outdated, costly, and time-consuming process involving manual data collection and subsequent migration into a physical eDiscovery processing platform. The latter process prevents counsel from applying search term proportionality in any meaningful way without first incurring significant expense and loss of time.

X1 Distributed Discovery enables enterprises to quickly and easily search across thousands of distributed endpoints from a central location. This allows organizations to easily perform unified complex searches across content, metadata, or both, and obtain full results in minutes, enabling true pre-collection search analytics with live keyword analysis and distributed processing and collection in parallel at the custodian level. This dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs by not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process.

Led by an experienced consulting team that leverages cutting-edge technology, this innovative narrative methodology, created by the experts at Prism Litigation Technology, enriches common search terms by adding layers of linguistic and data science expertise to create a fully defensible, transparent, and cogent approach to eDiscovery. For more on this workflow, please see the white paper: Don’t Stop Believin’: The Staying Power of Search Term Optimization.


Mandi Ross is the CEO of Prism Litigation Technology (www.prismlit.com)

John Patzakis is Chief Legal Officer and Executive Chairman at X1 (www.X1.com)

Filed under Best Practices, ECA, eDiscovery, Enterprise eDiscovery, ESI, Preservation & Collection

Remote Collection: The Apple Pay of eDiscovery in a COVID-19 World

By: Craig Carpenter

I often continue doing things just because that’s the way I’ve always done them.  There is a level of comfort that comes from familiarity, and to be honest as I age I realize I can get more set in my ways (as my children often tell me), eschewing new ways of doing things – even if they are quicker or more efficient.  Sometimes it takes a major disruption to force change, as the eDiscovery market saw with accelerated adoption of Predictive Coding in the wake of the Great Recession.  This is true in many industries, including consumer products: witness the accelerated adoption of “contactless payment” like Apple Pay during the COVID-19 pandemic.  It has been available for years, but adopted mainly by younger generations while us old folks clung to credit cards and, in some cases, cash (gasp!).  But COVID-19 has changed this dynamic for many, myself included, as the prospect of touching a credit card machine is now unacceptable.  Whereas using Apple Pay was a ‘nice-to-have’ before COVID-19, it has become a ‘must-have’ now.  This type of resistance to change is arguably even more commonplace in the legal world, where convention and comfort often reign supreme.  How we have been conducting eDiscovery collection for years is a perfect example of clinging to outdated methods – but with the advent of COVID-19, this too is about to change for good.

Collection of digital evidence in legal proceedings was an implicit requirement under the Federal Rules of Civil Procedure (FRCP) long before it was codified explicitly in the 2006 amendments, which added Electronically Stored Information (ESI) as a “new” category under amended Rule 34(a). I distinctly remember conducting discovery in 1998 and 1999 as a third-year law student and then first-year associate for a Bay Area law firm: it was the proverbial “banker box” process, with all discovery in paper form. In those days, even email messages and WordPerfect documents were simply printed out to be Bates stamped and reviewed in hard copy by hand. Document review has always been tedious, but at least back then the volumes were significantly lower than they are these days.

During this timeframe, however, email and the dissemination of ever-greater volumes of electronic information it facilitated were exploding. This, of course, meant that evidence (in the forensic context) and relevant information for eDiscovery were increasingly digital in nature. So when discovery practitioners went looking for tools to help them preserve and collect digital information, where did they turn? To the forensic world, of course, as the more stringent requirements and processes of criminal proceedings and evidence necessitated the development of such tools earlier than had been needed in civil discovery. And if a tool was good enough for criminal proceedings, it should be plenty good enough for those in the civil world. Thus, forensic tools like Guidance’s EnCase® and AccessData’s FTK®, which were built for law enforcement, crossed over into the civil world.

However, the needs of the data collection process for civil discovery were and remain quite different from those of the criminal world:

  • On average civil discovery involves far more “custodians” (owners or stewards of information) than criminal proceedings, e.g. 5-15 custodians in civil matters vs. 1, maybe 2, in criminal
  • Whereas a typical criminal proceeding focuses on the communication media of one or occasionally a few alleged perpetrators (e.g. their cell phone, laptop, social media), civil discovery is typically significantly broader given the greater number of corporate applications and data repositories, including corporate email, file shares, ‘loose files’ (e.g. Word or Excel documents only stored locally), and cloud storage repositories like Dropbox or Google Vault
  • Due to the larger number of custodians and typically broader data types to be searched, the volume of information in civil discovery is usually significantly greater than in a criminal proceeding
  • In handling criminal evidence there is a presumption that the alleged perpetrator may have tried to hide, alter or destroy evidence; absent very unusual circumstances, no such presumption exists in civil discovery
  • While confiscation of devices (laptops, desktops, cell phones, records) is the standard in criminal proceedings, the opposite is true in civil discovery. Custodians need their devices so they can do their jobs
  • Collection of evidence in criminal proceedings is handled by law enforcement (e.g. upon arrest or as part of a ‘dawn raid’ type of event), while the parties themselves conduct civil discovery (as a business process typically handled by legal or outsourced to service providers)

These differences were insignificant when data volumes were small and the data was relatively easy to get to, as was the case for many years.  And as the first technology on the market, forensic tools and vendors did a great job of building and defending their incumbency, through certifications, “court-cited workflows” and knowledge bases widely advertising their deep expertise in forensic collection as practiced by a cadre of forensic examiners leveraging their technical abilities into lucrative careers – thereby creating a significant barrier to entry for non-forensic eDiscovery collection tools and practitioners.

In spite of this strong incumbency, almost all corporate legal departments have long wanted a better approach to collection than forensic tools offered; many of their outside counsel have felt similarly. They have long felt that collection using forensic tools and workflows was and remains deeply flawed for eDiscovery in a number of ways:

  • Chronic overcollection: as forensic tools were built to capture all information, including things like slack space which can be important in criminal proceedings but are almost never even in scope in civil matters, the volume of data collected is far greater than needed. While service providers charging hourly professional services time and monthly per-GB hosting fees may not mind, for clients paying to collect/filter/host/review/produce knowingly unnecessary data this makes no sense and adds significant cost to the entire process, each and every time
  • Weeks or months-long process: because forensic tools must process data on a server before searching or culling it, they require physical access to a device (e.g. via a USB port). There is an option to copy entire drives with GBs of data through a VPN connection, but this approach has never worked well, if at all.  Given the coordination needed to gain physical access to devices which may be located in myriad different cities or countries, as well as the need to complete collection before paring down or even searching of data can begin, what should take hours or days instead takes weeks if not months
  • Highly disruptive: as each forensic image is being taken of each laptop or desktop, the user of each such machine must stop whatever they are doing and surrender their machine to the forensic staff for a day or more. Even if there is a spare laptop available, it will often have none of their ‘stuff’ on it.  Needless to say, this highly intrusive process makes each such worker far less productive and is very disruptive
  • “Recreating the wheel” every time: when the next matter arrives, can forensic examiners simply use the data from the last collection? Unfortunately, no, as each custodian has presumably created and received new data, necessitating the whole process from before be repeated.  Forensic collection quite literally recreates the wheel with every collection

By contrast, remote collection is designed specifically for civil eDiscovery.  It is built for a distributed workforce and requires no physical access to any devices.  A small software agent is installed on each device which creates its own local index; legal staff can then simply search this index for whatever ESI they want to find.  This distributed architecture facilitates ‘Pre-Case Assessment’, where search terms are sampled on data in-place, before any ESI is collected.  This turns the forensic collection workflow on its head, as analysis can be done from the very beginning of the preservation/collection process, allowing lawyers to gain insight far earlier in any proceeding and supporting a surgical collection process, leading to far lower data volumes (and therefore much lower eDiscovery costs).  And because remote collection can be an entirely cloud-based process, no hardware or specialized staff is required – in fact, collections can be done without IT ever being involved.
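The idea of a local index that can be searched before anything is collected can be illustrated with a toy inverted index. This is a simplified sketch for explanation only; the `LocalIndex` class and its methods are hypothetical and do not represent how any particular remote collection product is implemented.

```python
import re
from collections import defaultdict


class LocalIndex:
    """Toy inverted index mapping each term to the files that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str, text: str) -> None:
        """Index one document held on the custodian's device."""
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[token].add(path)

    def search_all(self, *terms: str) -> set[str]:
        """Paths containing every term (Boolean AND)."""
        if not terms:
            return set()
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets)

    def search_any(self, *terms: str) -> set[str]:
        """Paths containing at least one term (Boolean OR)."""
        result: set[str] = set()
        for t in terms:
            result |= self._postings.get(t.lower(), set())
        return result


# Hypothetical files on one custodian's laptop.
index = LocalIndex()
index.add("memo.docx", "Retirement benefits withheld pending review")
index.add("lunch.msg", "Lunch on Friday?")

# Only hit counts and file paths need to leave the device at this stage.
print(sorted(index.search_all("retirement", "benefits")))
```

Because the search runs against the index on the device, counsel can sample and refine terms first and then collect only the files that hit, rather than copying the whole drive and searching afterwards.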

Why hasn’t the industry adopted remote collection before now? Because everyone involved in the process except the client benefited from the status quo: forensic experts, service providers and forensic technology providers. They had a strong incentive to keep things as they had always been, to the client’s detriment. In a COVID-19 world, however, even these groups must change their workflows, as physical access to devices has not only fallen out of favor; it is now often impossible and perhaps even dangerous. What remote employee would want a stranger to come to their home and take their laptop for hours? That scenario is simply no longer an option. Just as touching a point-of-sale machine went from a minor inconvenience to an irresponsible and even dangerous activity when Apple Pay offers a far better approach, forensic collection in eDiscovery is in the process of giving way to remote collection. Clients will be much better off for it.

Filed under Best Practices, collection, Corporations, ESI, Information Access, Preservation & Collection, Uncategorized

Remote ESI Collection and Data Audits in the Time of Social Distancing

By John Patzakis

The vital global effort to contain the COVID-19 pandemic will likely disrupt our lives and workflows for some time. While our personal and business lives will hopefully return to normal soon, the trend of an increasingly remote and distributed workforce is here to stay. This “new normal” will necessitate relying on the latest technology and updated workflows to comply with legal, privacy, and information governance requirements.

From an eDiscovery perspective, the legacy manual collection workflow involving travel, physical access and one-time mass collection of custodian laptops, file servers and email accounts is a non-starter under current travel bans and social distancing policies, and does not scale for the new era of remote and distributed workforces. In addition to the public health constraints, manual collection efforts are expensive, disruptive and time-consuming, because an “overkill” forensic image collection process is often employed, substantially driving up eDiscovery costs.

When it comes to technical approaches, endpoint forensic crawling methods are now a non-starter. Network bandwidth constraints, coupled with the requirement to migrate all endpoint data back to the forensic crawling tool, render the approach ineffective, especially with remote workers needing to VPN into a corporate network. Right now, corporate network bandwidth is at a premium, and the last thing a company needs is to have its network brought down by inefficient remote forensic tools.

For example, to search a custodian’s laptop holding 10 gigabytes of email and documents with a forensic crawling tool, all 10 gigabytes must be copied and transmitted over the network before they can be searched, which takes at least several hours per computer. As a result, most organizations simply force-collect all 10 gigabytes. The case of U.S. ex rel. McBride v. Halliburton Co., 272 F.R.D. 235 (D.D.C. 2011), illustrates this pain point well. In McBride, Magistrate Judge John Facciola’s instructive opinion outlines Halliburton’s struggles to collect and process data from remote locations:

“Since the defendants employ persons overseas, this data collection may have to be shipped to the United States, or sent by network connections with finite capacity, which may require several days just to copy and transmit the data from a single custodian . . . (Halliburton) estimates that each custodian averages 15–20 gigabytes of data, and collection can take two to ten days per custodian. The data must then be processed to be rendered searchable by the review tool being used, a process that can overwhelm the computer’s capacity and require that the data be processed by batch, as opposed to all at once.”

Halliburton represented to the court that it spent hundreds of thousands of dollars on eDiscovery for only a few dozen remotely located custodians. What drove up the costs was the need to force-collect each remote custodian’s entire set of data and then sort it out through the expensive eDiscovery processing phase, instead of culling, filtering and searching the data at the point of collection.
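A rough back-of-the-envelope calculation shows why copying everything over a constrained link takes so long. The sketch below is illustrative only: the 2 Mbps link speed, the 70% efficiency factor and the 0.2 GB responsive subset are assumptions chosen for the example, not figures from the McBride opinion.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move data_gb gigabytes over a link rated at
    link_mbps megabits per second. `efficiency` is an assumed fraction
    of the nominal bandwidth that is actually achieved."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Full collection of a 20 GB custodian over an assumed 2 Mbps link.
print(round(transfer_hours(20, 2), 1))
# Collecting only an assumed 0.2 GB of responsive data over the same link.
print(round(transfer_hours(0.2, 2), 2))
```

Under these assumptions the full copy takes well over a day of continuous transfer for a single custodian, while the targeted subset moves in under half an hour, which is the practical argument for searching in place before collecting.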

X1 Distributed Discovery (X1DD) solves this collection challenge and is specially designed for remote and distributed workforces. X1DD enables enterprises to quickly and easily search across up to thousands of distributed endpoints and data servers from a central location. Legal and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes, and full results with completed collection in hours, instead of days or weeks. The key to X1’s scalability is its unique ability to index and search data in place, thereby enabling a highly detailed and iterative search and analysis, and then collecting only the data responsive to those steps.

X1DD operates on-demand where your data currently resides — on desktops, laptops, servers, or even the cloud — without disruption to business operations and without requiring extensive or complex hardware configurations. After indexing of systems has completed (typically a few hours to a day depending on data volumes), clients and their outside counsel or service provider may then:

  • Conduct Boolean and keyword searches of relevant custodial data sources for ESI, returning search results within minutes by custodian, file type and location.
  • Preview any document in-place, before collection, including any or all documents with search hits.
  • Remotely collect and export responsive ESI from each system directly into a Relativity® or RelativityOne® workspace for processing, analysis and review or any other processing or review platform via standard load file. Export text and metadata only or full native files.
  • Export responsive ESI directly into other analytics engines, e.g. Brainspace®, H5® or any other platform that accepts a standard load file.
  • Conduct iterative “search/analyze/export-into-Relativity” processes as frequently and as many times as desired.

To learn more about this capability purpose-built for remote eDiscovery collection and data audits, please contact us.

Filed under Best Practices, Case Law, Case Study, ECA, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, ESI, Information Governance, Preservation & Collection, Relativity