Federal Judge: Custodian Self-Collection of ESI is Unethical and Violates Federal Rules of Civil Procedure

By John Patzakis

In E.E.O.C. v. M1 5100 Corp. (S.D. Fla. July 2, 2020), U.S. Magistrate Judge William Matthewman excoriated defense counsel for permitting unsupervised custodian ESI self-collection, declaring that the practice “greatly troubles and concerns the court.” In this EEOC age discrimination case, two employees of the defendant corporation were permitted to identify and collect their own ESI without supervision. Despite having no knowledge of the process the client used to gather information (which resulted in only 22 pages of documents produced), counsel signed the responses to the RFPs in violation of FRCP Rule 26(g), which requires that the attorney have knowledge of, and supervise, the process used to collect data from the client in response to discovery requests.

This notable quote from the opinion provides a very strong legal statement against the practice of ESI custodian self-collection:

“The relevant rules and case law establish that an attorney has a duty and obligation to have knowledge of, supervise, or counsel the client’s discovery search, collection, and production. It is clear to the Court that an attorney cannot abandon his professional and ethical duties imposed by the applicable rules and case law and permit an interested party or person to ‘self-collect’ discovery without any attorney advice, supervision, or knowledge of the process utilized. There is simply no responsible way that an attorney can effectively make the representations required under Rule 26(g)(1) and yet have no involvement in, or close knowledge of, the party’s search, collection and production of discovery…Abdicating completely the discovery search, collection and production to a layperson or interested client without the client’s attorney having sufficient knowledge of the process, or without the attorney providing necessary advice and assistance, does not meet an attorney’s obligation under our discovery rules and case law. Such conduct is improper and contrary to the Federal Rules of Civil Procedure.”

In his ruling, Judge Matthewman stated that he “will not permit an inadequate discovery search, collection and production of discovery, especially ESI, by any party in this case.” He gave the defendant “one last chance to comply with its discovery search, collection and production obligations.” He also ordered “the parties to further confer on or before July 9, 2020, to try to agree on relevant ESI sources, custodians, and search terms, as well as on a proposed ESI protocol.” The Court reserved ruling on monetary and evidentiary sanctions pending the results of the Defendant’s second-chance efforts.

A Defensible Yet Streamlined Process Is Optimal

EEOC v. M1 5100 is yet another court decision disallowing custodian self-collection of ESI and underscoring the importance of a well-designed and defensible eDiscovery collection process. At the other end of the spectrum, full disk image collection is a preservation option that, while defensible, is very costly, burdensome and disruptive to operations. Previously in this blog, I discussed at length the numerous challenges associated with full disk imaging.

The ideal solution is a systemized, uniform and defensible process for ESI collection that also enables targeted and intelligent data collection in support of proportionality principles. Such a capability is only attainable with the right enterprise technology. With X1 Distributed Discovery (X1DD), parties can perform targeted search and collection of the ESI of hundreds of endpoints over the internal network without disrupting operations. Search results are returned in minutes, not weeks, and thus can be highly granular and iterative, based upon multiple keywords, date ranges, file types, or other parameters. This approach typically reduces eDiscovery collection and processing costs by at least one order of magnitude (90 percent), bringing much-needed feasibility to enterprise-wide eDiscovery collection. It can save organizations millions while improving compliance by maintaining metadata, generating audit logs and establishing chain of custody.
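To make the idea of granular, iterative collection criteria concrete, here is a minimal sketch of how such a query might be expressed and applied. The data structures and field names are hypothetical illustrations for this post, not X1DD’s actual API:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CollectionQuery:
    """Hypothetical targeted-collection criteria; illustrative only, not X1DD's API."""
    keywords: list[str]                 # terms combined with OR against extracted text
    date_from: date | None = None       # include documents modified on or after this date
    date_to: date | None = None         # ...and on or before this date
    file_types: list[str] = field(default_factory=list)  # e.g. file extensions

    def matches(self, doc: dict) -> bool:
        """Test one indexed document (a dict of metadata plus extracted text)."""
        text = doc.get("text", "").lower()
        if not any(k.lower() in text for k in self.keywords):
            return False
        if self.date_from and doc["modified"] < self.date_from:
            return False
        if self.date_to and doc["modified"] > self.date_to:
            return False
        if self.file_types and doc["ext"] not in self.file_types:
            return False
        return True

# Iterative refinement: run a broad query, inspect hit counts, then narrow it.
query = CollectionQuery(
    keywords=["termination", "reduction in force"],
    date_from=date(2018, 1, 1),
    date_to=date(2019, 12, 31),
    file_types=[".msg", ".docx", ".xlsx"],
)
corpus = [
    {"text": "Re: termination list for Q3", "modified": date(2019, 3, 4), "ext": ".msg"},
    {"text": "Quarterly sales forecast", "modified": date(2019, 5, 1), "ext": ".xlsx"},
]
hits = [d for d in corpus if query.matches(d)]
print(f"{len(hits)} of {len(corpus)} documents match")  # -> 1 of 2
```

Because the index already exists at the endpoint, a query like this can be rerun and tightened in minutes, which is what makes the iterative, proportionality-driven workflow practical.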

And in line with the Judge’s guidance outlined in EEOC v. M1 5100, X1DD provides a repeatable, verifiable and documented process for the requisite defensibility. For a demonstration or briefing on X1 Distributed Discovery, please contact us.


Lawson v. Spirit AeroSystems: Federal Court Blasts “Bloated” ESI Collection That Rendered TAR Ineffective

By John Patzakis

Technology Assisted Review (TAR), when correctly employed, can significantly reduce legal review costs while delivering generally more accurate results than traditional legal review processes. However, the benefits of TAR are often undercut by the over-collection and over-inclusion of Electronically Stored Information (ESI) in the TAR process. These challenges played out in spades in the recent decision in Lawson v. Spirit AeroSystems, where a Kansas federal judge issued a detailed ruling outlining the parties’ eDiscovery battles, their use of TAR, and whether further TAR costs should be shifted to the Plaintiff. The ex-CEO of Spirit AeroSystems brought his suit accusing Spirit of unlawfully withholding $50 million in retirement benefits over his alleged violation of a non-compete agreement.


The Lawson court outlined two ways in particular in which ESI over-collection can detrimentally impact TAR. First, the more data introduced into the process, the higher the cost and burden. Some practitioners believe it is necessary to over-collect, and subsequently over-include, ESI and let the TAR process sort everything out. Many service providers charge by volume, so there can be economic incentives that conflict with what is best for the end client. In some cases, the significant cost savings realized through TAR are erased by eDiscovery costs associated with overly aggressive ESI inclusion on the front end. Per the judge in Lawson, “the TAR set was unnecessarily voluminous because it consisted of the bloated ESI collection” due to overbroad collection parameters.

The court also outlined how the TAR process is much more effective when the initial set of data has a higher richness (also referred to as “prevalence”) ratio. In other words, the higher the rate of responsive data in the initial data set, the better. It has always been understood that document culling is very important to successful, economical document review, and that includes TAR. As the Lawson court noted, “the ‘richness’ of the dataset…can also be a key driver of TAR expenses. This is because TAR is not as simple as loading the dataset and pushing a magic button to identify the relevant and responsive documents. Rather, the parties must devote the resources (usually a combination of attorneys and contract reviewers) necessary to ‘educate’ or ‘train’ the predictive algorithm, typically through an ongoing process…” According to the court’s decision, the inefficiencies in the process resulted in an estimated TAR bill of $600,000 for the review of approximately 200 GB of data. That is far too expensive for TAR to be feasible as a standard litigation process, and the problems all started with the “bloated” ESI collection.
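The arithmetic behind richness is straightforward and worth spelling out. Here is a minimal sketch with hypothetical volumes (illustrative numbers only, not figures from the Lawson record):

```python
# Richness (prevalence) arithmetic with hypothetical volumes -- illustrative
# only, not figures from the Lawson record.
def prevalence(responsive: int, total: int) -> float:
    """Fraction of the review set that is actually responsive."""
    return responsive / total

responsive = 50_000
bloated_set = 1_000_000  # over-collected: everything swept in "just in case"
culled_set = 200_000     # same responsive documents after targeted collection

print(f"bloated prevalence: {prevalence(responsive, bloated_set):.0%}")  # 5%
print(f"culled prevalence:  {prevalence(responsive, culled_set):.0%}")   # 25%

# At a blended per-document review/hosting rate, the over-collected set
# multiplies spend without adding a single responsive document.
rate = 0.75  # dollars per document, hypothetical
print(f"bloated cost: ${bloated_set * rate:,.0f}")  # $750,000
print(f"culled cost:  ${culled_set * rate:,.0f}")   # $150,000
```

The same responsive documents are in both sets; the only difference is how much non-responsive data was dragged along into the TAR process.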

To be sure, the volume of ESI is growing exponentially and will only continue to do so. The costs associated with collecting, processing, reviewing, and producing documents in litigation are a source of considerable pain for litigants, including the Plaintiff in Lawson, who will, per the court’s ruling, incur a substantial share of the TAR bill under the cost-shifting order. The only way to reduce that pain to its minimum is to use all available tools, including TAR, in all appropriate circumstances, within the bounds of reasonableness and proportionality, to control the volumes of data that enter the discovery pipeline.

Ideally, an effective and targeted collection capability enables parties to ultimately process, host, review and produce less ESI. This capability should include pre-collection early case assessment (ECA) to foster cooperation and proportionality in discovery by informing the parties early in the process about where relevant ESI is located and which ESI is significant to the case. With such benefits also comes a much-improved TAR process. X1 Distributed Discovery (X1DD) uniquely fulfills this requirement with its ability to perform ECA before collection, instead of after the costly, time-consuming and disruptive collection phase, thereby providing a game-changing new approach to the traditional eDiscovery model. X1DD enables enterprises to quickly and easily search across hundreds of distributed endpoints from a central location, so organizations can perform unified complex searches across content, metadata, or both and obtain full results in minutes, enabling true pre-collection ECA with live keyword analysis and distributed processing and collection in parallel at the custodian level. This dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs by not over-collecting data, and gives the legal team confidence through a highly transparent, consistent and systemized process. And now we know of another key benefit of an effective collection and ECA process: much more accurate and feasible technology assisted review.


CCPA and GDPR UPDATE: Unstructured Enterprise Data in Scope of Compliance Requirements

An earlier version of this article appeared on Legaltech News

By John Patzakis

A core requirement of both the GDPR and the similar California Consumer Privacy Act (CCPA), which becomes enforceable on July 1, is the ability to demonstrate and prove that personal data is being protected. This requires information governance capabilities that allow companies to efficiently identify and remediate the personal data of EU and California residents. For instance, the UK Information Commissioner’s Office (ICO) provides that “The GDPR places a high expectation on you to provide information in response to a SAR (Subject Access Request). Whilst it may be challenging, you should make extensive efforts to find and retrieve the requested information.”

However, recent Gartner research notes that approximately 80 percent of the information stored by companies is “dark data”: unstructured, distributed data that can pose significant legal and operational risks. With much of the global workforce now working remotely, this is of special concern, as nearly all the company data maintained and utilized by remote employees is unstructured. Unstructured enterprise data generally refers to searchable data such as emails, spreadsheets and documents on laptops, file servers, and social media.

The GDPR

An organization’s GDPR compliance efforts need to address any personal data contained within unstructured electronic data throughout the enterprise, as well as the structured data found in CRM, ERP and various centralized records management systems. Personal data is defined in the GDPR as: “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

Under the GDPR, there is no distinction between structured and unstructured electronic data in terms of the regulation’s scope. (There is separate guidance regarding “structured” paper records; more on that below.) The key consideration is whether a data controller or processor has control over personal data, regardless of where it is located in the organization. Nonetheless, there is some confusion about the scope of the GDPR’s coverage across structured as well as unstructured electronic data systems.

The UK ICO, a key government regulator that interprets and enforces the GDPR, recently issued important draft guidance on the scope of GDPR data subject access rights, including as they relate to unstructured electronic information. Notably, the ICO observes that large data sets, including data analytics outputs and unstructured data volumes, “could make it more difficult for you to meet your obligations under the right of access. However, these are not classed as exemptions, and are not excuses for you to disregard those obligations.”

Additionally, the ICO guidance advises that “emails stored on your computer are a form of electronic record to which the general principles (under the GDPR) apply.” In fact, the ICO notes that employees’ home computers and personal email accounts are subject to the GDPR if they contain personal data originating from the employer’s networks or processing activities. This is especially notable under the new normal of social distancing, where much of a company’s data (and associated personal information) is stored on remote employees’ laptops.

The ICO also provides guidance on several related subjects that shed light on its stance regarding unstructured data:

Archived Data: According to the ICO, data stored in electronic archives is generally subject to the GDPR, noting that there is no “technology exemption” from the right of access. Enterprises “should have procedures in place to find and retrieve personal data that has been electronically archived or backed up.” Further, enterprises “should use the same effort to find information to respond to a SAR as you would to find archived or backed-up data for your own purposes.”

Deleted Data: The ICO’s view is that deleted data is generally within the scope of GDPR compliance, provided there is no intent, or systematic ability, to readily recover it. The ICO says it “will not seek to take enforcement action against an organisation that has failed to use extreme measures to recreate previously ‘deleted’ personal data held in electronic form. We do not require organisations to use time and effort reconstituting information that they have deleted as part of their general records management.”

However, under this guidance, organizations that invest in and deploy re-purposed computer forensic tools featuring automated undelete capabilities may be held to a higher standard. Deploying such systems can reflect both the intent and the systematic technical ability to recover deleted data.

Paper Records: Paper records that are part of a “structured filing system” are subject to the GDPR. Specifically, if an enterprise holds “information about the requester in non-electronic form (e.g. in paper files or on microfiche records),” such hard-copy records are considered personal data accessible via the right of access if they are “held in a ‘filing system.’” This segment of the guidance reflects that references to “unstructured data” in European parlance usually pertain to paper records. The ICO notes in separate guidance that “the manual processing of unstructured personal data, such as unfiled handwritten notes on paper” is outside the scope of the GDPR.

GDPR Article 4 defines a “filing system” as meaning “any structured set of personal data which are accessible according to specific criteria, whether centralized, decentralized or dispersed on a functional or geographical basis.” The only form of “unstructured data” that would not be subject to GDPR would be unfiled paper records like handwritten notes or legacy microfiche.

The CCPA  

The California Attorney General (AG) released a second and presumably final round of draft regulations under the California Consumer Privacy Act (CCPA) that reflect how unstructured electronic data will be treated under the Act. The proposed rules outline how the California AG is interpreting and will be enforcing the CCPA. Under § 999.313(d)(2), data from archived or backup systems are—unlike the GDPR—exempt from the CCPA’s scope, unless those archives are restored and become active. Additional guidance from the Attorney General states: “Allowing businesses to delete the consumer’s personal information on archived or backup systems at the time that they are accessed or used balances the interests of consumers with the potentially burdensome costs of deleting information from backup systems that may never be utilized.”

What is very notable is that the only technical exception under the CCPA is unrestored archived and backup data. As with the GDPR, there is no distinction between unstructured and structured electronic data. In the first round of public comments, an insurance industry lobbying group argued that unstructured data should be exempted from the CCPA. As the revised guidance reflects, that suggestion was rejected by the California AG.

For the GDPR, the UK ICO correctly advises that enterprises “should ensure that your information management systems are well-designed and maintained, so you can efficiently locate and extract information requested by the data subjects whose personal data you process and redact third party data where it is deemed necessary.” This is why Forrester Research notes that “Data Discovery and Classification are the foundation for GDPR compliance.”

Establish and Enforce Data Privacy Policies

To achieve GDPR and CCPA compliance, then, organizations must first ensure that explicit policies and procedures are in place for handling personal information. Once those are established, it is important to demonstrate to regulators that the policies and procedures are followed and operationally enforced. A key first step is to establish a data map of where and how personal data is stored in the enterprise; this exercise is in fact required under the GDPR Article 30 documentation provisions.

An operational data audit and discovery capability across unstructured data sources allows enterprises to efficiently map, identify, and remediate personal information in order to respond to regulators and to data subject access requests from EU and California residents. This capability must be able to search and report across several thousand endpoints and other unstructured data sources, and return results within minutes rather than the weeks or months typical of traditional crawling tools. This includes the laptops of employees working from home.
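As a greatly simplified illustration of what such an audit does under the hood, the sketch below scans text extracted from unstructured files for a few common personal-data patterns. The regexes, paths, and file handling are hypothetical placeholders; production tools rely on distributed indexing, validated locale-aware detectors, and full reporting:

```python
import re
from pathlib import Path

# Simplified personal-data detectors -- illustrative regexes only; real
# data-audit tools use validated, locale-aware detection and indexing.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\+?\d{1,3}[ .-]?\(?\d{2,4}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}

def scan_file(path: Path) -> dict[str, int]:
    """Count pattern hits in one text file."""
    text = path.read_text(errors="ignore")
    return {name: len(rx.findall(text)) for name, rx in PII_PATTERNS.items()}

def scan_tree(root: Path) -> dict[str, dict[str, int]]:
    """Walk a directory of unstructured files and build a per-file PII map."""
    report = {}
    for path in root.rglob("*.txt"):  # simplification: plain-text files only
        hits = scan_file(path)
        if any(hits.values()):
            report[str(path)] = hits
    return report

if __name__ == "__main__":
    # "./file_share" is a hypothetical unstructured-data location.
    for source, hits in scan_tree(Path("./file_share")).items():
        print(source, hits)
```

The resulting per-source map of personal-data hits is the raw material for both the Article 30 data map and responses to data subject access requests.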

These processes and capabilities are not only required for data privacy compliance but are also needed for broader information governance and security requirements, anti-fraud compliance, and eDiscovery.

Implementing these measures proactively, with routine and consistent enforcement using solutions such as X1 Distributed GRC, will go a long way toward mitigating risk, responding efficiently to data subject access requests, and improving overall operational effectiveness through these information governance improvements.


How to Implement an Effective eDiscovery Search Term Strategy

By Mandi Ross and John Patzakis

A key Federal Rules of Civil Procedure provision that greatly impacts eDiscovery processes is Rule 26(f), which requires the parties’ counsel to “meet and confer” in advance of the pre-trial scheduling conference on key discovery matters, including the preservation, disclosure and exchange of potentially relevant electronically stored information (ESI). Given the risks and costs associated with eDiscovery, this early meeting of counsel is a critically important opportunity to manage and control eDiscovery costs and to ensure relevant ESI is preserved.

A very good authority on the Rule 26(f) eDiscovery conference is the “Suggested Protocol for Discovery of Electronically Stored Information” provided by then-Magistrate Judge Paul W. Grimm and his joint bar-court committee. Under Section 8 of the Model Protocol, the topics to be discussed at the Rule 26(f) conference include: “Search methodologies for retrieving or reviewing ESI such as identification of the systems to be searched;” “the use of key word searches, with an agreement on the words or terms to be searched;” “limitations on the time frame of ESI to be searched;” and “limitations on the fields or document types to be searched.”

Optimizing the process of developing keyword searches, however, is no easy task, especially without the right technology and expertise. The typical approach of brainstorming a list of possibly relevant terms and running the search against the dataset to be reviewed produces a wide range of inefficiencies. Negotiations over the proper use of search terms can become onerous and contentious, and judges, who are often asked to rule on the aptness of the methodology, are frequently reluctant to do so. Thus, outside expertise leveraging index-in-place technology is beneficial in building an effective and comprehensive search term strategy.

The courts agree. In Victor Stanley v. Creative Pipe, then-Magistrate Judge Paul Grimm explained, “Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology.”

Building a sound search strategy is akin to constructing a building. First, lay the foundation with a clear understanding of the claims and defenses of the case and the types of documents that will support the legal strategy. Once a solid foundation is in place, the structure of language, logical expressions, and metadata is blended as necessary to create an appropriate set of robust Boolean searches. These searches target the retrieval of responsive documents and consistently achieve a staggering 80 percent reduction in the data volumes to be reviewed.
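For example, a single issue, say an alleged disclosure of confidential pricing information, might be rendered as one proximity-connected Boolean expression. The sketch below assembles such a query as a plain string, using the w/N proximity syntax common to review platforms; the issue, terms, and syntax are hypothetical illustrations, not terms from any actual matter:

```python
# Hypothetical issue: alleged disclosure of confidential pricing information.
# Each component of the logical expression is expanded with synonyms, then the
# components are joined with a proximity connector (w/10 = within 10 words).
disclosure_terms = ["disclos*", "shar*", "transmit*", "leak*"]
subject_terms = ["pricing", "price list", "rate card", "margin*"]

def or_group(terms):
    """OR the terms together, quoting any multi-word phrases."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

query = f"{or_group(disclosure_terms)} w/10 {or_group(subject_terms)}"
print(query)
# (disclos* OR shar* OR transmit* OR leak*) w/10 (pricing OR "price list" OR "rate card" OR margin*)
```

The proximity requirement is what does the volume-reduction work: a document mentioning “pricing” alone is not retrieved unless the disclosure language appears nearby.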

It’s quite simple: if a document does not contain the defined language, the document is unlikely to be relevant. The best way to find the language specific to the claims and defenses is to create a linguistic narrative of the case. This not only constructs a roadmap for a comprehensive strategy designed to reduce the volume of data, but also creates a thorough categorization system for organizing and prioritizing review. The approach is straightforward, flexible, and adaptive to client objectives, whether during early case assessment, linear or technology-assisted review, or anything in between.

The narrative search approach includes the following steps:

  1. Issue Analysis: Create an unambiguous definition of each issue that characterizes the claims being made and the defenses being offered.
  2. Logical Expression Definition: Define the specific expressions that encapsulate each issue. There may be multiple expressions required to convey the full meaning of the issue.
  3. Component Identification and Expansion: Distill each logical expression into specific components. These components form the basis for the expansion effort, which is the identification of words that convey the same conceptual meaning (synonyms).
  4. Search Strategies: Determine the appropriate parameters to be used for proximity, and develop a strategy for searching non-standard, structured data, such as spreadsheets, non-text files, or database files.
  5. Test Precision and Recall: In tandem with the case team, review small sample sets to refine the logical expression statements and improve precision and recall (a brief sketch of this measurement follows this list).
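Step 5 lends itself to a simple quantitative check. Below is a minimal sketch of scoring a candidate expression against a small attorney-coded sample; the documents, the toy hit() stand-in, and the coding are all hypothetical:

```python
# Hypothetical attorney-coded sample: (document text, coded responsive?)
sample = [
    ("pricing schedule was disclosed to the competitor", True),
    ("lunch schedule for the quarterly offsite", False),
    ("margin data shared outside the deal team", True),
    ("pricing of the cafeteria menu", False),
    ("confidential rate card emailed to a rival", True),
]

def hit(text: str) -> bool:
    """Toy stand-in for running the candidate Boolean expression on one document."""
    return any(term in text for term in ("disclos", "shared", "margin", "pricing"))

tp = sum(1 for text, responsive in sample if hit(text) and responsive)
fp = sum(1 for text, responsive in sample if hit(text) and not responsive)
fn = sum(1 for text, responsive in sample if not hit(text) and responsive)

precision = tp / (tp + fp)  # of the documents retrieved, how many are responsive
recall = tp / (tp + fn)     # of the responsive documents, how many were retrieved
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=67% recall=67%
```

Refining the expression (for example, adding “rate card” as a synonym and tightening “pricing” with a proximity requirement) and re-scoring against the sample is the feedback loop this step describes.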

Putting this process into effect requires the right technology, one that enables its application in real time. The ability to index data in place is a game changer, as it gives legal teams early insight into the data and validates search term sampling and testing instantly, without first requiring data collection. This is in contrast to the outdated, costly, and time-consuming process of manual data collection and subsequent migration into a physical eDiscovery processing platform, which prevents counsel from meaningfully applying search term proportionality without first incurring significant expense and loss of time.

X1 Distributed Discovery enables enterprises to quickly and easily search across thousands of distributed endpoints from a central location. This allows organizations to easily perform unified complex searches across content, metadata, or both, and obtain full results in minutes, enabling true pre-collection search analytics with live keyword analysis and distributed processing and collection in parallel at the custodian level. This dramatically shortens the identification/collection process by weeks if not months, curtails processing and review costs by not over-collecting data, and provides confidence to the legal team with a highly transparent, consistent and systemized process.

Created by the experts at Prism Litigation Technology and delivered by an experienced consulting team leveraging cutting-edge technology, this innovative narrative methodology enriches common search terms by adding layers of linguistic and data science expertise to create a fully defensible, transparent, and cogent approach to eDiscovery. For more on this workflow, please see the white paper: Don’t Stop Believin’: The Staying Power of Search Term Optimization.


Mandi Ross is the CEO of Prism Litigation Technology (www.prismlit.com)

John Patzakis is Chief Legal Officer and Executive Chairman at X1 (www.X1.com)
