Category Archives: eDiscovery & Compliance

Case Law Update: Federal Court Endorses Targeted Search Term Based ESI Collection

By John Patzakis

A recent decision from the Southern District of New York provides that parties are obligated to conduct reasonable searches during discovery, but that such searches may be targeted. The court invoked the proportionality concepts within the Federal Rules of Civil Procedure, which govern the production of Electronically Stored Information (“ESI”). In Raine Grp. v. Reign Capital (S.D.N.Y. Feb. 22, 2022), the plaintiff, “a merchant bank with over 100 employees,” sued defendant “Reign Capital LLC, a two-person real estate development and management firm, for trademark infringement and unfair competition based on Defendant’s” name. After meet and confer efforts to negotiate an ESI protocol proved unsuccessful, the Court ruled on two key issues in dispute: the scope of the plaintiff’s search and collection obligations, and the formulation of certain search terms.

The court, in its written decision, first articulated a party’s general obligations under the Federal Rules of Civil Procedure, noting that Federal Rules of Civil Procedure 26 and 34 “require parties to conduct a reasonable search for documents that are relevant to the claims and defenses.” The court further noted that under Rule 26(a), “Parties have an affirmative obligation to search for documents which they may use to support their claims or defenses.” In meeting these obligations, the court provided that a producing party may utilize search methodologies, specifically mentioning search terms. The court observed that, “in this instance, the producing party must include and utilize search terms it believes are needed to fulfill its obligations under Rule 26 in addition to considering additional search terms requested by the requesting party.” The court—in addressing the concept of reasonable, proportional discovery under the Rules—continued: “In other words, the producing party must search custodians and locations it identifies on its own as sources for relevant information as part of its obligations under Rules 26 and 34.” Importantly, the court noted that “an ESI protocol and search terms work in tandem with the parties’ obligations under the Federal Rules…”

Additionally, the court advised the plaintiff to search not only the relevant custodians’ direct data sources, but also “other sources of data such as shared drives that are not particular to a specific custodian that should be searched as part of Plaintiffs’ obligations under Rule 26. Plaintiff is expected to conduct a reasonable search of such non-custodian sources likely to have relevant information.” The court here makes an important point: parties have a duty to search shared network drives for relevant information. We have previously blogged about the importance of network file shares and how to effectively conduct eDiscovery on those critical data sources.

In regard to the formulation of search terms, the court explained that “[s]earch terms, while helpful, must be carefully crafted. Poorly crafted terms may return thousands of irrelevant documents and increase, rather than minimize the burden of locating relevant and responsive ESI. They also can miss documents containing a word that has the same meaning or that is misspelled.” The court further correctly advised that overly broad search terms “are typically not sufficiently targeted to find relevant documents. Modifiers are often needed to hone in on truly relevant documents.” This decision is important because the court endorses the use of highly targeted search terms and other parameters to defensibly collect and preserve potentially relevant ESI.
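The court's point about broad terms and modifiers can be illustrated with a minimal sketch. Everything here is hypothetical: the sample documents, the `broad` and `with_modifier` helpers, and the three-word proximity window are invented for illustration and do not come from the decision or from any particular eDiscovery tool.

```python
import re

# Hypothetical sample documents, invented for illustration only.
DOCS = {
    "memo1.txt": "Reign Capital signed the lease for the new development.",
    "memo2.txt": "The reign of the previous CEO ended in 2019.",
    "memo3.txt": "Heavy rain delayed the site visit.",
    "memo4.txt": "Riegn Capital invoice attached.",  # misspelled name
}

def words(text):
    """Return lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def broad(term):
    """Documents containing the bare term anywhere."""
    return sorted(name for name, text in DOCS.items() if term in words(text))

def with_modifier(term, modifier, window=3):
    """Documents where `modifier` appears within `window` words of `term`."""
    hits = []
    for name, text in DOCS.items():
        toks = words(text)
        term_pos = [i for i, t in enumerate(toks) if t == term]
        mod_pos = [i for i, t in enumerate(toks) if t == modifier]
        if any(abs(i - j) <= window for i in term_pos for j in mod_pos):
            hits.append(name)
    return sorted(hits)

broad_hits = broad("reign")                       # also pulls in the unrelated memo2.txt
narrow_hits = with_modifier("reign", "capital")   # only memo1.txt
```

Note that neither search finds memo4.txt, where the name is misspelled. That is the second risk the court flagged, and it is one reason search terms usually need to be tested and iterated rather than fixed once.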

Additionally, this decision illustrates the necessity of an iterative, in-place search and collection process. None of the cost-saving, targeted collection efforts outlined by the court can be realized without an operational capability to effectuate them. Ideally, the producing party can employ a defensible, targeted, and iterative search and collection process in place, prior to collection, to effectuate the proportional discovery process approved by the court in this decision. Without such a capability, the alternative is an expensive over-collection effort in which the data is searched only after collection. Enabling search iteration and targeted collection upstream brings dramatic cost savings, risk reduction, and other process efficiencies.

Filed under Best Practices, Case Law, eDiscovery & Compliance, Enterprise Search, Preservation & Collection

ILTA eDiscovery Survey Highlights Targeted ESI Collection as the Preferred Methodology

By John Patzakis


The International Legal Technology Association (ILTA) recently published a very informative and comprehensive law firm eDiscovery practice survey, “2021 Litigation and Practice Support Survey.” ILTA received responses from litigation support professionals from 82 different law firms ranging in size from medium to large, on a variety of subjects, including eDiscovery practice trends and software tool usage. While the survey addresses a variety of aspects of legal tech and litigation, the survey reveals a couple of very notable insights regarding ESI collection in the enterprise.

The first important insight is that targeted ESI collection is clearly preferred over forensic collection for litigation support purposes. Fifty-nine percent of respondents preferred “targeted collection (non-forensic)” as their standard methodology, while 13 percent still preferred forensic imaging. Forensic collection is rightfully on the decline as a method of ESI collection, as legal counsel seeks to leverage proportionality concepts that greatly reduce the cost, time and risk associated with otherwise inefficient eDiscovery.

However, attaining the benefits of targeted collection requires the ability to operationalize workflows as far upstream in the eDiscovery process as possible. For instance, when you engage in data over-collection, which in turn runs up a lot of human time and processing costs, the ship has largely sailed before you are able to perform early case assessments and data relevancy analysis, as much of the discovery cost has already been incurred at that point. The case law and the Federal Rules provide that the duty to preserve applies only to potentially relevant information, but unless you have the right operational processes in place, you lose the ability to attain the benefits of proportionality. That is why we see forensic imaging, the epitome of data over-collection, on a steep decline.

The second notable takeaway was that network file shares and “loose files” were the most common collection data sources, even outpacing email. Network file shares pose a significant challenge because of their data volumes, which typically run 10 to 20 terabytes but can be much higher. Nearly every company and government agency maintains such large file shares, sometimes hundreds of them, depending on the size of the organization. Large network file shares can be found on premises or in a company’s cloud environment.

Traditional eDiscovery collection methods fail to efficiently address these large file shares, due to significant logistical challenges. The data cannot simply be searched in-place by traditional forensics tools or other crawling methods. Consequently, the data is typically copied in bulk and then migrated to another location for processing, where the data is finally indexed and then searched and culled. This approach does not enable the targeted, proportional collection methods preferred by law firms, as noted above.

To accomplish the goals of both targeted collection and addressing large file shares, index and search in-place technology should be utilized. Indexing and search in-place in this context means that a software-based indexing technology (as opposed to an expensive and cumbersome stand-alone hardware appliance) is deployed directly onto the file server or an adjacent computing resource. This indexing occurs without a bulk data transfer of the data. Once indexed, the searches are performed in a few seconds, with complex Boolean operators, metadata filters and regular expression searches. The searches can be iterated and repeated without limitation, which is critical for large data sets.
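As a rough illustration of the index-then-search pattern described above, here is a minimal sketch. It is a toy, not how the X1 Enterprise Platform or any other product is implemented: the `build_index` and `search` functions, the sample files, and the temporary directory standing in for a file share are all assumptions made for the example.

```python
import os
import re
import tempfile

def build_index(root):
    """Walk `root` and index files where they sit; nothing is copied elsewhere."""
    postings = {}   # term -> set of file paths containing it
    meta = {}       # file path -> metadata
    for folder, _dirs, files in os.walk(root):
        for fname in files:
            path = os.path.join(folder, fname)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                text = fh.read()
            meta[path] = {
                "ext": os.path.splitext(fname)[1].lower(),
                "size": os.path.getsize(path),
            }
            for term in set(re.findall(r"[a-z0-9]+", text.lower())):
                postings.setdefault(term, set()).add(path)
    return postings, meta

def search(postings, meta, all_of=(), none_of=(), ext=None, pattern=None):
    """Boolean AND / NOT over terms, then optional metadata and regex filters."""
    hits = set(meta)
    for term in all_of:
        hits &= postings.get(term.lower(), set())
    for term in none_of:
        hits -= postings.get(term.lower(), set())
    if ext is not None:
        hits = {p for p in hits if meta[p]["ext"] == ext}
    if pattern is not None:
        rx = re.compile(pattern)
        kept = set()
        for p in hits:  # regex pass reads only the surviving files, in place
            with open(p, encoding="utf-8", errors="ignore") as fh:
                if rx.search(fh.read()):
                    kept.add(p)
        hits = kept
    return sorted(os.path.basename(p) for p in hits)

# Hypothetical "file share" built in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as share:
    samples = {
        "lease.txt": "Reign Capital lease, contact 212-555-0147.",
        "notes.md": "Reign Capital trademark notes.",
        "lunch.txt": "Lunch order for Friday.",
    }
    for name, body in samples.items():
        with open(os.path.join(share, name), "w", encoding="utf-8") as fh:
            fh.write(body)

    postings, meta = build_index(share)
    first = search(postings, meta, all_of=["reign", "capital"])
    refined = search(postings, meta, all_of=["reign"], none_of=["trademark"],
                     ext=".txt", pattern=r"\d{3}-\d{3}-\d{4}")
```

The index is built once, and each refinement of the query (`first`, then `refined`) runs against it without touching the bulk of the data again, which is what makes repeated iteration cheap.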

These capabilities supporting targeted and proportional collection of loose files, emails, and large network file shares are uniquely provided in the X1 Enterprise Platform.

Filed under Best Practices, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, ESI, law firm, Social Media Investigations

Post Pandemic, Corporate eDiscovery Undergoes a Permanent Paradigm Shift

By John Patzakis

While the pandemic disrupted the workplace during its height, it is now becoming clear that a more permanent transformation has taken place. Employees and their electronic information assets are far more geographically dispersed. This is requiring corporate legal departments to rethink how they conduct eDiscovery, as the old model based upon data over-collection is no longer tenable. Instead, corporations are favoring a more targeted approach to ESI collection.

Industry analyst Greg Buckles of the eDiscovery Journal recently provided a good analysis on this topic:

“The sheer volume of raw custodial collections has put pressure on discovery professionals to use an iterative selective collection strategy. That puts the corporate legal team closer to scoping and collection activities than most have been. For too long corporate legal has felt uncomfortable pushing back on overly burdensome or broad discovery requests from opposing or retained counsel. The recent development of proportionality frameworks, guidelines and tools has the potential to empower corporate legal to make defensible cost-risk arguments.”

Buckles further observes that “some of my clients have drastically cut their eDiscovery related expenses through these kinds of initiatives.” He terms this as a “grand enterprise reboot” that “brings (corporate legal) to the table with a fresh perspective.”

Most core eDiscovery costs (outside of attorney review) stem from over-collection of ESI. While direct collection costs can seem inexpensive, law firm Nelson Mullins notes that “over preservation tends to have its own costs relating to storage of large amounts of electronically stored information (ESI) and the resources needed to manage it; leads to increased downstream e-discovery costs associated with collection, processing, and review.”

As outlined by Buckles, proportionality-based eDiscovery is an important principle that all corporate attorneys should be leveraging. Under Federal Rule of Civil Procedure 26(b)(1), parties may discover any non-privileged material that is relevant to any party’s claim or defense and proportional to the needs of the case. However, attorneys representing enterprises are essentially flying blind on this analysis when it matters most. Prior to the custodian data being actually collected, processed and analyzed, attorneys do not have any real visibility into the potentially relevant ESI across an organization. This is especially true in regard to unstructured, distributed data, which is invariably the majority of ESI that is ultimately collected in a given matter.

If accurate pre-collection data insight were available to counsel, that game-changing factor would enable counsel to set reasonable discovery limits and ultimately process, host, review and produce much less ESI. Counsel can further use pre-collection proportionality analysis to gather key information, develop a litigation budget, and better manage litigation deadlines. Such insights can also foster cooperation by informing the parties early in the process about where relevant ESI is located, and what keywords and other search parameters can identify and pinpoint relevant ESI.
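One simple form of pre-collection insight is a search term hit report: how many files, and how much data, each proposed term would pull in. The sketch below assumes hypothetical file records and a made-up `hit_report` helper; it is meant only to show the kind of numbers counsel could bring to a meet and confer.

```python
import re

# Hypothetical records: (file name, size in bytes, extracted text).
FILES = [
    ("a.docx", 40_000, "Reign Capital term sheet"),
    ("b.msg", 12_000, "Capital call notice for the fund"),
    ("c.xlsx", 250_000, "Quarterly budget"),
    ("d.msg", 8_000, "Reign Capital logo draft"),
]

def hit_report(files, terms):
    """Per-term file counts and byte totals, plus the de-duplicated union."""
    per_term = {}
    union = {}  # file name -> size, counted once even if several terms hit
    for term in terms:
        needle = term.lower()
        matched = [
            (name, size)
            for name, size, text in files
            if needle in re.findall(r"[a-z0-9]+", text.lower())
        ]
        per_term[term] = (len(matched), sum(size for _, size in matched))
        union.update(matched)
    total_bytes = sum(size for _, size, _ in files)
    return per_term, sum(union.values()), total_bytes

per_term, responsive_bytes, total_bytes = hit_report(FILES, ["reign", "capital"])
# per_term shows that "capital" alone sweeps in b.msg, which "reign" does not.
```

In this toy data set the two terms together would collect 60,000 of 310,000 bytes, the sort of figure that supports a proportionality argument before anything is actually collected.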

A solution to these challenges is the utilization of index and search in-place technology. Indexing and search in-place in this context means that a software-based indexing technology is deployed directly onto file servers, laptops or even in the cloud to address cloud-based data sources. This indexing occurs without a bulk data transfer of the data. Once indexed, the searches are performed in a few seconds, with complex Boolean operators, metadata filters and regular expression searches. The searches can be iterated and repeated without limitation, which is critical for large data sets.

But it is important that the technology employed truly enables index-in-place, with the indexes deployed directly onto the laptops, file shares or cloud servers where the data exists. Some providers will market their tools as such, but the indexing and searching actually take place in their platform at a central location. Data must first be copied and collected from laptops and file servers and migrated over the network to reach the indexing engines. This does not scale for eDiscovery. For information about X1’s index-in-place technology, X1 Enterprise Platform, please visit us here.

Filed under Best Practices, collection, compliance, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, ESI, Preservation & Collection, proportionality

Dark Data is an Unmet Cyber Security Challenge

By John Patzakis

Enterprises today are creating and storing massive volumes of unstructured data, distributed across the enterprise, at a very fast pace. IT experts refer to this data type as “dark data.” Research advisory firm Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” According to Rahul Telang, professor of information systems at Carnegie Mellon University, “[o]ver 90% of the data in business is dark data.”

Dark data exists due to organizational silos and a highly distributed and mobile workforce, a trend that proliferated during the COVID pandemic and has now solidified as the new normal. As a result, there is a proliferation of unmanaged data stored in file shares, laptops, unarchived email accounts, shared cloud drives such as OneDrive and Dropbox and many other repositories. According to Anthony Juliano, CTO of Landmark Ventures, “dark data is exploding rapidly with the dissolution of the perimeter; it’s a largely unaddressed risk vector. A vast majority of the CIOs and CISOs I speak with are now prioritizing solving this problem not only going forward, but also backwards – and it’s not easy.”

Cyber security platforms generally have a good handle on perimeter integrity, encryption, and other key priorities such as zero day network attacks and malware. However, while these measures are clearly important, distributed dark data is largely a blind spot for cybersecurity tech, and as such organizations have very little visibility into the content of such data. GDPR, CCPA and other recent privacy regulatory requirements add increased urgency to this challenge.

CISOs and legal and compliance executives often aspire to implement information governance and security programs like defensible deletion, data migration, and data audits across their unstructured data to detect risks and remediate non-compliance. However, without an actual and scalable technology platform to effectuate these goals, those aspirations remain just that.

One tactic some CIOs use to address this daunting challenge is to periodically migrate disparate data from around the global enterprise into a central location, such as an archiving platform. But boiling the ocean through data migration and centralization is extremely expensive, highly disruptive, and frankly unworkable for numerous reasons. While such a concept may seem like a good idea when drawn up on the whiteboard, organizations quickly learn that you cannot just migrate hundreds of terabytes of distributed dark data to an archive, mainly due to network bandwidth and other logistical constraints, as well as the reality that you are merely copying and duplicating the data being migrated, which actually makes the situation worse.
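A back-of-the-envelope calculation shows why bandwidth alone makes bulk migration impractical. The 200 terabyte volume and 1 Gbps link below are assumed figures chosen for illustration, not numbers from any survey or deployment, and the `transfer_days` helper is hypothetical.

```python
def transfer_days(terabytes, link_gbps, utilization=1.0):
    """Days needed to move `terabytes` over a link of `link_gbps`.

    Uses decimal units (1 TB = 1e12 bytes) and ignores protocol overhead;
    `utilization` is the fraction of the link actually available.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400

full_link = transfer_days(200, 1.0)          # roughly 18.5 days
shared_link = transfer_days(200, 1.0, 0.5)   # roughly 37 days
```

Even with a dedicated, fully saturated link, the copy takes weeks, and real networks shared with day-to-day business traffic will do worse.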

Another tactic is data loss prevention (DLP). Again, this approach is thwarted by the new normal of a distributed, global workforce. Additionally, DLP tools are traditionally hampered by an inability to have deep content insight to unstructured data, resulting in false positives, inaccurate classification and unacceptable disruption to employee and business workflows.

What has always been needed is immediate, in-place visibility into unstructured distributed data across the enterprise: the ability to search and report across several thousand endpoints, file shares and other unstructured data sources, and return results within minutes instead of days or weeks. None of the approaches outlined above comes close to meeting this requirement, and in fact they perpetuate information security and governance failures.

Born and bred to address global eDiscovery challenges, the X1 Enterprise Platform (X1E) represents a unique approach to dark data, enabling enterprises to quickly and easily search across multiple distributed endpoints and data servers in place through a true distributed, parallelized computing architecture. Legal, security and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes instead of days or weeks. With X1E, organizations can also automatically migrate, collect, or take other action on the data as a result of the search parameters. Built on our award-winning and patented X1 Search technology, X1E is the first product to offer true and massively scalable distributed searching that is executed in its entirety on the end-node computers for data audits across an organization. This game-changing capability vastly reduces costs while greatly mitigating risk and disruption to operations.
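The general idea of a distributed, parallelized search, in which each node searches its own data and sends back only the results, can be sketched as follows. This is a simulation under stated assumptions, not X1E's architecture or API: the nodes are plain dictionaries, threads stand in for remote machines, and `search_node` and `search_all` are hypothetical names.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoints, each holding its own files (name -> text).
NODES = {
    "laptop-01": {"q3.txt": "customer id 000-00-0000 on file", "todo.txt": "buy coffee"},
    "fileshare-a": {"hr.csv": "employee id 000-00-0001", "readme.txt": "shared drive"},
    "laptop-02": {"draft.txt": "meeting agenda"},
}

def search_node(item, pattern):
    """Stand-in for work done locally on one node; only file names come back."""
    node, files = item
    rx = re.compile(pattern)
    return node, sorted(name for name, text in files.items() if rx.search(text))

def search_all(nodes, pattern):
    """Fan the same query out to every node in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = pool.map(lambda item: search_node(item, pattern), nodes.items())
        return {node: hits for node, hits in results if hits}

report = search_all(NODES, r"\b\d{3}-\d{2}-\d{4}\b")
```

Because the file contents never leave the simulated nodes, the central caller handles only a short list of hits per endpoint, which is the property that lets this pattern scale to large numbers of machines.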

Filed under CaCPA, Cyber security, eDiscovery & Compliance, GDPR, Information Governance, Information Management