Dark Data is an Unmet Cyber Security Challenge

By John Patzakis

Enterprises today are creating and storing massive volumes of unstructured data, distributed across the enterprise, at a very fast pace. IT experts refer to this data type as “dark data.” Research advisory firm Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” According to Rahul Telang, professor of information systems at Carnegie Mellon University, “[o]ver 90% of the data in business is dark data.”

Dark data exists due to organizational silos and a highly distributed and mobile workforce, a trend that proliferated during the COVID pandemic and has now solidified as the new normal. As a result, there is a proliferation of unmanaged data stored in file shares, laptops, unarchived email accounts, shared cloud drives such as OneDrive and Dropbox and many other repositories. According to Anthony Juliano, CTO of Landmark Ventures, “dark data is exploding rapidly with the dissolution of the perimeter; it’s a largely unaddressed risk vector. A vast majority of the CIOs and CISOs I speak with are now prioritizing solving this problem not only going forward, but also backwards – and it’s not easy.”

Cyber security platforms generally have a good handle on perimeter integrity, encryption, and other key priorities such as zero-day network attacks and malware. However, while these measures are clearly important, distributed dark data is largely a blind spot for cyber security tech, and as a result organizations have very little visibility into the content of such data. GDPR, CCPA and other recent privacy regulations add urgency to this challenge.

CISOs and legal and compliance executives often aspire to implement information governance and security programs like defensible deletion, data migration, and data audits across their unstructured data to detect risks and remediate non-compliance. However, without an actual and scalable technology platform to effectuate these goals, those aspirations remain just that.

One tactic some CIOs use to address this daunting challenge is to periodically migrate disparate data from around the global enterprise into a central location, such as an archiving platform. But boiling the ocean through data migration and centralization is extremely expensive, highly disruptive, and frankly unworkable for numerous reasons. While such a concept may seem like a good idea when drawn up on the whiteboard, organizations quickly learn that you cannot just migrate hundreds of terabytes of distributed dark data to an archive, mainly due to network bandwidth and other logistical constraints, as well as the reality that you are merely copying and duplicating the data being migrated, which actually makes the situation worse.
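To put the bandwidth constraint in rough numbers, here is a back-of-the-envelope calculation in Python. The 200 TB volume, the link speeds, and the utilization figure are illustrative assumptions, not measurements from any particular enterprise, and the function name is made up for this sketch.

```python
def transfer_days(terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate the days needed to move `terabytes` of data over one network link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second)
    and that `utilization` is the fraction of the link the migration can use.
    """
    bits_to_move = terabytes * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * utilization
    seconds = bits_to_move / usable_bits_per_second
    return seconds / 86_400  # seconds in a day


# Hypothetical scenario: 200 TB of dark data over a few link speeds.
for gbps in (0.1, 1.0, 10.0):
    days = transfer_days(200, gbps)
    print(f"200 TB over a {gbps} Gbps link: {days:.1f} days at full utilization")
```

Under these assumptions, 200 TB over a fully dedicated 1 Gbps link works out to roughly 18.5 days, and a migration that can only use half of that link takes about twice as long. Real projects also contend with remote offices on slower links and with data that keeps changing while it is being copied.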

Another tactic is data loss prevention (DLP). Again, this approach is thwarted by the new normal of a distributed, global workforce. Additionally, DLP tools have traditionally been hampered by a lack of deep content insight into unstructured data, resulting in false positives, inaccurate classification and unacceptable disruption to employee and business workflows.

What has always been needed is gaining immediate visibility into unstructured distributed data across the enterprise in-place, through the ability to search and report across several thousand endpoints, file shares and other unstructured data sources, and return results within minutes instead of days or weeks. None of the other approaches outlined above come close to meeting this requirement and in fact actually perpetuate information security and governance failures.

Born and bred to address global eDiscovery challenges, the X1 Enterprise platform (X1E) represents a unique approach to dark data by enabling enterprises to quickly and easily search across multiple distributed endpoints and data servers in place through a true distributed, parallelized computing architecture. Legal, security and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes instead of days or weeks. With X1E, organizations can also automatically migrate, collect, or take other action on the data based on the search results. Built on our award-winning and patented X1 Search technology, X1E is the first product to offer true and massively scalable distributed searching that is executed in its entirety on the end-node computers for data audits across an organization. This game-changing capability vastly reduces costs while greatly mitigating risk and disruption to operations.
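For readers who want a feel for what "search in place, in parallel" means in general terms, the following Python sketch fans a query out to several simulated endpoints and gathers only the hit lists. It is a conceptual illustration and not X1E's code or API: the `Endpoint` class, its `search` method, and `search_all` are all hypothetical names, and the "index" is just an in-memory dictionary.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in for a laptop or file share holding its own data."""
    name: str
    documents: dict[str, str]  # path -> text content (assumed already extracted)

    def search(self, term: str) -> list[str]:
        # In a real deployment this work would run on the endpoint itself;
        # here a simple case-insensitive scan stands in for a local index lookup.
        needle = term.lower()
        return sorted(path for path, text in self.documents.items()
                      if needle in text.lower())


def search_all(endpoints: list[Endpoint], term: str,
               max_workers: int = 8) -> dict[str, list[str]]:
    """Query every endpoint concurrently and return only the matching paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda ep: ep.search(term), endpoints)
        return {ep.name: hits for ep, hits in zip(endpoints, results)}


# Hypothetical data for illustration only.
fleet = [
    Endpoint("laptop-01", {"notes.txt": "Draft merger agreement", "todo.txt": "Buy milk"}),
    Endpoint("fileshare-a", {"q3.docx": "Merger timeline and budget"}),
    Endpoint("laptop-02", {"recipe.txt": "Flour, sugar, eggs"}),
]
print(search_all(fleet, "merger"))
```

The design point is that only the small result sets travel over the network; the documents themselves stay where they are until someone decides they need to be collected.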

Filed under CaCPA, Cyber security, eDiscovery & Compliance, GDPR, Information Governance, Information Management

eDiscovery Services Are Undergoing a Major Transformation

By John Patzakis

Recent research from industry analyst Greg Buckles at the eDiscovery Journal highlights soaring valuations for eDiscovery tech firms. For the first time in the history of the industry, multiple eDiscovery tech firms have gone public in a single year, and by my count, there are at least seven tech “Unicorns” (companies with a valuation of at least a billion dollars) in the space. Relativity leads the way with at least a $3.6 billion valuation based upon its latest financing.

Yet while technology-based providers are seeing escalating valuations, valuations and M&A activity for pure services firms are softening. This is because tech automation is finally catching up to this space. Traditional eDiscovery services typically involve manual collection, followed by manual on-premise hardware-based processing, and finally manual upload to review. These inefficiencies often extend projects by weeks while dramatically increasing cost and risk through the many manual data handoffs. However, the first half of the EDRM, covering collection and processing, is now far more automated than it was even a few years ago. For instance, the one aspect of eDiscovery tech that is actually seeing decreasing usage and revenues is standalone processing appliances. This is because these tools depend on the inefficient manual services model both before ingestion and after import.

The latest eDiscovery collection technologies, by contrast, combine targeted collection with previously manual processing steps that are now performed “on the fly” and in the background, so that the data is automatically collected, processed and uploaded into a review platform such as Relativity in one fell swoop. Better yet, processing is now free with RelativityOne. The automation Relativity is engineering, including through its integration with X1, along with innovations by other review platforms, is rendering traditional eDiscovery processing tech obsolete, along with manual collection and processing services. The purchasers of eDiscovery services and software have clearly noticed and are demanding adaptation from vendors.

So how can services firms adapt to the inevitable? Here are a few strategies:

First, services firms should move upstream to focus on information governance and privacy consulting. The new generation of eDiscovery technology enables convergence with privacy (e.g., GDPR compliance), information security and many other information governance use cases. This convergence requires high-end strategic consulting to bring these processes together and operationalize them. It also enables services firms to develop direct and ongoing relationships with corporate law departments, IT and other key corporate stakeholders.

Second, data analytics consulting, which is already a prominent offering by many firms, is ripe for further expansion. This is because analytics for eDiscovery is becoming more advanced and user friendly, and thus is able to be applied across the eDiscovery workflow, including pre-collection analytics and information governance.

Third, services firms should find ways to develop or otherwise acquire their own differentiating tech or establish meaningful partnerships with tech platform providers. These partnerships should entail more than merely using the software; they should involve the development of proprietary workflows or even technical integrations that enable unique service offerings.

At the end of the day, eDiscovery is a technical process that is subject to technology disruption just like any other technology-based services industry. eDiscovery services firms that not only adapt to but embrace this change as a strategic opportunity will be the ones who prosper the most.

Filed under Best Practices, eDiscovery, eDiscovery & Compliance, GDPR, Information Governance, Preservation & Collection, Uncategorized

Pre-Collection Keyword Searches: Where Angels May Fear to Tread but Not Attorneys with the Right eDiscovery Software

By John Patzakis

One of the key cases involving the principles of proportionality under Federal Rule of Civil Procedure 26(b)(1) is McMaster v. Kohl’s Dep’t Stores, Inc. (E.D. Mich. July 24, 2020). McMaster generally supports the application of a process that effectively applies proportionality on an operational basis through an iterative exercise to identify relevant custodians, their data sources, applicable date ranges, file types and agreed-upon keywords. Such a targeted, automated and proportional collection process can be applied to collect only the data that is responsive to these specific criteria.
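As a simple illustration of how those criteria can be applied operationally, the Python sketch below filters items by custodian, date range, file type and keyword. It is a minimal example under assumed names (`Item`, `Criteria`, `is_responsive` are all hypothetical), and plain substring matching stands in for whatever search syntax the parties actually agree upon.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one file or message under consideration."""
    custodian: str
    path: str
    modified: date
    text: str


@dataclass(frozen=True)
class Criteria:
    """Hypothetical container for the agreed-upon collection scope."""
    custodians: frozenset[str]
    start: date
    end: date
    extensions: frozenset[str]   # lowercase, with leading dot, e.g. ".docx"
    keywords: tuple[str, ...]


def is_responsive(item: Item, criteria: Criteria) -> bool:
    """Return True only if the item satisfies every scoping criterion."""
    if item.custodian not in criteria.custodians:
        return False
    if not (criteria.start <= item.modified <= criteria.end):
        return False
    if not item.path.lower().endswith(tuple(criteria.extensions)):
        return False
    body = item.text.lower()
    return any(keyword.lower() in body for keyword in criteria.keywords)


# Hypothetical scope and items for illustration only.
scope = Criteria(
    custodians=frozenset({"jsmith"}),
    start=date(2019, 1, 1),
    end=date(2019, 12, 31),
    extensions=frozenset({".docx", ".msg"}),
    keywords=("termination", "severance"),
)
items = [
    Item("jsmith", "hr/letter.docx", date(2019, 6, 3), "Severance terms attached."),
    Item("jsmith", "hr/old.docx", date(2017, 2, 1), "Severance policy draft."),
    Item("adoe", "hr/memo.docx", date(2019, 6, 3), "Termination checklist."),
]
to_collect = [item.path for item in items if is_responsive(item, scope)]
print(to_collect)
```

Only the first item falls inside every criterion, so it is the only one collected; the other two are excluded by date range and custodian respectively.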

However, the main ESI dispute in McMaster was that the attorneys could not agree on a list of search terms and asked the court to decide which search terms should be used. As noted by the Magistrate Judge in McMaster, “Here is another case in which the Court is called upon to decide whose competing list of search terms is better suited for the search of large amounts of electronically stored information”, citing United States v. O’Keefe, 537 F. Supp. 2d 14, 23–24 (D.D.C. 2008), which stated: “for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.” Judge Whalen stated: “I, for one, have no interest in going where angels fear to tread. Therefore, if the parties cannot agree on appropriately limited search terms, they will share the cost of retaining an expert to assist them. If they still cannot agree, then Plaintiff may renew his motion regarding the search terms and will provide the Court with an expert report substantiating his position.”

The parties had been engaged in a Rule 26(f) exercise, which requires the parties’ counsel to “meet and confer” in advance of the pre-trial scheduling conference on key discovery matters, including the preservation, disclosure and exchange of potentially relevant electronically stored information (ESI). A very good authority on the Rule 26(f) eDiscovery conference is the “Suggested Protocol for Discovery of Electronically Stored Information,” provided by then Magistrate Judge Paul W. Grimm and his joint bar-court committee. Under Section 8 of the Model Protocol, the topics to be discussed at the Rule 26(f) conference include: “Search methodologies for retrieving or reviewing ESI such as identification of the systems to be searched;” “the use of key word searches, with an agreement on the words or terms to be searched;” “limitations on the time frame of ESI to be searched;” and “limitations on the fields or document types to be searched.”

Kelly Twigger, one of the best and brightest eDiscovery attorneys in the field in my opinion, commented in a recent webinar that eDiscovery collection capabilities that enable an iterative search and collection process in place would allow her to make much more informed decisions on keyword strategies. Twigger noted that software that provides keyword hits and other analytics on custodian laptops, file servers and other network and cloud sources prior to collection would enable her “to be able to define and agree upon the right search terms” with the requesting party. Twigger pointed out that while attorneys and judges rightfully avoid going “where angels fear to tread” (agreeing on keywords without any visibility into the data), that concern can be alleviated when the right processes and technology are used.

And such technology is important, because optimizing the process of developing keyword searches is no easy task. The typical approach of blindly brainstorming a list of terms that may be relevant and running the search on a dataset to be reviewed results in a wide range of inefficiencies. Negotiations over proper usage of search terms may become onerous and contentious. Judges are often tasked with making determinations regarding the aptness of the methodology, and, as we see in McMaster, are very reluctant to do so. Thus, using outside expertise and leveraging index-in-place technology is beneficial in building an effective and comprehensive pre-collection search term strategy and enabling you to tread where angels fear to.
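To make the idea of a pre-collection hit report concrete, here is a small Python sketch that counts, for each proposed term, how many documents contain it and how many total hits it produces. The function name and sample documents are hypothetical, and whole-word, case-insensitive matching is an assumption; real tools support far richer search syntax.

```python
import re


def keyword_hit_report(documents: dict[str, str],
                       terms: list[str]) -> dict[str, dict[str, int]]:
    """For each term, count matching documents and total hits across all documents."""
    report = {}
    for term in terms:
        # Whole-word, case-insensitive match; the term is escaped so it is
        # treated as literal text rather than as a regular expression.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits_per_document = [len(pattern.findall(text)) for text in documents.values()]
        report[term] = {
            "documents": sum(1 for hits in hits_per_document if hits > 0),
            "hits": sum(hits_per_document),
        }
    return report


# Hypothetical documents and proposed terms for illustration only.
sample = {
    "a.txt": "Severance pay and severance terms",
    "b.txt": "No match here",
    "c.txt": "The contract covers severance.",
}
for term, counts in keyword_hit_report(sample, ["severance", "contract", "bonus"]).items():
    print(f"{term}: {counts['documents']} documents, {counts['hits']} hits")
```

A report like this lets counsel see before collection that one term is overbroad and another returns nothing, which is exactly the kind of visibility that makes search term negotiations less of a guessing game.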

Filed under Best Practices, Case Law, eDiscovery, Enterprise eDiscovery, ESI, law firm, Preservation & Collection

Relativity and X1 Publish Joint Legal Whitepaper on ESI Collection Best Practices

By John Patzakis

Relativity and X1 have published a joint legal whitepaper on the topic of full-disk imaging as a disfavored collection practice in civil litigation, with Relativity eDiscovery attorney David Horrigan as the lead author. The paper delves into the legal reasons why forensic collection is not required in civil litigation, including detailed analysis of case law, the Federal Rules of Civil Procedure, and the Sedona Principles. The paper primarily focuses on the principles of proportionality in its legal analysis as well as case law issued prior to the 2015 amendment to the Federal Rules of Civil Procedure, which gave greater prominence and clarity to the proportionality rules.


This is an important topic because a key driver of inefficiency and higher cost in eDiscovery is that default collection methods often involve full-disk imaging (a forensic examination of an entire computer) when searching for responsive data. As the whitepaper notes, “it turns out full-disk imaging is not required for most eDiscovery collections. In fact, courts often disfavor the practice.”


A copy of the whitepaper can be found here.

Filed under Authentication, Best Practices, Case Law, eDiscovery, ESI, law firm, Preservation & Collection, proportionality