Category Archives: GDPR

In-Place Data Analytics For Unstructured Data is No Longer Science Fiction

By John Patzakis

AI-driven analytics supercharges compliance investigations, data security, privacy audits and eDiscovery document review.  AI machine learning employs mathematical models to assess enormous datasets and “learn” from feedback and exposure to gain deep insights into key information. This enables the identification of discrete and hidden patterns in millions of emails and other electronic files to categorize and cluster documents by concepts, content, or topic. This process goes beyond keyword searching to identify anomalies, internal threats, or other indicators of relevant behavior. The enormous volume and scope of corporate data being generated has created numerous opportunities for investigators seeking deep information insights in support of internal compliance, civil litigation and regulatory matters.

The most effective use of AI in investigations couple continuous active learning technology with concept clustering to discover the most relevant data in documents, emails, text and other sources.  As AI continues to learn and improve over time, the benefits of an effectively implemented approach will also increase. In-house and outside counsel and compliance teams are now relying on AI technology in response to government investigations, but also increasingly to identify risks before they escalate to that stage.

Stock Photo - Digital Image used in blog

However, logistical and cost barriers have traditionally stymied organizations from taking advantage of AI in a systematic and proactive basis, especially regarding unstructured data, which, according to industry studies, constitutes 80 percent or more of all data (and data risk) in the enterprise. As analytics engines ingest the text from documents and emails, the extracted text must be “mined” from their native originals. And the natives must first be collected and migrated to a centralized processing appliance. This arduous process is expensive and time consuming, particularly in the case of unstructured data, which must be collected from the “wild” and then migrated to a central location, creating a stand-alone “data lake.”

Due to these limitations, otherwise effective AI capabilities are utilized typically only on very large matters on a reactive basis that limits its benefits to the investigation at hand and the information within the captive data lake.  Thus, ongoing active learning is not generally applied across multiple matters or utilized proactively. And because that captive information consists of migrated copies of the originals, there is a very limited ability to act on data insights as the original data remains in its actual location in the enterprise.

So the ideal architecture for the enterprise would be to move the data analytics “upstream” where all the unstructured data resides, which would not only save up to millions per year in investigation, data audit and eDiscovery costs, but would enable proactive utilization for compliance auditing, security and policy breaches and internal fraud detection.  However, analytics engines require considerable computing resources, with the leading AI solutions typically necessitating tens of thousands of dollars’ worth of high end hardware for a single server instance. So these computing workloads simply cannot be forward deployed to laptops and multiple file servers, where the bulk of unstructured data and associated enterprise risk exists.

But an alternative architecture solves this problem. A process that extracts text from unstructured, distributed data in place, and systematically sends that data at a massive scale to the analytics platform, with the associated metadata and global unique identifiers for each item.  As mentioned, one of the many challenges with traditional workflows is the massive data transfer associated with ongoing data migration of electronic files and emails, the latter of which must be sent in whole containers such as PST files. This process alone can take weeks, choke network bandwidth and is highly disruptive to operations. However, the load associated with text/metadata only is less than 1 percent of the full native item. So the possibilities here are very compelling. This architecture enables very scalable and proactive compliance, information security, and information governance use cases. The upload to AI engines would take hours instead of weeks, enabling continual machine learning to improve processes and accuracy over time and enable immediate action to taken on identified threats or otherwise relevant information.

The only solution that we are aware of that fulfills this vision is X1 Distributed GRC. X1’s unique distributed architecture upends the traditional collection process by indexing at the distributed endpoints, enabling direct pipeline of extracted text to the analytics platform. This innovative technology and workflow results in far faster and more precise collections and a more informed strategy in any matter.

Deployed at each end point or centrally in virtualized environments, X1 Enterprise allows practitioners to query many thousands of devices simultaneously, utilize analytics before collecting and process while collecting directly into myriad different review and analytics applications like RelativityOne and Brainspace. X1 Enterprise empowers corporate eDiscovery, compliance, investigative, cybersecurity and privacy staff with the ability to find, analyze, collect and/or delete virtually any piece of unstructured user data wherever it resides instantly and iteratively, all in a legally defensible fashion.

X1 displayed these powerful capabilities with ComplianceDS in a recent webinar with a brief but substantive demo of our X1 Distributed GRC solution, emphasizing our innovative support of analytics engines through our game-changing ability to extract text in place with direct feed into AI solutions.

Here is a link to the recording with a direct link to the 5 minute demo portion.

Leave a comment

Filed under Best Practices, collection, compliance, Corporations, eDiscovery & Compliance, Enterprise eDiscovery, Enterprise Search, GDPR, Uncategorized

GDPR Fines Issued for Failure to Essentially Perform Enterprise eDiscovery

By John Patzakis

The European General Data Protection Regulation (GDPR) came into full force in May 2018. Prior to that date, what I consistently heard from most of the compliance community was general fear and doubt about massive fines, with the solution being to re-purpose existing compliance templates and web-based dashboards. However, many organizations have learned the hard way that “paper programs” alone fall far short of the requirements under the GDPR. This is because the GDPR requires that an organization have absolute knowledge of where all EU personal data is stored across the enterprise, and be able to search for, identify and remove it when required.GDPR-stamp

Frequent readers of this blog may recall we banged the Subject Access Request drum prior to May 2018. We noted an operational enterprise search and eDiscovery was required to effectively comply with many of the core data discovery-focused requirements of GDPR. Under the GDPR, a European resident can request — potentially on a whim — that all data an enterprise holds on them be identified and also be removed. Organizations are required to establish a capability to respond to these Subject Access Requests (SARs). Forrester Research notes that “Data Discovery and classification are the foundation of GDPR compliance.” This is because, according to Forrester, GDPR effectively requires that an organization be able to identify and actually locate, with precision, personal data of EU data subjects across the organization.

Failure to respond to SARs has already led to fines and enforcement actions against several companies, including Google and the successor entity to Cambridge Analytica. This shows that many organizations are failing to understand the operational reality of GDPR compliance. This point is effectively articulated by a recent practice update from the law firm of DLA Piper on the GDPR, which states: “The scale of fines and risk of follow-on private claims under GDPR means that actual compliance is a must. GDPR is not a legal and compliance challenge – it is much broader than that, requiring organizations to completely transform the way that they collect, process, securely store, share and securely wipe personal data (emphasis added).”

These GDPR requirements can only be complied with through an effective enterprise eDiscovery search capability:

To achieve GDPR compliance, organizations must ensure that explicit policies and procedures are in place for handling personal information, and just as importantly, the ability to prove that those policies and procedures are being followed and operationally enforced. What has always been needed is gaining immediate visibility into unstructured distributed data across the enterprise, through the ability to search and report across several thousand endpoints and other unstructured data sources, and returning results within minutes instead of days or weeks. The need for such an operational capability is further heightened by the urgency of GDPR compliance.

X1 Distributed GRC represents a unique approach, by enabling enterprises to quickly and easily search across multiple distributed endpoints and data servers from a central location.  Legal and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes, instead of days or weeks. With X1, organizations can also automatically migrate, collect, delete, or take other action on the data as a result of the search parameters.  Built on our award-winning and patented X1 Search technology, X1 Distributed GRC is the first product to offer true and massively scalable distributed searching that is executed in its entirety on the end-node computers for data audits across an organization. This game-changing capability vastly reduces costs while effectuating that all-too-elusive actual compliance with information governance programs, including GDPR.

1 Comment

Filed under Best Practices, compliance, Data Audit, GDPR, Uncategorized

Three Key eDiscovery Preservation Lessons from Small v. University Medical Center

Small v. University Medical Center is a recent 123-page decision focused exclusively on issues and challenges related to preservation of electronically stored information in a large enterprise. Its an important ESI preservation case with some very instructive takeaways for organizations and their counsel.  In Small, Plaintiffs brought an employment wage & hour class action against University Medical Center of Southern Nevada (UMC). Such wage & hour employment matters invariably involve intensive eDiscovery, and this case was no exception. When it became evident that UMC was struggling mightily with their ESI preservation and collection obligations, the Nevada District Court appointed a special master, who proved to be tech-savvy with a solid understanding of eDiscovery issues.Case Law

In August 2014, the special master issued a report, finding that UMC’s destruction of relevant information “shock[ed] the conscious.” Among other things, the special master recommended that the court impose a terminating sanction in favor of the class action plaintiffs. The findings of the special master included the following:

  • UMC had no policy for issuing litigation holds, and no such hold was issued for at least the first eight months of this litigation.
  • UMC executives were unaware of their preservation duties, ignoring them altogether, or at best addressing them “in a hallway in passing.”
  • Relevant ESI from laptops, desktops and local drives were not preserved until some 18 months into this litigation.
  • ESI on file servers containing policies and procedures regarding meal breaks and compensation were not preserved.
  • These issues could have been avoided using best practices and if chain-of-custody paperwork had been completed.
  • All of UMC’s multiple ESI vendors repeatedly failed to follow best practices

After several years of considering and reviewing the special master’s detailed report and recommendations, the court finally issued its final discovery order last month. The court concurred with the special master’s findings, holding that UMC and its counsel failed to take reasonable efforts to identify, preserve, collect, and produce relevant information. The court imposed monetary sanctions against UMC, including the attorney fees and costs incurred by opposing counsel. Additionally, the court ordered that should the matter proceed to trial, the jury would be instructed that “the court has found UMC failed to comply with its legal duty to preserve discoverable information… and failed to comply with a number of the court’s orders,” and that “these failures resulted in the loss or destruction of some ESI relevant to the parties’ claims and defenses and responsive to plaintiffs’ discovery requests, and that the jury may consider these findings with all other evidence in the case for whatever value it deems appropriate.” Such adverse inference instructions are invariably highly impactful if not effectively dispositive in a jury trial.

There are three key takeaways from Small:

  1. UMC’s Main Failing was Lacking an Established Process

UMC’s challenges all centered on its complete lack of an existing process to address eDiscovery preservation. UMC and their counsel could not identify the locations of potentially relevant ESI because there was no data map. ESI was not timely preserved because no litigation hold process existed. And when the collection did finally occur under the special master’s order, it was highly reactive and very haphazard because UMC had no enterprise-capable collection capability.

When an organization does not have a systematic and repeatable process in place, the risks and costs associated with eDiscovery increase exponentially. Such a failure also puts outside counsel in a very difficult situation, as reflected by this statement from the Small Court: “One of the most astonishing assertions UMC made in its objection to the special master’s R & R is that UMC did not know what to preserve. UMC and its counsel had a legal duty to figure this out. Collection and preservation of ESI is often an iterative process between the attorney and the client.”

Some commentators have focused on the need to conduct custodian questionnaires, but a good process will obviate or at least reduce your reliance on often unreliable custodians to locate potentially relevant ESI.

  1. UMC Claims of Burden Did Not Help Their Cause

UMC tried arguing that it was too burdensome and costly for them to collect ESI from hundreds of custodians, claiming that it took IT six hours to merely search the email account of a single custodian. Here at X1, I wear a couple of hats, including compliance and eDiscovery counsel. In response to a recent GDPR audit, we searched dozens of our email accounts in seconds. This capability not only dramatically reduces our costs, but also our risk by allowing us to demonstrate diligent compliance.

In the eDiscovery context, the ability to quickly pinpoint potentially responsive data enables corporate counsel to better represent their client. For instance, they are then able to intelligently negotiate keywords and overall preservation scope with opposing counsel, instead of flying blind. Also, with their eDiscovery house in order, they can focus on more strategic priorities in the case, including pressing the adversary on their discovery compliance, with the confidence that your client does not live in a glass house.

Conversely, the Small opinion documents several meet and confer meetings and discovery hearings where UMC’s counsel was clearly at a significant disadvantage, and progressively lost credibility with the court because they didn’t know what they didn’t know.

  1. Retaining Computer Forensics Consultants Late in the Game Did Not Save the Day

Eventually UMC retained forensic collection consultants several months after the duty to preserve kicked in. This reflects an old school reactive, “drag the feet” approach some organizations still take, where they try to deflect preservation obligations and then, once opposing counsel or the court force the issue, scramble and retain forensic consultants to parachute in.  In this situation it was already too late, as much the data had already been spoliated. And because of the lack of a process, including a data map, the collection efforts were disjointed and a haphazard. The opinion also reflects that this reactive fire drill resulted in significant data over-collection at significant cost to UMC.

In sum, Small v. University Medical Center is a 123 page illustration of what often happens when an organization does not have a systematic eDiscovery process in place. An effective process is established through the right people, processes and technology, such as the capabilities of the X1 Distributed Discovery platform. A complete copy of the court opinion can be accessed here: Small v. University Medical Center

1 Comment

Filed under Best Practices, Case Law, compliance, Corporations, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, GDPR, Information Governance, Information Management, Preservation & Collection

When your “Compliance” and eDiscovery Processes Violate the GDPR

Time to reevaluate tools that rely on systemic data duplication

The European Union (EU) General Data Protection Regulation (GDPR) became effective in May 2018. To briefly review, the GDPR applies to the processing of “personal data” of EU citizens and residents (a.k.a. “data subjects”).” Personal data” is broadly defined to include “any information relating to an identified or identifiable natural person.” That could include email addresses and transactional business communications that are tied to a unique individual. GDPR is applicable to any organization that provides goods and services to individuals located in the EU on a regular enough basis, or maintains electronic records of their employees who are EU residents.

In additional to an overall framework of updated privacy policies and procedures, GDPR requires the ability to demonstrate and prove that personal data is being protected. Essential components for such compliance are data audit and discovery capabilities that allow companies to efficiently search and identify the information necessary, both proactively, and also reactively to respond to regulators and EU private citizen’s requests. As such, any GDPR compliance programs are ultimately hollow without consistent, operational execution and enforcement through an effective eDiscovery information governance platform.

However, some content management and archiving tool providers are repurposing their messaging with GDPR compliance. For example, an industry executive contact recently recounted a meeting with such a vendor, where their tool involved duplicating all of the emails and documents in the enterprise and then migrating all those copies to a central server cluster. That way, the tool could theoretically manage all the documents and emails centrally. Putting aside the difficulty of scaling up that process to manage and sync hundreds of terabytes of data in a medium-sized company (and petabytes in a Fortune 500), this anecdote underscores a fundamental flaw in tools that require systemic data duplication in order to search and manage content.

Under the GDPR, data needs to be minimized, not systematically duplicated en masse. It would be extremely difficult under such an architecture to sync up and remediate non-compliant documents and emails back at the original location. So at the end the day, this proposed solution would actually violate the GDPR by making duplicate copies of data sets that would inevitably include non-compliant information, without any real means to sync up remediation.Desktop_virtualization

The same is true for the much of the traditional eDiscovery workflows, which require numerous steps involving data duplication at every turn. For instance, data collection is often accomplished through misapplied forensic tools that operate by a broadly collecting copies through over collection. As the court said in In re Ford Motor Company, 345 F.3d 1315 (11th Cir. 2003): “[E]xamination of a hard drive inevitably results in the production of massive amounts of irrelevant, and perhaps privileged, information…” Even worse, the collected data is then re-duplicated one or often two more times by the examiner for archival purposes. And then the data is sent downstream for processing, which results in even more data duplication. Load files are created for further transfers, which are also duplicated.

Chad Jones of D4 explains on a recent webinar and in his follow-on blog post about how such manual and inefficient handoffs throughout the discovery process greatly increase risk as well as cost. Like antiquated factories spewing tons of pollution, outdated eDiscovery processes spin out a lot of superfluous data duplication. Much of that data likely contains non-compliant information, thus “polluting” your organization, including through your eDiscovery services vendors, with increased GDPR and other regulatory risk.

In light of the above, when evaluating your compliance and eDiscovery software, organizations should keep in mind these five key requirements to keep in line with GDPR and good overall information governance:

  1. Search data in place. Data on laptops and file servers need to be in searched in place. Tools that require copy and migration to central locations to search and manage are part of the problem, not the solution.
  1. Delete Data in Place. GDPR requires that non-compliance data be deleted on demand. Purging data on managed archives does not suffice if other copies are on laptops, unmanaged servers and other unstructured sources. Your search in place solution should also delete in place.
  1. Data Minimization. GDPR requires that organizations minimize data as opposed to exploding data through mass duplication.
  1. Targeted and Efficient Data Collection: Only potentially relevant data should be collected for eDiscovery and data audits. Over-collection leads to much greater cost and risk.
  1. Seamless integration with attorney review platforms, to bypass the processing steps which requires manual handoffs and load files.

X1 Data Audit & Compliance is a ground-breaking platform that meets these criterion while enabling system-wide data discovery supporting GDPR and many other information governance requirements.   Please visit here to learn more.

Leave a comment

Filed under Best Practices, compliance, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, GDPR, Information Governance, Information Management, Uncategorized

Assessing GDPR 30 Days In: A Report from the Field

Enforcement of the EU General Data Protection Regulation (GDPR) began May 25, 2018, and this new development is significantly reshaping the information governance landscape for organizations worldwide that control, process or store the data of European residents. Yesterday, X1 hosted a live webinar featuring GDPR experts Jay Kramer, a partner at Lewis Brisbois in the firm’s cybersecurity and privacy group, and Marty Provin, executive vice president at Jordan Lawrence.

Kramer provided a “battlefield report” about what he is seeing from the field and hearing from his various clients, with three main observations:

  1. Many are still late to the game. Kramer noted that he has several clients contacting him well after the May 25 enforcement date to begin the process of GDPR compliance.
  1. GDPR compliance maps to best practices. Becoming GDPR ready is a good business decision because it establishes transparency, data privacy and security processes that companies should be doing anyway.
  1. Now that the law has gone into effect, organizations that have been proactive are quickly transitioning from readiness to operational compliance and enforcement. For instance, many organizations are finding themselves responding to data subject access requests.

Kramer also noted that while much focus has been on potential fines levied under GDPR, organizations need to be aware that individuals can file complaints with the supervisory authorities under article 77, or even bring their own private actions, citing article 82. These claims have already been brought in the form of class actions, and Kramer expressed concern that many more claims could be fanned by “privacy trolls” – similar in concept to “patent trolls” – or by disgruntled customers or ex-employees.

Marty Provin outlined the importance of information governance and data classification in support GDPR compliance, especially from a standpoint of the need to operationalize policies and procedures in order to identify non-compliant data throughout your organization, and properly respond to regulatory requirements and data subject access requests. Kramer seconded that point, noting that the GDPR requires that an organization have absolute knowledge of where all EU personal data is stored across the enterprise and be able to remove or minimize it when required.

This readiness is achieved through planning, data mapping, and data classification. Provin provided an informative overview of these processes, based upon his extensive experience implementing such best practices for his clients over the past 20 years. Marty observed that it is also important to have a solution like X1 Data Audit and Compliance to search and identify documents, emails and other records across your enterprise that are non-compliant with GDPR. Such a capability is essential to address both the proactive and reactive components of GDPR.

The final segment of the webinar included a live demonstration of a proactive data audit across numerous computers to find PII of EU data subjects. The second half of the demonstration illustrated an effective response to an actual data subject access request in the form of a request by an individual to have their data erased.

In addition to comprehensive search, the demo highlighted the ability of X1 to also report in a detailed fashion and then take action on identified data by migrating it or even delete in place, including within email containers.

A recording of this informative and timely webinar is available for viewing here.

 

 

Leave a comment

Filed under Best Practices, eDiscovery & Compliance, GDPR, Information Governance, Records Management, Uncategorized