Category Archives: Information Governance

Dark Data is an Unmet Cyber Security Challenge

By John Patzakis

Enterprises today are creating and storing massive volumes of unstructured, data distributed across the enterprise at a very fast pace. IT experts refer to this data type as “dark data.” Research advisory firm Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” according to Rahul Telang, professor of information systems at Carnegie Mellon University, “[o]ver 90% of the data in business is dark data.”

Dark data exists due to organizational silos and a highly distributed and mobile workforce, a trend that proliferated during the COVID pandemic and has now solidified as the new normal. As a result, there is a proliferation of unmanaged data stored in file shares, laptops, unarchived email accounts, shared cloud drives such as OneDrive and Dropbox and many other repositories. According to Anthony Juliano, CTO of Landmark Ventures, “dark data is exploding rapidly with the dissolution of the perimeter; it’s a largely unaddressed risk vector. A vast majority of the CIOs and CISOs I speak with are now prioritizing solving this problem not only going forward, but also backwards – and it’s not easy.”

Cyber security platforms generally have a good handle on perimeter integrity, encryption, and other key priorities such as zero day network attacks and malware. However, while these measures are clearly important, distributed dark data is largely a blind spot for cybersecurity tech, and as such organizations have very little visibility into the content of such data. GDPR, CCPA and other recent privacy regulatory requirements add increased urgency to this challenge.

CISOs and legal and compliance executives often aspire to implement information governance and security programs like defensible deletion, data migration, and data audits across their unstructured data to detect risks and remediate non-compliance. However, without an actual and scalable technology platform to effectuate these goals, those aspirations remain just that.

One tactic attempted by some CIOs to attempt to address this daunting challenge is to periodically migrate disparate data from around the global enterprise into a central location, such as an archiving platform. But boiling the ocean through data migration and centralization is extremely expensive, highly disruptive, and frankly unworkable for numerous reasons. While such a concept may seem like a good idea when drawn up on the whiteboard, originations quickly learn that you cannot just migrate hundreds of terabytes of distributed dark data to an archive, mainly due to network bandwidth and other logistical constraints, as well as the reality that you are merely copying and duplicating the data being migrated, which actually makes the situation worse.

Another tactic is data loss prevention (DLP). Again, this approach is thwarted by the new normal of a distributed, global workforce. Additionally, DLP tools are traditionally hampered by an inability to have deep content insight to unstructured data, resulting in false positives, inaccurate classification and unacceptable disruption to employee and business workflows.

What has always been needed is gaining immediate visibility into unstructured distributed data across the enterprise in-place, through the ability to search and report across several thousand endpoints, file shares and other unstructured data sources, and return results within minutes instead of days or weeks. None of the other approaches outlined above come close to meeting this requirement and in fact actually perpetuate information security and governance failures.

Born and bred to address global eDiscovery challenges, X1 Enterprise platform (X1E) represents a unique approach to dark data, by enabling enterprises to quickly and easily search across multiple distributed endpoints and data servers in place through a true distributed, parallelized computing architecture. Legal, security and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes, instead of days or weeks. With X1E, organizations can also automatically migrate, collect, or take other action on the data as a result of the search parameters. Built on our award-winning and patented X1 Search technology, X1E is the first product to offer true and massively scalable distributed searching that is executed in its entirety on the end-node computers for data audits across an organization. This game-changing capability vastly reduces costs while greatly mitigating risk and disruption to operations.

Leave a comment

Filed under CaCPA, Cyber security, eDiscovery & Compliance, GDPR, Information Governance, Information Management

eDiscovery Services Are Undergoing a Major Transformation

By John Patzakis

Recent research from industry analyst Greg Buckles at the eDiscovery Journal highlights soaring valuations for eDiscovery tech firms.  For the first time in the history of the industry, multiple eDiscovery tech firms have gone public in a single year, and by my count, there are at least seven tech “Unicorns” (a company with at least a billion dollar valuation) in the space. Relativity leads the way with at least a $3.6 billion valuation based upon their latest financing.

Yet while technology-based providers are seeing escalating valuations, valuations and M&A activity for pure services firms are conversely softening. This is because tech automation is finally catching up to this space. Traditional eDiscovery services typically involve manual collection, followed by manual on-premise hardware-based processing, and finally manual upload to review. These inefficiencies extend projects by often weeks while dramatically increasing cost and risk with many manual data handoffs. However, the first half of the EDRM involving collection and processing are now far more automated than they were even a few years ago. For instance, the one aspect of eDiscovery tech that is actually seeing decreasing usage and revenues are standalone processing appliances. This is because these tools are dependent upon the efficient manual services model prior to ingestion and also post import.

However, the latest in eDiscovery collection technologies will now combine targeted collection with previously manual processing steps that are performed “on the fly” and in the background so that the data is automatically collected, processed and uploaded into a review platform such as Relativity in one fell swoop. Better yet, processing is now free with RelativityOne. The automation Relativity is engineering, including with their integration with X1, along with innovations by other review platforms, is rendering traditional eDiscovery processing tech obsolete, along with manual collection and processing services. The purchasers of eDiscovery services and software have clearly noticed and are demanding adaptation from vendors.  

So how can services firms adapt to the inevitable? Here are few strategies:

First, services firms should move upstream to focus on information governance and privacy consulting. The new generation of eDiscovery technology enables convergence with privacy (i.e. GDPR compliance) information security and many other information governance use cases. This convergence requires high-end strategic consulting to bring these processes together and operationalize them. This also enables services firms to develop direct and ongoing relationships with corporate law departments, IT and other key corporate stakeholders.

Second, data analytics consulting, which is already a prominent offering by many firms, is ripe for further expansion. This is because analytics for eDiscovery is becoming more advanced and user friendly, and thus is able to be applied across the eDiscovery workflow, including pre-collection analytics and information governance.

Third, services firms should find ways to develop or otherwise acquire their own differentiating tech or establish meaningful partnerships with tech platform providers. These partnerships should entail more than merely using the software, but the development of proprietary workflows or even technical integrations that enable unique service offerings.

At the end of the day, eDiscovery is a technical process that is subject to technology disruption just like any other technology-based services industry. eDiscovery services firms that not only adapt to but embrace this change as a strategic opportunity will be the ones who prosper the most.

Leave a comment

Filed under Best Practices, eDiscovery, eDiscovery & Compliance, GDPR, Information Governance, Preservation & Collection, Uncategorized

Architecting a New Paradigm in Legal Governance

By Michael Rasmussen

Editor’s note: Today we are featuring a guest blog post from Michael Rasmussen, the GRC Pundit & Analyst at GRC 20/20 Research, LLC.

Exponential growth and change in business strategy, risks, regulations, globalization, distributed operations, competitive velocity, technology, and business data encumbers organizations of all sizes. Gone are the years of simplicity in business operations.

Managing the complexity of business from a legal and privacy perspective, governing information that is pervasive throughout the organization, and keeping continuous business and legal change in sync is a significant challenge for boards, executives, as well as the legal professionals in the legal department. Organizations need an integrated strategy, process, information, and technology architecture to govern legal, meet legal commitments, and manage legal uncertainty and risk in a way that is efficient, effective, and agile and extends into the broader enterprise GRC architecture.

In my previous blog, Operationalizing GRC in Context of Legal & Privacy: The Last Mile of GRC, I began this discussion, and here I aim to expound on it further from a legal context.

Legal today is more than legal matters, actions, and contracts. Today’s legal organization has to respond to incident/breach reporting and notification laws in a timely and compliant manner, respond to Data Subject Access Requests (DSAR), harmonize and monitor retentions obligations, conduct eDiscovery, manage legal holds on data, and continuously monitor regulations and legislation and apply them to a business context.

In today’s global business environment, a broad spectrum of economic, political, social, legal, and regulatory changes are continually bombarding the organization. The organization continues to see exponential growth of regulatory requirements and legal obligations (often conflicting and overlapping) that must be met, which multiply as the organization expands global operations, products, and services. This requires an integrated approach to legal governance, risk management, and compliance (GRC) with a goal to reliably achieve objectives while addressing uncertainty and act with integrity.[1] This includes adherence to mandatory legal requirements and voluntary organizational values and the boundaries each organization establishes. The legal department, with responsibility for understanding matter management, issue identification, investigations, policy management, reporting and filing, legal risk, and the regulatory obligations faced by the organization, is a critical player in GRC (what is understood as Enterprise or Integrated GRC), as well as improving GRC within the legal function itself.

A successful legal management information architecture will be able to connect information across risk management and business systems. This requires a robust and adaptable legal information architecture that can model the complexity of legal information, discovery, transactions, interactions, relationship, cause and effect, and the analysis of information, which can integrate and manage a range of business systems and external data. Key to this information architecture is a clear data inventory and map of information that informs the organization of what data it has, who in the organization owns it, what regulatory retention obligations are attached to it, and what third parties have access to it. This is a fundamental requirement for applying process and effectively operationalizing an organization’s GRC activities, as detailed in the previous blog.

There can and should be an integrated technology architecture that extends GRC technology and operationalizes it in a legal and privacy context. This connects the fabric of the legal processes, information, discovery, and other technologies together across the organization. This is a hub of operationalizing GRC and requires that it be able to integrate and connect with a variety of other business systems, such as specialized legal discovery solutions and integrate with broader enterprise GRC technology.

The right technology architecture choice for an organization involves the integration of several components into a core enterprise GRC and Legal GRC architecture – which can facilitate the integration and correlation of legal information, discovery, analytics, and reporting. Organizations suffer when they take a myopic view of GRC technology that fails to connect all the dots and provide context to discovery, business analytics, objectives, and strategy in the real-time that a business operates in. 

Extending and operationalizing GRC processes and technology in context of legal and privacy enables the organization to use its resources wisely to prevent undesirable outcomes and maximize advantages while striving to achieve its objectives. A key focus is to provide legal assurance that processes are designed to mitigate the most significant legal issues and are operating as designed. Effective management of legal risk and exposure is critical to the board and executive management, who need a reliable way to provide assurance to stakeholders that the enterprise plans to both preserve and create value. Mature GRC enables the organization to weigh multiple inputs from both internal and external contexts and use a variety of methods to analyze legal risk and provide analytics and modeling.


[1] This is the OCEG definition of GRC.

Leave a comment

Filed under Best Practices, CaCPA, eDiscovery & Compliance, GDPR, Information Governance, Information Management, Uncategorized

Operationalizing GRC in Context of Legal & Privacy: The Last Mile of GRC

By Michael Rasmussen

Editor’s note: Today we are featuring a guest blog post from Michael Rasmussen, the GRC Pundit & Analyst at GRC 20/20 Research, LLC.

At its core, GRC is the capability to reliably achieve objectives [GOVERNANCE], address uncertainty [RISK MANAGEMENT], and act with integrity [COMPLIANCE]. GRC is something organizations do, not something they purchase. They govern, they manage risk, and they comply with obligations. However, there is technology to enable GRC related processes, such as legal and privacy, to be more efficient, effective, and agile.

However, too often the focus on GRC technology is limited to the process management of forms, workflow, tasks, and reporting. These are critical and important elements, but the role of technology for GRC is so much broader to operationalize GRC activities that are labor intensive, particularly in the context of legal and privacy. Simply managing forms, workflow, and tasks are no longer enough. Organizations need to start thinking how they can integrate eDiscovery and data/information governance solutions within their core GRC architecture.

What is needed is the ability to search, find, monitor, interact, and control data throughout the business environment. GRC platforms are excellent at managing forms, workflow, tasks, analytics, and reporting. But behind the scenes there are still labor-intensive tasks or disconnected solutions that actually find, control, and assess the disposition of sensitive data in the enterprise. eDiscovery and information governance solutions have been disconnected and not strategically leveraged for GRC purposes. Together, the core GRC platform that integrates with eDiscovery and information governance technologies builds exponential economies in efficiency, effectiveness, and agility.

Specifically, an integrated GRC solution that weds the core GRC platform with eDiscovery and information governance technology delivers full value to an organization that:

  • Discovers the attributes and metadata of data no matter where it lives within the environment as a key component of GRC processes for legal and privacy compliance.
  • Enables 360° awareness to assessments by discovering the information needed to conduct and deliver assessments effectively into the core GRC platform.
  • Delivers a centralized console to interact with data/information and metadata of files on devices across the organization (such as network file shares, OneDrive, and Dropbox data).
  • Automates the ability to interact with downstream endpoints/systems to provide the ability to search the content of records for keywords and perform analysis using regular expressions and classifiers.
  • Controls data wherever it is with the ability to get to the data and analyze it from a centralized console.

An integrated approach that brings together the core GRC platform with eDiscovery and information governance technology enables the organization to discover, manage, monitor, and control data right from the central GRC platform console. It enables the organization to get centralized and accessible insight into where sensitive information is, how it is being used, and what can be done with it.

  • For example. Within the GRC platform I can initiate a search based on key words or patterns (e.g., social security number). The eDiscovery/information governance solution then finds where that information is throughout the enterprise and delivers a list of records back to the GRC platform for analysis and monitoring.

This enables an integrated GRC architecture that brings 360° contextual awareness into information across the enterprise. It delivers enhanced efficiency in time saved and money saved chasing information through disconnected solutions and processes, it provides greater effectiveness through insight and control of information and enables greater agility across a dynamic environment to be responsive to issues of information governance. Together, a GRC platform with eDiscovery/information governance capabilities enables and delivers more complete and accurate data governance and privacy assessments, integrated findings, with the ability to manage remediation tasks from one central place.

Leave a comment

Filed under Best Practices, CaCPA, Data Audit, eDiscovery & Compliance, GDPR, Information Governance, Information Management

CCPA and GDPR UPDATE: Unstructured Enterprise Data in Scope of Compliance Requirements

An earlier version of this article appeared on Legaltech News

By John Patzakis

A core requirement of both the GDPR and the similar California Consumer Privacy Act (CCPA), which becomes enforceable on July 1, is the ability to demonstrate and prove that personal data is being protected. This requires information governance capabilities that allow companies to efficiently identify and remediate personal data of EU and California residents. For instance, the UK Information Commissioner’s Office (ICO) provides that “The GDPR places a high expectation on you to provide information in response to a SAR (Subject Access Request). Whilst it may be challenging, you should make extensive efforts to find and retrieve the requested information.”CCPA GDPR

However, recent Gartner research notes that approximately 80% of information stored by companies is “dark data” that is in the form of unstructured, distributed data that can pose significant legal and operational risks. With much of the global workforce now working remotely, this is of special concern and nearly all the company data maintained and utilized by remote employees is in the form of unstructured data. Unstructured enterprise data generally refers to searchable data such as emails, spreadsheets and documents on laptops, file servers, and social media.

The GDPR

An organization’s GDPR compliance efforts need to address any personal data contained within unstructured electronic data throughout the enterprise, as well as the structured data found in CRM, ERP and various centralized records management systems. Personal data is defined in the GDPR as: “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

Under the GDPR, there is no distinction between structured versus unstructured electronic data in terms of the regulation’s scope. There is a separate guidance regarding “structured” paper records (more on that below). The key consideration is whether a data controller or processor has control over personal data, regardless of where it is located in the organization. Nonetheless, there is some confusion about the scope of the GDPR’s coverage across structured as well as unstructured electronic data systems.

The UK ICO is a key government regulator that interprets and enforces the GDPR, and has recently issued important draft guidance on the scope of GDPR data subject access rights, including as it relates to unstructured electronic information. Notably, the ICO notes that large data sets, including data analytics outputs and unstructured data volumes, “could make it more difficult for you to meet your obligations under the right of access. However, these are not classed as exemptions, and are not excuses for you to disregard those obligations.”

Additionally the ICO guidance advises that “emails stored on your computer are a form of electronic record to which the general principles (under the GDPR) apply.” In fact, the ICO notes that home computers and personal email accounts of employees are subject to GDPR if they contain personal data originating from the employers networks or processing activities. This is especially notable under the new normal of social distancing, where much of a company’s data (and associated personal information) is being stored on remote employee laptops.

The ICO also provides guidance on several related subjects that shed light on its stance regarding unstructured data:

Archived Data: According to the ICO, data stored in electronic archives is generally subject to the GDPR, noting that there is no “technology exemption” from the right of access. Enterprises “should have procedures in place to find and retrieve personal data that has been electronically archived or backed up.” Further, enterprises “should use the same effort to find information to respond to a SAR as you would to find archived or backed-up data for your own purposes.”

Deleted Data: The ICO’s view on deleted data is that it is generally within the scope of GDPR compliance, provided that there is no intent to, or a systematic ability to readily recover that data. The ICO says it “will not seek to take enforcement action against an organisation that has failed to use extreme measures to recreate previously ‘deleted’ personal data held in electronic form. We do not require organisations to use time and effort reconstituting information that they have deleted as part of their general records management.”

However, under this guidance organizations that invest in and deploy re-purposed computer forensic tools that feature automated un-delete capabilities may be held to a higher standard. Deploying such systems can reflect intent to as well as having the systematic technical ability to recover deleted data.

Paper Records: Paper records that are part of a “structured filing system” are subject to the GDPR. Specifically, if an enterprise holds “information about the requester in non-electronic form (e.g. in paper files or on microfiche records)” then such hard-copy records are considered personal data accessible via the right of access,” if such records are “held in a ‘filing system.” This segment of the guidance reflects that references to “unstructured data” in European parlance usually pertains to paper records. The ICO notes in separate guidance that “the manual processing of unstructured personal data, such as unfiled handwritten notes on paper” are outside the scope of GDPR.

GDPR Article 4 defines a “filing system” as meaning “any structured set of personal data which are accessible according to specific criteria, whether centralized, decentralized or dispersed on a functional or geographical basis.” The only form of “unstructured data” that would not be subject to GDPR would be unfiled paper records like handwritten notes or legacy microfiche.

The CCPA  

The California Attorney General (AG) released a second and presumably final round of draft regulations under the California Consumer Privacy Act (CCPA) that reflect how unstructured electronic data will be treated under the Act. The proposed rules outline how the California AG is interpreting and will be enforcing the CCPA. Under § 999.313(d)(2), data from archived or backup systems are—unlike the GDPR—exempt from the CCPA’s scope, unless those archives are restored and become active. Additional guidance from the Attorney General states: “Allowing businesses to delete the consumer’s personal information on archived or backup systems at the time that they are accessed or used balances the interests of consumers with the potentially burdensome costs of deleting information from backup systems that may never be utilized.”

What is very notable is that the only technical exception to the CCPA is unrestored archived and back-up data. Like the GDPR, there is no distinction between unstructured and structured electronic data. In the first round of public comments, an insurance industry lobbying group argued that unstructured data be exempted from the CCPA. As reflected by revised guidance, that suggestion was rejected by the California AG.

For the GDPR, the UK ICO correctly advises that enterprises “should ensure that your information management systems are well-designed and maintained, so you can efficiently locate and extract information requested by the data subjects whose personal data you process and redact third party data where it is deemed necessary.” This is why Forrester Research notes that “Data Discovery and Classification are the foundation for GDPR compliance.”

Establish and Enforce Data Privacy Policies

So to achieve GDPR and CCPA compliance, organizations must first ensure that explicit policies and procedures are in place for handling personal information. Once established, it is important to demonstrate to regulators that such policies and procedures are being followed and operationally enforced. A key first step is to establish a data map of where and how personal data is stored in the enterprise. This exercise is actually required under the GDPR Article 30 documentation provisions.

An operational data audit and discovery capability across unstructured data sources allows enterprises to efficiently map, identify, and remediate personal information in order to respond to regulators and data subject access requests from EU and California citizens. This capability must be able to search and report across several thousand endpoints and other unstructured data sources, and return results within minutes instead of weeks or months as is the case with traditional crawling tools. This includes laptops of employees working from home.

These processes and capabilities are not only required for data privacy compliance but are also needed for broader information governance and security requirements, anti-fraud compliance, and e-discovery.

Implementing these measures proactively, with routine and consistent enforcement using solutions such as X1 Distributed GRC, will go a long way to mitigate risk, respond efficiently to data subject access requests, and improve overall operational effectiveness through such overall information governance improvements.

Leave a comment

Filed under CaCPA, compliance, Corporations, Cyber security, Cybersecurity, Data Audit, GDPR, Information Governance, Information Management, Uncategorized