eDiscovery Collection of Large File Shares: An Unaddressed Major Pain Point

By John Patzakis

One of the major unaddressed challenges for eDiscovery and other digital investigations involves very large file servers that host shared documents. The data volumes for these file shares are typically 10 to 20 Terabytes but can be much higher. Nearly every company and government agency maintains such large file shares, sometimes hundreds of them, depending on the size of the organization. The main purpose of a file share server is to enable multiple users to access the stored files and storage space on the file repository. These servers operate as the ubiquitous central storage place of internal company files for both collaboration and data archiving purposes. As such, they are heavily used and invariably contain numerous documents with highly relevant or otherwise important information.

Traditional eDiscovery collection methods fail to efficiently address these large file shares, due to significant logistical challenges. The data cannot simply be searched in place by traditional forensics tools or other crawling methods. Consequently, the data is typically copied in bulk and then migrated to another location for processing, where the data is finally indexed and then searched and culled. There are many problems with this approach.

First, it is very time-consuming and expensive. The process involves the over-collection of a massive amount of data, and it typically takes weeks for the copying and transfer of many terabytes of data to occur. Additionally, file shares are where companies’ most sensitive data typically resides. These repositories are often rife with trade secrets, intellectual property, and sensitive personal information. There is substantial risk in having such data copied in bulk and then shipped out of the company’s possession to a third party for eDiscovery processing.

A solution to these challenges is the utilization of index and search in-place technology. Indexing and search in-place in this context means that a software-based indexing technology (as opposed to an expensive and cumbersome stand-alone hardware appliance) is deployed directly onto the file server or an adjacent computing resource. This indexing occurs without a bulk transfer of the data. Once indexed, the searches are performed in a few seconds, with complex Boolean operators, metadata filters and regular expression searches. The searches can be iterated and repeated without limitation, which is critical for large data sets.

Recently, X1 released unique and unprecedented support for large file shares to address this exact eDiscovery workflow. X1 can be deployed directly onto the large file share in question, or to a virtual machine in close proximity to the target file server or servers. Searches can be directed to a lone file server, or federated across multiple file servers and other endpoints, including those in different geographic locations across the enterprise. This functionality can be deployed remotely and on demand, without physical access being required. This is essential for geographically dispersed organizations, including for sensitive matters overseas. Once a targeted and responsive data set is identified through this in-place search and analysis process, the data can be exported directly to Relativity or a load file generated for upload to another review platform.

As mentioned, the searching can be full-text (including regular expression) or metadata only. In a recent matter involving over 100 Terabytes of data, X1 first generated a metadata- and hash-value-only index, which allowed for immediate de-duplication, file type filtering, and culling by date range and other parameters. This facilitated the culling of the data set by 70 percent as a first step, which then allowed for the full indexing of the remaining data subset. This capability supports both eDiscovery and data governance and privacy workflows.
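To make the metadata-first pass concrete, here is a minimal Python sketch of the general technique: walk a share, record only file metadata and a content hash, then cull by file type, date range, and duplicate hash before any full-text indexing. This is an illustration only and not X1's implementation; the function names, the record fields, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
from datetime import datetime, timezone


def build_metadata_index(root):
    """Walk `root` and record path, extension, size, mtime, and SHA-256 per file.

    No text is extracted; file bytes are read only to compute the hash.
    """
    index = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Hash in 1 MiB chunks so large files are never held in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            index.append({
                "path": path,
                "ext": os.path.splitext(name)[1].lower(),
                "size": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
                "sha256": digest.hexdigest(),
            })
    return index


def cull(index, allowed_exts, start, end):
    """Keep one entry per unique hash that matches the file-type and date filters."""
    seen = set()
    kept = []
    # Sort by path so the surviving copy of each duplicate is deterministic.
    for entry in sorted(index, key=lambda e: e["path"]):
        if entry["ext"] not in allowed_exts:
            continue
        if not (start <= entry["modified"] <= end):
            continue
        if entry["sha256"] in seen:
            continue
        seen.add(entry["sha256"])
        kept.append(entry)
    return kept
```

Only the entries returned by `cull` would then be passed on to full-text indexing, which is the step that the first-pass reduction is meant to make cheaper.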

The X1 large file share indexing service can be deployed on premise or in the cloud. It can also address large volumes of cloud-based data on services such as Dropbox and OneDrive. This support of large file shares is an extension of the X1 Distributed Discovery Platform.

For more information about this unique capability, please contact us for a demonstration.

Filed under Uncategorized

Operationalizing GRC in Context of Legal & Privacy: The Last Mile of GRC

By Michael Rasmussen

Editor’s note: Today we are featuring a guest blog post from Michael Rasmussen, the GRC Pundit & Analyst at GRC 20/20 Research, LLC.

At its core, GRC is the capability to reliably achieve objectives [GOVERNANCE], address uncertainty [RISK MANAGEMENT], and act with integrity [COMPLIANCE]. GRC is something organizations do, not something they purchase. They govern, they manage risk, and they comply with obligations. However, there is technology to enable GRC related processes, such as legal and privacy, to be more efficient, effective, and agile.

Too often, however, the focus on GRC technology is limited to the process management of forms, workflow, tasks, and reporting. These are critical and important elements, but the role of technology for GRC is much broader: it can operationalize GRC activities that are labor intensive, particularly in the context of legal and privacy. Simply managing forms, workflow, and tasks is no longer enough. Organizations need to start thinking about how they can integrate eDiscovery and data/information governance solutions within their core GRC architecture.

What is needed is the ability to search, find, monitor, interact, and control data throughout the business environment. GRC platforms are excellent at managing forms, workflow, tasks, analytics, and reporting. But behind the scenes there are still labor-intensive tasks or disconnected solutions that actually find, control, and assess the disposition of sensitive data in the enterprise. eDiscovery and information governance solutions have been disconnected and not strategically leveraged for GRC purposes. Together, the core GRC platform that integrates with eDiscovery and information governance technologies builds exponential economies in efficiency, effectiveness, and agility.

Specifically, an integrated GRC solution that weds the core GRC platform with eDiscovery and information governance technology delivers full value to an organization that:

  • Discovers the attributes and metadata of data no matter where it lives within the environment as a key component of GRC processes for legal and privacy compliance.
  • Enables 360° awareness to assessments by discovering the information needed to conduct and deliver assessments effectively into the core GRC platform.
  • Delivers a centralized console to interact with data/information and metadata of files on devices across the organization (such as network file shares, OneDrive, and Dropbox data).
  • Automates the ability to interact with downstream endpoints/systems to provide the ability to search the content of records for keywords and perform analysis using regular expressions and classifiers.
  • Controls data wherever it is with the ability to get to the data and analyze it from a centralized console.

An integrated approach that brings together the core GRC platform with eDiscovery and information governance technology enables the organization to discover, manage, monitor, and control data right from the central GRC platform console. It enables the organization to get centralized and accessible insight into where sensitive information is, how it is being used, and what can be done with it.

  • For example, within the GRC platform I can initiate a search based on key words or patterns (e.g., social security number). The eDiscovery/information governance solution then finds where that information is throughout the enterprise and delivers a list of records back to the GRC platform for analysis and monitoring.
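As a rough illustration of the pattern search in that example, the Python sketch below scans text files under a directory for strings shaped like U.S. Social Security numbers and returns a list of hit records. It is a simplified stand-in, not any vendor's API: the function name, the record fields, and the limited set of file extensions are assumptions, and a real deployment would query an index rather than re-read files.

```python
import os
import re

# Matches NNN-NN-NNNN while excluding number ranges that are never issued
# (area 000, 666, or 900-999; group 00; serial 0000).
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_pattern_hits(root, pattern=SSN_PATTERN, exts=(".txt", ".csv", ".log")):
    """Return one record per line that contains at least one pattern match.

    Records carry the location and match count only, not the matched values,
    so the report itself does not copy the sensitive data.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                for line_no, line in enumerate(fh, start=1):
                    count = len(pattern.findall(line))
                    if count:
                        hits.append({"path": path, "line": line_no, "matches": count})
    return hits
```

The returned list is the kind of record set that could be handed back to a central console for analysis and monitoring; a pattern this simple will still produce false positives, so results would normally be reviewed or combined with classifiers.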

This enables an integrated GRC architecture that brings 360° contextual awareness into information across the enterprise. It delivers enhanced efficiency in the time and money otherwise spent chasing information through disconnected solutions and processes, provides greater effectiveness through insight into and control of information, and enables greater agility across a dynamic environment to be responsive to issues of information governance. Together, a GRC platform with eDiscovery/information governance capabilities delivers more complete and accurate data governance and privacy assessments and integrated findings, with the ability to manage remediation tasks from one central place.

Filed under Best Practices, CaCPA, Data Audit, eDiscovery & Compliance, GDPR, Information Governance, Information Management

Traditional eDiscovery Processing is Now Obsolete

By John Patzakis

eDiscovery can be a very expensive and time-consuming process when traditional methods are employed. With legacy processes, from the time ESI collection starts, it often takes weeks for the data to finally end up in review. Time is money, and this dramatically increases costs as well as risk.

ESI processing is a dedicated and often expensive step in the EDRM workflow. The majority of ESI processing consists of data culling and filtering, deduplication, text extraction, metadata preservation, and then staging the data for upload into a review platform, often in the form of a load (DAT) file. Using ESI processing methods that involve on-premise hardware appliances that are not integrated with the collection process and do not integrate with review platforms like Relativity significantly increases cost and time delays. This means practitioners often have to spend several weeks on the manual collections and multiple hand-offs that these cumbersome solutions require.
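For readers unfamiliar with the staging step, the following Python sketch writes a small Concordance-style DAT load file. It assumes the delimiters commonly used as defaults for that format (the character þ as the text qualifier, the control character 0x14 as the field separator, and ® as the in-field newline substitute) and UTF-8 encoding; the review platform in use may be configured differently, and the field names shown in the usage note are hypothetical.

```python
# Commonly used Concordance-style defaults (assumed here; confirm them
# against the settings of the target review platform).
FIELD_SEP = "\x14"           # separates fields
QUOTE = "\u00fe"             # the character þ, wraps every field value
NEWLINE_IN_FIELD = "\u00ae"  # the character ®, stands in for newlines inside a value


def format_dat_row(values):
    """Format one row: wrap each value in the qualifier and join with the separator."""
    cleaned = [
        str(v).replace("\r\n", "\n").replace("\n", NEWLINE_IN_FIELD)
        for v in values
    ]
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in cleaned)


def write_dat(path, fields, records):
    """Write a header row followed by one row per record (a dict keyed by field name)."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(format_dat_row(fields) + "\r\n")
        for rec in records:
            # Missing fields are written as empty values so columns stay aligned.
            fh.write(format_dat_row(rec.get(f, "") for f in fields) + "\r\n")
```

A call such as `write_dat("export.dat", ["DocID", "FilePath", "SHA256"], records)` would produce one header line and one line per document, with each record's metadata kept in a fixed column order.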

However, the latest collection technologies now combine targeted collection with these processing steps, which are performed “on the fly” and in the background, so that the data is automatically collected, processed, and uploaded into a review platform such as Relativity in one fell swoop.

The graphic below illustrates the contrast between the challenges associated with traditional eDiscovery processes and the far more efficient new paradigm. When you engage in manual collection, then manual on-premise hardware-based processing, and finally manual upload to review, you are often extending the process by weeks and dramatically increasing cost and risk with many manual data handoffs.

Providing a contrast to traditional methods, a recent Relativity webinar featured the integration of the X1 Distributed Discovery platform with its RelativityOne Collect solution. A live demonstration performed by Relativity Product Manager Greg Evans highlighted in real time how the integration dramatically improves the enterprise eDiscovery process by enabling a targeted and efficient search and collection process, with full and integrated ESI processing. Within minutes, data collected from endpoints with X1 is populated straight into a Relativity workspace, fully processed and ready for review, without any human interaction once the collection is started.

So in terms of the big picture, this X1/Relativity integration not only streamlines enterprise ESI collection, but it relegates ESI processing to a completely automated background function as an afterthought. That’s what disruption looks like.

A recording of the X1/Relativity integration webinar can be accessed here.

Filed under Best Practices, collection, eDiscovery, Enterprise eDiscovery, ESI, Uncategorized

Relativity Highlights Its X1 Integration for ESI Collection

By John Patzakis

Recently, Relativity hosted a live webinar featuring the integration of the X1 Distributed Discovery platform with its RelativityOne Collect solution. This X1/Relativity integration enables game-changing efficiencies in the eDiscovery process by accelerating speed to review, and providing an end-to-end process from identification through production. As stated by Relativity Chief Product Officer Chris Brown: “Our exciting new partnership with X1 highlights our continued commitment to providing a streamlined user experience from collection to production…RelativityOne users will be able to combine X1’s innovative endpoint technology with the performance of our SaaS platform, eliminating the cumbersome process of manual data hand-offs and allowing them to get to the pertinent data in their case – faster.”

The webinar featured a live demonstration showing X1 quickly collecting data across multiple custodians and seamlessly importing that data into RelativityOne in minutes. Relativity Collect currently supports Office 365 and Slack sources, and Relativity Product Manager Greg Evans noted that “this X1 integration will now enable Relativity Collect to also reach emails and files on laptops, servers,” and other network sources. The webinar outlined how the Relativity/X1 integration streamlines eDiscovery processes by collapsing the many hand-offs built into current EDRM workflows to provide greater speed and defensibility. Evans also said that the new normal of web-enabled collections of remote custodians and data sources was a major driver for the Relativity/X1 alliance, as “remote collections now represent 90 percent of all eDiscovery collections happening right now.”

Adam Rogers, of Complete Discovery Source, a customer of both X1 and RelativityOne, highlighted a recent major multi-national litigation where the X1 and Relativity integration was critical to the success of the project. Adam noted that the effort would have taken about 30 days utilizing traditional methods, “but with this X1 and Relativity integration, we cut it down to 3 days, because with X1, we were able to index everything in-place, search, analyze and categorize that data right away, and then release that data to Relativity for review.”

The live demonstration performed by Greg Evans highlighted in real time how the integration improves the enterprise eDiscovery collection and ECA process by enabling a targeted and efficient search and collection process, with immediate pre-collection visibility into custodial data. X1 Distributed Discovery enhances the eDiscovery workflow with integrated culling and deduplication, thereby eliminating the need for expensive and cumbersome electronically stored information (ESI) processing tools. That way, the ESI can be populated straight into Relativity from an X1 collection.

The X1 and Relativity integration addresses several pain points in the existing eDiscovery process. For one, there is currently an inability to quickly and remotely search across and access distributed unstructured data in-place, meaning eDiscovery teams have to spend weeks or even months to collect data as required by other cumbersome solutions. Additionally, using ESI processing methods that involve appliances that are not integrated with the collection will significantly increase cost and time delays.

So in terms of the big picture, with this integration providing a complete platform for efficient data search, eDiscovery and review across the enterprise, organizations will save a lot of time, save a lot of money, and be able to make faster and better decisions. When you accelerate the speed to review and eliminate over-collection, you are going to have much better early insight into your data and increase efficiencies on many levels.

A recording of the X1/Relativity integration webinar can be accessed here.

Filed under Best Practices, collection, ECA, eDiscovery, Enterprise eDiscovery, ESI

A New Framework for Defining and Approaching Information Governance

By Michael Rasmussen

Editor’s note: Today we are featuring a guest blog post from Michael Rasmussen, the GRC Pundit & Analyst at GRC 20/20 Research, LLC.

Information governance has become a critical objective for organizations. In the context of the pervasive use of information throughout the enterprise, operational reliance on information, and increased regulation and liability of information, organizations are building structured approaches to information governance. This is to ensure the proper collection, use, and control of sensitive information – intellectual property, proprietary information, regulated data, personal information – across the organization. Privacy regulations such as the California Consumer Privacy Act (CCPA) and the EU General Data Protection Regulation (GDPR) are making information governance an even greater priority.

Over the years we have seen a lot of definitions for ‘Information Governance.’ From the straightforward, like the Information Governance Initiative’s:

  • “Information governance is the activities and technologies that organizations employ to maximize the value of their information while minimizing associated risks and costs.”

To the more complex, like Gartner’s:

  • “The specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. It includes the processes, roles, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”

However, neither of these definitions quite delivers a clear understanding to the business of what information governance is. One is too light, the other too complex.

I am proposing a new definition for information governance which is a modification of the official definition of GRC (governance, risk management, and compliance) by the Open Compliance and Ethics Group (OCEG) . . .

  • Information governance is a capability to reliably achieve objectives, address uncertainty, and act with integrity in the collection, creation, use, storage, and disposition of information throughout the organization and its extended business relationships.

Information governance is essentially what we could call Information GRC. It starts with governance being the capability to reliably achieve the objectives for information. After all, information is collected and stored for a purpose. In this context, the organization needs to manage the uncertainty surrounding this information (risk and exposure) throughout its lifecycle. Finally, the organization needs to act with integrity to ensure the information is used for its authorized and intended purposes and not misused. However, the modern organization is not defined by brick-and-mortar walls; it involves an extended array of third-party relationships that interact with that information, so information governance extends across traditional business boundaries and into these third-party relationships as well.

What needs to change is not only the definition but also the framework and process of information governance. Reactive, manual, and ad hoc approaches to information governance inevitably result in failure and exposure of information. Organizations need a cohesive information governance strategy, process, and supporting technology architecture to govern and manage the lifecycle of information.

Technology plays a critical role in enabling information governance in this vision. The right technology should make the organization more:

  • Efficient in the human and financial capital resources to monitor and manage information.
  • Effective in the proper cataloging, monitoring, control, disposition, and meeting legal and regulatory requirements of information.
  • Agile in the ability to keep up with information governance in the context of business, regulatory, legal, and risk changes.
  • Visible in where information and data reside, who can access them, and how they are used.
  • Consistent, in that the source of the information is understood, along with who can access, manipulate, and use it, to ensure its integrity.
  • Available where the information is accessible to those that are authorized to use it when they need it.

The foundational step to information governance is discovery. Organizations need to know where their data is; from there, they can control it and take action on it. A critical element is the ability to access and analyze the data in-place, wherever it resides, so the organization can then take action on it. This allows the organization to apply any given use case to the information (e.g., internal policy, data audit, and regulatory adherence). Being able to access, analyze, and act on data in-place provides immediate insight into critical information, empowering faster decisions and resolutions. It also empowers information governance teams to respond to eDiscovery collections as well as data audit and compliance initiatives quickly and effectively.


Filed under Uncategorized