Tag Archives: enterprise

Cloud Search: Not As Simple As You Think

By Barry Murphy

Corporations and Government agencies are moving data to the Cloud in droves.  No matter which analyst firm you look to on Cloud storage adoption, you will find consistent results:

  • Forrester Research reports that 40% of enterprises surveyed indicated they have already rolled out workloads on public clouds or have near-term plans to do so and that the number will increase to 50% this year.
  • IDC predicts that from 2013–2017 public IT cloud services will have a compound annual growth rate (CAGR) of 23.5%, five times that of the IT industry as a whole.
  • Gartner says Cloud Computing Will Become the Bulk of New IT Spend by 2016 and that spending on public Cloud services will have a CAGR of 17.7% from 2011–2016, with spending on Infrastructure-as-a-Service (IaaS) alone having a CAGR of 41.3% in that time period.
  • In eDJ Group’s recent Cloud services adoption fast poll, Greg Buckles found that less than 5% of respondents reported that all information is kept on-premise on company infrastructure and cloud services are not being actively considered.
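
To put the quoted growth rates in perspective, here is a minimal sketch of the compounding arithmetic behind a CAGR figure. The rates are the ones quoted above; the number of compounding periods (four for 2013–2017, five for 2011–2016) is an assumption about how each analyst counted its forecast window, and the function names are purely illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor implied by compounding `cagr` once per year for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Inverse relationship: the constant annual rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Rates come from the bullets above; the period counts are assumptions.
    forecasts = [
        ("IDC public IT cloud services, 2013-2017", 0.235, 4),
        ("Gartner public Cloud services, 2011-2016", 0.177, 5),
        ("Gartner IaaS, 2011-2016", 0.413, 5),
    ]
    for label, rate, years in forecasts:
        multiple = growth_multiple(rate, years)
        print(f"{label}: about {multiple:.2f}x over {years} years")
```

Under those assumptions, the forecasts imply that spending on public Cloud services would roughly double over the period, while IaaS spending would grow to more than five times its starting level.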

No matter where data is being stored, though, the fact remains that the ability to search that data will be critically important. Workers still demand unified access to email, files, and SharePoint information, and they want fast-as-you-type search results regardless of where the data lives. In addition, Legal teams require that search queries and collections execute within specific time-frames. But Cloud search is slow, because the indexes live far from the information. This results in frustrated workers and Legal teams afraid that eDiscovery cannot be completed in time.

Lest you think this is not a big deal, consider the following story.  When I was at eDJ, we worked with a very large enterprise client that wanted to move its collaboration system to the Cloud.  The problem was that the Cloud system the client was contracting with could not meet the Legal Department’s requirements for speed of query results and collection.  This significantly slowed down the movement to the Cloud until the client had worked with the Cloud vendor to ensure that search and collection could execute at the necessary speeds.  The delay frustrated an IT team anxious to reap the promised benefits of the Cloud and cost the project team significant man-hours.

This story highlights the need to granularly define search and eDiscovery requirements before moving data to the Cloud.  Most “cloud search” solutions pass queries through connectors, and then the Cloud vendor needs to figure out where in its vast data center the index lives, find the content, return the query result, and then the customer will need to download all the content.  The result is a slow search and another copy of the data downloaded on premise, which basically defeats the purpose of moving to the Cloud in the first place.

If a customer wanted to speed up search, it would have to essentially attach an appliance to a hot-air balloon and send it up to the Cloud provider so that the customer’s index could live on that appliance (or farm of appliances) in the Cloud provider’s data center, physically near the data. There are many reasons, however, that a Cloud provider would not allow a customer to do that:

  • Long install process
  • Challenging pre-requisites
  • 3rd party installation concerns
  • Physical access
  • Specific hardware requirements
  • They only scale vertically

The solution to a faster search is a cloud-deployable search application, such as X1 Rapid Discovery.  This creates a win-win for Cloud providers and customers alike.  As enterprises move more and more information to the Cloud, it will be important to think about workers’ experiences with Cloud systems – and search is one of those user experiences that, if it is a bad one, can really negatively affect a project and cause user revolt.

 


Filed under Cloud Data, Enterprise eDiscovery, Enterprise Search, Information Access, Virtualized Environment

“Act Reasonably” — Two Court-Issued Checklists Outlining Defensible, Targeted ESI Collection

Recently two separate and prominent courts, the federal court for the Northern District of California and the Delaware Court of Chancery (which is the primary court of equity for Delaware-registered corporations), issued eDiscovery preservation guidelines. This is not unprecedented, as other courts have issued similar written guidance in the form of general guidance or even more enforceable local rules of court specifically addressing eDiscovery protocols. What I found particularly interesting, however, is that both courts provided fairly specific guidance on the scope of collection and preservation. In the case of the California court, which notes that its “guidelines are designed to establish best practices for evidence preservation in the digital age,” the Court offers a checklist for Rule 26(f) “meet and confer” conferences with good detail on suggested ESI preservation protocols. The Delaware Court of Chancery also issued a detailed checklist or “sample collection outline.” ESI preservation checklists are useful practice guides, and these are sanctioned by two separate influential courts.

This is important as the largest expense directly associated with eDiscovery is the cost of overly inclusive preservation and collection, which leads to increased volume charges and attorney review costs. To the surprise of many, properly targeted preservation initiatives are permitted by the courts and can be enabled by adroit software that is able to quickly and effectively access and search these data sources throughout the enterprise.

The value of targeted preservation is recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the keywords used to identify responsive materials (citing the Manual for Complex Litigation (MCL) (4th) §40.25 (2)). And in In re Genetically Modified Rice Litigation, 2007 WL 1655757 (June 5, 2007 E.D.Mo.), the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.”

The checklist from the California Northern District and the guidelines issued by the Delaware court are consistent with these principles, as they call for the specification of date ranges, custodian names and search terms for any ESI to be preserved. The Northern District checklist, for instance, provides for the identification of specific custodians and job titles of custodians whose ESI is to be preserved, and also specific search phrases and search terms “that will be used to identify discoverable ESI and filter out ESI that is not subject to discovery.”
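
To make the three dimensions of a targeted scope concrete (custodians, date range, search terms), here is a minimal sketch of how such a filter might be applied to a set of documents. The `Document` and `PreservationScope` types, their field names, and the plain substring match are all assumptions made for illustration; they are not taken from either court's checklist or from any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Document:
    custodian: str
    created: date
    text: str


@dataclass(frozen=True)
class PreservationScope:
    custodians: FrozenSet[str]      # hypothetical: custodian names, stored lowercase
    start: date                     # first day of the agreed date range
    end: date                       # last day of the agreed date range (inclusive)
    search_terms: Tuple[str, ...]   # agreed keywords or phrases

    def matches(self, doc: Document) -> bool:
        """True only when the document satisfies all three agreed dimensions."""
        if doc.custodian.lower() not in self.custodians:
            return False
        if not (self.start <= doc.created <= self.end):
            return False
        body = doc.text.lower()
        # Naive substring match; real tools use proper indexing and stemming.
        return any(term.lower() in body for term in self.search_terms)


def select_for_preservation(docs: List[Document], scope: PreservationScope) -> List[Document]:
    """Keep only the documents that fall inside the agreed scope."""
    return [doc for doc in docs if scope.matches(doc)]


if __name__ == "__main__":
    docs = [
        Document("Alice", date(2012, 3, 1), "Re: supply contract renewal"),
        Document("Alice", date(2009, 3, 1), "Old supply contract"),
        Document("Bob", date(2012, 3, 1), "Supply contract draft"),
        Document("Alice", date(2012, 4, 2), "Lunch on Friday?"),
    ]
    scope = PreservationScope(
        custodians=frozenset({"alice"}),
        start=date(2012, 1, 1),
        end=date(2012, 12, 31),
        search_terms=("contract",),
    )
    for doc in select_for_preservation(docs, scope):
        print(doc.custodian, doc.created, doc.text)
```

The point of the sketch is that each dimension the parties agree on narrows the set independently, so the scope that is actually preserved is the intersection of all three.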

However, many lawyers shy away from a targeted collection strategy over misplaced defensibility concerns, opting instead for full disk imaging and other broad collection efforts that exponentially escalate litigation costs. The fear for some is that there may always be that one document that could be missed. However, in my experience of following eDiscovery case law over the past decade, the situations where litigants face exposure on the preservation front typically involve an absence of a defensible process. When courts sanction parties, it is usually because there is not a reasonable legal hold procedure in place, where the process is ad hoc and made up on the fly and/or not effectively executed. I am personally unaware of a published decision involving a fact pattern where a company featured a reasonable collection and preservation process involving targeted collection executed pursuant to standard operating procedures, yet was sanctioned because one or two relevant documents slipped through the cracks.

This is because the duty to preserve requires reasonable efforts, not infallible means, to collect potentially relevant information. As succinctly stated by the Delaware court: “Parties are not required to preserve every shred of information. Act reasonably.”

Another barrier standing in the way of defensible and targeted collection is that searching and performing early case assessment at the point of collection is not feasible in the decentralized global enterprise with traditional eDiscovery and information management tools. What is needed to address these challenges for the de-centralized enterprise is a field-deployable search and eDiscovery solution that operates in distributed and virtualized environments on-demand within these distributed global locations where the data resides. In order to meet such a challenge, the eDiscovery and search solution must immediately and rapidly install, execute and efficiently operate locally, including in a virtual environment, where the site data is located, without rigid hardware requirements or on-site physical access.

This groundbreaking capability is what X1 Rapid Discovery provides. Its unique ability to deploy and operate in the IaaS cloud also means that the solution can install anywhere within the wide-area network, remotely and on-demand. Importantly, the search index is created virtually, in close proximity to the data subject to collection. This enables even globally decentralized enterprises to perform targeted search and collection efforts in an efficient, defensible and highly cost-effective manner. Or, in the words of the Delaware court, to act reasonably.


Filed under Case Law, Cloud Data, Enterprise eDiscovery, IaaS

The Global De-Centralized Enterprise: An Un-Met eDiscovery Challenge

Enterprises with data situated within a multitude of segmented networks across North America and the rest of the world face unique challenges for eDiscovery and compliance-related investigation requirements. In particular, the wide area networks of large project engineering, oil & gas, and systems integration firms typically contain terabytes of geographically disparate information assets in often harsh operating environments with very limited network bandwidth. Information management and eDiscovery tools that require data centralization or run on expensive and inflexible hardware appliances cannot, by their very nature, address critical project information in places like Saudi Arabia, China, or the Alaskan North Slope.

Despite vendor marketing hype, network bandwidth constraints coupled with the requirement to migrate data to a single repository render traditional information management and eDiscovery tools ineffective at addressing de-centralized global enterprise data. As such, the global decentralized enterprise represents a major gap for in-house eDiscovery processes, resulting in significant expense and inefficiencies. The case of U.S. ex rel. McBride v. Halliburton Co. [1] illustrates this pain point well. In McBride, Magistrate Judge John Facciola’s instructive opinion outlines Halliburton’s eDiscovery struggles to collect and process data from remote locations:

Since the defendants employ persons overseas, this data collection may have to be shipped to the United States, or sent by network connections with finite capacity, which may require several days just to copy and transmit the data from a single custodian . . . (Halliburton) estimates that each custodian averages 15–20 gigabytes of data, and collection can take two to ten days per custodian. The data must then be processed to be rendered searchable by the review tool being used, a process that can overwhelm the computer’s capacity and require that the data be processed by batch, as opposed to all at once. [2]

Halliburton represented to the court that they spent hundreds of thousands of dollars on eDiscovery for only a few dozen remotely located custodians. The need to force-collect the remote custodians’ entire set of data and then sort it out through the expensive eDiscovery processing phase instead of culling, filtering and searching the data at the point of collection drove up the costs.
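
The figures quoted in the opinion can be turned into a rough back-of-the-envelope calculation. The sketch below derives the average transfer rate implied by moving 15–20 gigabytes in two to ten days, and then estimates how long a culled subset would take at the same rate. It assumes decimal gigabytes and treats the whole collection window as transfer time, which overstates how slow the link is if part of that window was spent on other steps. The 10% survival rate after culling is a purely hypothetical figure.

```python
BITS_PER_GIGABYTE = 8 * 10**9   # decimal gigabytes; an assumption about the quoted units
SECONDS_PER_DAY = 86_400


def implied_rate_mbps(gigabytes: float, days: float) -> float:
    """Average rate (megabits per second) needed to move `gigabytes` in `days`."""
    return gigabytes * BITS_PER_GIGABYTE / (days * SECONDS_PER_DAY) / 1e6


def transfer_days(gigabytes: float, rate_mbps: float) -> float:
    """Days needed to move `gigabytes` at a sustained `rate_mbps`."""
    return gigabytes * BITS_PER_GIGABYTE / (rate_mbps * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    slowest = implied_rate_mbps(15, 10)   # smallest custodian, longest collection
    fastest = implied_rate_mbps(20, 2)    # largest custodian, shortest collection
    print(f"Implied average rate: {slowest:.2f} to {fastest:.2f} Mbit/s")

    # Hypothetical: only 10% of a 20 GB custodian survives culling on site.
    culled_gb = 20 * 0.10
    print(f"Culled set at the slowest rate: {transfer_days(culled_gb, slowest):.1f} days")
```

Even under these crude assumptions, the numbers suggest why culling at the point of collection matters: the volume that has to cross the constrained link drops in direct proportion to what is filtered out locally.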

Despite the burdens associated with the electronic discovery of distributed data across the four corners of the earth, such data is considered accessible under the Federal Rules of Civil Procedure and thus must be preserved and collected if relevant to a legal matter. However, the good news is that the preservation and collection efforts can and should be targeted to only potentially relevant information limited to only custodians and sources with a demonstrated potential connection to the litigation matter in question.

This is important as the biggest expense associated with eDiscovery is the cost of overly inclusive preservation and collection. Properly targeted preservation initiatives are permitted by the courts and can be enabled by adroit software that is able to quickly and effectively access and search these data sources throughout the enterprise. The value of targeted preservation is recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the key words, date ranges and other metadata to identify responsive materials. [3] And in In re Genetically Modified Rice Litigation, the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.” [4]

However, such targeted collection and ECA in place is not feasible in the decentralized global enterprise with current eDiscovery and information management tools. What is needed to address these challenges for the de-centralized enterprise is a field-deployable search and eDiscovery solution that operates in distributed and virtualized environments on-demand within these distributed global locations where the data resides. In order to meet such a challenge, the eDiscovery and search solution must immediately and rapidly install, execute and efficiently operate in a localized virtualized environment, including public or private cloud deployments, where the site data is located, without rigid hardware requirements or on-site physical access.

This is impossible if the solution is fused to hardware appliances or otherwise requires a complex on-site installation process. After installation, the solution must be able to index the documents and other data locally and serve up those documents for remote but secure access, search and review through a web browser. As the “heavy lifting” (indexing, search, and document filtering) is all performed locally, this solution can effectively operate in some of the harshest local environments with limited network bandwidth. The data is not only collected and culled within the local area network, but is also served up for full early case assessment and first pass review on site, so that only a much smaller data set of potentially relevant data is ultimately transmitted to a central location.
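
As a toy illustration of the "index locally, transmit only the hits" pattern described above, here is a minimal inverted-index sketch. It is not a description of how any particular product works; the function names, the tokenizer and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the ids of the local documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], terms: Iterable[str]) -> Set[str]:
    """Return ids of documents containing every term (AND semantics)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


if __name__ == "__main__":
    # Hypothetical documents sitting on a remote site's local network.
    site_docs = {
        "a": "Pipeline invoice for March",
        "b": "Lunch menu for the site canteen",
        "c": "Pipeline inspection report",
    }
    index = build_index(site_docs)            # heavy lifting happens on site
    hits = search(index, ["pipeline", "invoice"])
    to_transmit = {doc_id: site_docs[doc_id] for doc_id in sorted(hits)}
    print(f"{len(to_transmit)} of {len(site_docs)} documents leave the site: {sorted(to_transmit)}")
```

Because both the index and the query run where the documents live, only the matching subset ever needs to cross the wide-area link.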

This groundbreaking capability is what X1 Rapid Discovery provides. Its unique ability to deploy and operate in the IaaS cloud also means that the solution can install anywhere within the wide-area network, remotely and on-demand. This enables globally decentralized enterprises to finally address their overseas data in an efficient, expedient, defensible and highly cost-effective manner.

If you have any thoughts or experiences with the unique eDiscovery challenges of the de-centralized global enterprise, feel free to email me. I welcome the collaboration.

___________________________________________

[1] 272 F.R.D. 235 (2011)

[2] Id. at 240.

[3] Citing the Manual for Complex Litigation (MCL) (4th) §40.25 (2).

[4] 2007 WL 1655757 (June 5, 2007 E.D.Mo.)


Filed under eDiscovery & Compliance, Enterprise eDiscovery

Defining Truly Cloud-Capable eDiscovery Software

Last week we discussed the challenges of searching and collecting data in Infrastructure as a Service (IaaS) cloud deployments (such as the Amazon cloud or Rackspace) for eDiscovery purposes.  Today we discuss what is needed for eDiscovery and enterprise search vendors to provide a truly cloud-capable solution and provide a decoder ring of sorts to cut through the hype.  For there is a lot of hype with the cloud becoming the latest eDiscovery hot button, with vendor marketing claims far surpassing actual capabilities.

In fact, many eDiscovery and enterprise software vendors claim to support the cloud, but are simply re-branding their long-existing SaaS offerings, which really have nothing to do with supporting IaaS. Barry Murphy of the eDiscovery Journal aptly identified this marketing practice as “cloud washing.” Data hosting, especially where the vendor’s manual labor is routinely required to upload and process data, does not meet defined cloud standards. Neither does a process that primarily exports data through APIs or other means out of its resident cloud environment to slowly migrate the cloud data to the vendor tools, instead of deploying the tools (and their processing power) to the data where it resides in the cloud. In order to truly support IaaS cloud deployments, eDiscovery and enterprise search software must meet the following three core requirements:

1. Automated installation and virtualization: The eDiscovery and search solution must immediately and rapidly install, execute and efficiently operate in a virtualized environment without rigid hardware requirements or on-site physical access. This is impossible if the solution is fused to hardware appliances or otherwise requires a complex on-site installation process. Hardware appliance solutions are by definition not cloud deployable, and enterprise search installations often require many months of man-hours to install and configure, so whether many of these vendors will be able to support robust IaaS cloud deployments in the reasonably foreseeable future is a significant question.

2. On-demand self-service: In its definition of cloud computing, the National Institute of Standards and Technology (NIST) identifies on-demand self-service as an essential characteristic of the cloud where a “consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.”

Many hosted eDiscovery services require shipping of data to the provider or extensive behind the scenes manual labor to load and configure the systems for data ingestion. Conversely, solutions that truly support cloud IaaS will spin up, ingest data and fully operate in an automated fashion without the need for manual on-premise labor for configuration or data import.

3. Rapid elasticity: NIST describes this characteristic as capabilities that “scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.” This important benefit of cloud computing is accomplished by a parallelized software architecture designed to dynamically scale out over potentially several dozen virtualized servers to enable rapid ingestion, processing and analysis of data sets in that cloud environment. This capability would allow several terabytes of data to be indexed and processed within 2 to 4 hours on a highly automated basis at far less cost than non-cloud eDiscovery efforts.

However, many characteristics of leading eDiscovery solutions fundamentally prevent them from supporting this core cloud requirement. Most eDiscovery early case assessment solutions are developed and configured toward a monolithic processing schema designed to operate on a single expensive hardware apparatus. While recently spawning some bold marketing claims of high speeds and feeds, such architecture is very ill-suited to the cloud, which is powered by highly distributed processing across multitudes of servers. Additionally, many of the leading eDiscovery and enterprise search solutions are tightly integrated with third party databases and other OEM technology that cannot be easily decoupled (and also present possible licensing constraints), making such elasticity physically and even legally impossible.
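
As a rough sanity check on the rapid elasticity requirement, the scale-out arithmetic can be sketched as follows. The sketch assumes perfectly linear scaling across servers, which real systems only approximate. The data volume (3,000 GB) and the per-server throughput (30 GB per hour) are hypothetical placeholders chosen only to fall in the "several terabytes" and "2 to 4 hours" ranges mentioned above.

```python
import math


def hours_to_process(total_gb: float, nodes: int, gb_per_node_hour: float) -> float:
    """Wall-clock hours if the work divides evenly across `nodes` servers."""
    if nodes < 1 or gb_per_node_hour <= 0:
        raise ValueError("need at least one node and a positive per-node rate")
    return total_gb / (nodes * gb_per_node_hour)


def nodes_needed(total_gb: float, target_hours: float, gb_per_node_hour: float) -> int:
    """Smallest whole number of servers that meets the target under linear scaling."""
    if target_hours <= 0 or gb_per_node_hour <= 0:
        raise ValueError("target hours and per-node rate must be positive")
    return math.ceil(total_gb / (target_hours * gb_per_node_hour))


if __name__ == "__main__":
    total_gb = 3_000           # hypothetical: "several terabytes"
    rate = 30.0                # hypothetical GB indexed per server per hour
    for target in (4, 2):
        print(f"{target} hours needs about {nodes_needed(total_gb, target, rate)} servers")
    print(f"36 servers would take about {hours_to_process(total_gb, 36, rate):.1f} hours")
```

With those placeholder figures, halving the deadline doubles the server count, which is only practical when servers can be provisioned and released on demand. A monolithic architecture tied to one appliance has only the `gb_per_node_hour` lever to pull.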

So is there eDiscovery software that will truly support the IaaS cloud based upon these requirements, and address up to terabytes of data?  Stay tuned….


Filed under Cloud Data, Enterprise eDiscovery, IaaS

The Future for eDiscovery: Social Media and the Cloud

Greetings and welcome to all. This is the inaugural post of Next Generation eDiscovery, a blog that will focus on legal, technical and compliance issues related to the collection, preservation and early case assessment of social media and other cloud-based data. To provide some context, the team here at X1 Discovery is experienced in developing and supporting technology for collecting electronic evidence in the enterprise to meet eDiscovery and investigation requirements. Many of us hail from Guidance Software, the developer of EnCase, which is the leading eDiscovery and investigative solution for collecting from hard drives, both standalone and within the enterprise. And now we turn our focus to current trends and the future.

And the future for eDiscovery is about social media and the cloud. In fact, it seems that only this year did social media become a compelling issue in eDiscovery, and it is now reaching critical mass given the rising level of discourse. With over 700 million Facebook users and 200 million people with Twitter accounts, evidence from social media sites can be relevant to just about every litigation dispute and investigation matter. Social media evidence is widely discoverable and generally not subject to privacy constraints when established to be relevant to a case, particularly when that data is held by a party to litigation or even a key witness.

It seems like in recent months there has been much talk in the eDiscovery and digital investigation fields about social media, mostly outlining the scope of the problem and the need to put corporate policies and procedures in place concerning social media.  That discussion is an important first step, but it’s time for actual solutions in terms of technical, legal and investigation techniques. This blog will seek to identify and foster discussion points, educate, and even pontificate at times but also learn from our readers, customers and non-customers alike. We look forward to the dialogue.


Filed under Preservation & Collection