by Barry Murphy
In an earlier post, I described the importance of being able to quickly search information stored in the Cloud. That post pointed out that Cloud search is more complicated than it might first appear, because search speed depends on how close the index lives to the actual data within the Cloud infrastructure. One commenter noted that Cloud search can be fast and simple if the Cloud vendor commits to a certain service level for query times and results. That addresses part of the issue, although IaaS providers – which are what we truly mean when we say “Cloud” – are typically not interested in guaranteeing SLAs for functions like search, because customers can provision their own infrastructure to enable fast search with products like X1 Rapid Discovery. But even if a Cloud vendor were to guarantee phenomenal search SLAs, the challenge of unified enterprise search across all of an organization’s information would remain.
The reality is that enterprises and government agencies store information in “hybrid” environments that span on-premise systems in corporate data centers, virtualized systems that companies operate themselves, and Cloud-based repositories. Research firm Gartner predicts that by 2017, half of mainstream enterprises will have a hybrid cloud. And research from NetApp shows that organizations will be managing data across multiple cloud environments, not just a single provider.
These are exciting developments. As organizations embrace more modern infrastructures, there are many benefits to be had. What we need to remember, however, is that business professionals still need to quickly find and take action on their information assets to do their jobs. As that information gets further scattered, enterprise search will take on increased importance. Workers don’t care if their data is stored on-premise or in the Cloud as long as they can quickly find it in an easy-to-use interface.
The challenge for today’s organizations is that information now lives in multiple infrastructures – on-premise, virtual, Cloud, or most frequently, a hybrid of all of these. Current approaches to including Cloud-based data in enterprise search and eDiscovery require downloading a copy of that data so it resides alongside other local content. Unfortunately, that defeats the purpose of storing the data in the Cloud in the first place.
This takes me back to my original point: Cloud search is very important. But Cloud search cannot exist in a vacuum. An effective enterprise search solution will combine on-premise search capabilities with search in the Cloud – without requiring that cloud-based information be downloaded in order to search across all data.
After several dozen posts on social media eDiscovery, we are going to focus the next few weeks on the related issue of eDiscovery in the cloud. As we see it, despite the enormous cost benefits of the cloud, concerns about the feasibility of eDiscovery and general search across an organization’s critical cloud-resident data have, to some degree, prevented broader adoption.
The cloud means many things to many people, but I believe the real eDiscovery action (and pain point) is in Infrastructure as a Service (IaaS) cloud deployments, such as the Amazon cloud, Rackspace, or pure enterprise cloud providers like Fujitsu. According to a recent PwC report, Cloud IaaS will account for 30% of IT expenditures by 2014. IaaS gives organizations the means to aggressively store and virtualize their enterprise data and software, potentially spawning the same large data volumes – and the same critical search and eDiscovery requirements – as traditional enterprise environments. Amazon Web Services, the leading IaaS cloud provider, has told us in our discussions that its customers have extensive eDiscovery requirements that are currently addressed through inefficient, manual means. So for purposes of this discussion, we will focus on IaaS – essentially cloud for the enterprise, and where the significant eDiscovery challenge currently lies.
So if an organization maintains two terabytes of documents in the Amazon or Rackspace cloud, how does it quickly access, search, triage, and collect that data within its existing cloud environment when a critical eDiscovery or compliance search requirement suddenly arises? This scenario is a significant current pain point for IaaS cloud. In such situations, the organization typically resorts to one of two agonizingly inefficient processes. The first option is shipping hard drives to the provider, whose IT staff copies the data in bulk so the drives can be shipped back. Rackspace’s guidelines indicate that a bulk transfer of 2 terabytes would cost over $10,000 in fees and take about four to six weeks. And at the end, all the company has is a full 2-terabyte duplicate of its data that still must be searched, processed, and reviewed.
The other alternative is to slowly download the data over a secure file transfer protocol connection. However, even with a robust T2 line, transferring the two TBs would take three to six weeks, depending on how much bandwidth IT is willing to dedicate to the exercise.
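The arithmetic behind that multi-week estimate is easy to sketch. Here is a minimal back-of-the-envelope calculation, assuming decimal terabytes, a roughly 6.3 Mbps T2 line, and an illustrative 80% link efficiency (the efficiency figure is an assumption, not a measured value):

```python
def transfer_days(data_tb, link_mbps, efficiency=0.8):
    """Estimate days to move data_tb terabytes over a link_mbps line.

    efficiency is an assumed factor for protocol overhead and shared
    use of the line; it is illustrative, not measured.
    """
    bits = data_tb * 1e12 * 8                      # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400                         # seconds -> days

# 2 TB over a ~6.3 Mbps T2 line:
print(round(transfer_days(2, 6.3), 1))             # roughly 37 days, i.e. 5+ weeks
```

Even with the line fully dedicated to the job, the transfer lands squarely in the "three to six weeks" range quoted above; share the line with normal business traffic and it stretches further.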
What is needed, then, is robust eDiscovery software that can truly support the IaaS cloud where the data resides, without first requiring mass data export. We will discuss what that entails, and the requirements of truly cloud-capable eDiscovery software, in our next post, so please stay tuned!
Filed under Cloud Data, IaaS