Tag Archives: collection

The Post-PC Era Will End eDiscovery Collections as we Know It

Updated 11/14/2013: Amazon Web Services announced today a “game changing” cloud-based desktop virtualization offering.

“As of next month, no employees get a new PC, we are going all virtual and B.Y.O.D.” These words, spoken by one of our customers from one of the world’s largest financial institutions, should be disconcerting to anyone in the traditional eDiscovery collection business. With well over 1,000 computer forensics and eDiscovery services businesses in the US and Canada alone, ranging from small shops to large firms with hundreds of eDiscovery professionals on staff, the industry faces substantial disruption going forward. This is because almost all of these firms thrive on full disk imaging, or otherwise manual collections, from the PCs and laptops issued to corporate employees, either as a substantial source of revenue or as a foundational first step that feeds into their processing and hosting business.

However, enterprises have entered a “post-PC world,” where desktop virtualization, cloud, social media, and mobile devices are supplanting the traditional PC infrastructure and “local” data storage. In fact, desktop virtualization, which will be about a six billion dollar market in 2016 according to industry researcher the 451 Group, is an ideal infrastructure to enable B.Y.O.D. as employees can have access to a virtual PC across a broad range of devices, from traditional PCs and laptops to smartphones and tablets. However, in such a framework, all the employees’ data and applications are stored and managed centrally in a virtual environment.

In addition to enabling B.Y.O.D., a virtual desktop infrastructure (VDI) provides IT significant benefits through the ability to centrally manage user desktops, gaining efficiencies in costs and resources. VDI provides for simpler desktop provisioning, lower costs for deploying new applications, improved desktop-image management, and improved data integrity through centralized backup services. In addition to a reduction in both desktop operating costs and call support, there is also a reduction in the number and duration of downtime events.

Finding content is difficult enough on a traditional desktop, and the issue is compounded with the virtualized variety. There are many compelling benefits to VDI, but the architecture does not facilitate or even enable traditional desktop search solutions or physical disk imaging for forensic examination. X1 Search 8 provides search capabilities across physical, virtual and cloud environments with results returned in a single pane. X1 was specifically architected to operate seamlessly in virtual desktop environments, including the popular Citrix solutions XenApp and XenDesktop.

To further explore the disruptive challenges and benefits of VDI, X1 is partnering with one of the nation’s top VDI consulting firms, Agile 360, in a November 17 webinar (register here) to outline the challenges and opportunities associated with search and information access in VDI environments. We hope you can attend to learn more about the disruptive changes in store for enterprise search and eDiscovery in the Post-PC enterprise.

Filed under Cloud Data, Virtualized Environment

Discovery Templates for Social Media Evidence

As a follow-up to last week’s highly popular Q&A featuring DLA attorneys Joshua Briones and Ana Tagvoryan, both have graciously allowed us to distribute a few of the social media discovery templates found in the appendix of their book, Social Media as Evidence: Cases, Practice Pointers and Techniques, published by the American Bar Association and available for purchase online from the ABA here.

The first template is deposition questions relating to social media evidence. The second is a sample of special interrogatories. They can be accessed at this link. Thanks again to Joshua and Ana for their insightful interview, and for providing these resources.  Their book contains many more such templates and practice tips, including sample document requests, proposed jury instructions, client litigation hold memorandums with a detailed preservation checklist, preservation demand letters, and much more.

In other social discovery news, the ABA Journal this month published an insightful piece on social media discovery, featuring attorney Ralph Losey, with a nice mention of X1 Social Discovery. In a key excerpt, the ABA Journal acknowledges that “there is a pressing need for a tool that can monitor and archive everything a law firm’s client says and does on social media.” The article also noted that more than 41% of firms surveyed in Fulbright’s 2013 annual Litigation Trends report acknowledged they preserved and collected such data to satisfy litigation and investigation needs, an increase from 32% the prior year.

Another important publication, Compliance Week, also highlighted social media discovery, with Grant Thornton emphasizing its use of X1 Social Discovery as part of the firm’s anti-fraud and data leakage toolset. Incidentally, when determining whether a given eDiscovery tool is in fact a leading solution in its class, in our view it is important to look at how many consulting firms are actually utilizing the technology, as consulting firms tend to be sophisticated buyers who actually use the tools on “the front lines.” By our count we have over 400 paid install sites of X1 Social Discovery, and over half of those – 223 to be exact – are eDiscovery and other digital investigation consulting firms. We believe this use by early adopters is a key testament to the strength of our solution.

Filed under Best Practices, eDiscovery & Compliance, Social Media Investigations

No Legal Duty or Business Reason to Boil the Ocean for eDiscovery Preservation

As an addendum to my previous blog post on the unique eDiscovery and search burdens associated with the de-centralized enterprise, one tactic I have seen attempted by some CIOs to address this daunting challenge is to try to constantly migrate disparate data from around the globe into a central location. Just this past week, I spoke to a CIO who was about to embark on a quixotic endeavor to centralize hundreds of terabytes of data so that it could be available for search and eDiscovery collection when needed. The CIO strongly believed he had no other choice, as traditional information management and electronic discovery tools are not architected or suited to address large and disparate volumes of data located in hundreds of offices and work sites across the globe that all store information locally. But boiling the ocean through data migration and centralization is extremely expensive, disruptive and frankly unworkable.

Industry analyst Barry Murphy succinctly makes this point:

Centralization runs counter to the realities of the working world where information must be distributed globally across a variety of devices and applications.  The amount of information we create is overwhelming and the velocity with which that information moves increases daily.  To think that an organization can find one system in which to manage all its information is preposterous. At the same time, the FRCPs essentially put the burden on organizations to be accountable for all information, able to conduct eDiscovery on a moment’s notice.  As we’ve seen, the challenge is daunting.

As I wrote earlier this month, properly targeted preservation initiatives are permitted by the courts and can be enabled by effective software that is able to quickly and effectively access and search these data sources throughout the enterprise. The value of targeted preservation was recognized in the Committee Notes to the FRCP amendments, which urge the parties to reach agreement on the preservation of data and the keywords used to identify responsive materials. (Citing the Manual for Complex Litigation (MCL) (4th) §40.25 (2)). And in In re Genetically Modified Rice Litigation, 2007 WL 1655757 (June 5, 2007 E.D.Mo.), the court noted that “[p]reservation efforts can become unduly burdensome and unreasonably costly unless those efforts are targeted to those documents reasonably likely to be relevant or lead to the discovery of relevant evidence.”

What is needed to address both eDiscovery and enterprise search challenges for the de-centralized enterprise is a field-deployable search and eDiscovery solution that operates on-demand in distributed and virtualized environments, within the global locations where the data resides. This groundbreaking capability is what X1 Rapid Discovery provides. Its ability to deploy and operate in the IaaS cloud also means that the solution can install anywhere within the wide-area network, remotely and on-demand. This enables globally de-centralized enterprises to finally address their overseas data in an efficient, expedient, defensible and highly cost-effective manner.

But I am interested in hearing if anyone has had success with the centralization model. In my 12 years in this business and the 8 years before that as a corporate attorney, I have yet to see an effective or even workable situation where a global enterprise has successfully centralized all of its electronically stored information into a single system consisting of hundreds of terabytes. If you can prove me wrong and point to such a verifiable scenario, I’ll buy you a $100 Starbucks gift certificate or a round of drinks for you and your friends at ILTA next week. If you want to take the challenge or just meet up at ILTA next week in Washington, feel free to email me.

Filed under Cloud Data, eDiscovery & Compliance, Enterprise eDiscovery, IaaS, Preservation & Collection

Authenticating Internet Web Pages as Evidence: a New Approach

By John Patzakis and Brent Botta

In recent posts, we have addressed the issue of evidentiary authentication of social media data. (See previous entries here and here). General Internet site data available through standard web browsing, instead of social media data provided by APIs or user credentials, presents slightly different but just as compelling challenges.

The Internet provides torrential amounts of evidence potentially relevant to litigation matters, with courts routinely facing proffers of data preserved from various websites. This evidence must be authenticated in all cases, and the authentication standard is no different for website data or chat room evidence than for any other. Under Federal Rule of Evidence 901(a), “The requirement of authentication … is satisfied by evidence sufficient to support a finding that the matter in question is what its proponent claims.” United States v. Simpson, 152 F.3d 1241, 1249 (10th Cir. 1998).

Ideally, a proponent of the evidence can rely on uncontroverted direct testimony from the creator of the web page in question. In many cases, however, that option is not available. In such situations, the testimony of the viewer/collector of the Internet evidence “in combination with circumstantial indicia of authenticity (such as the dates and web addresses), would support a finding” that the website documents are what the proponent asserts. Perfect 10, Inc. v. Cybernet Ventures, Inc. (C.D.Cal.2002) 213 F.Supp.2d 1146, 1154 (emphasis added). (See also Lorraine v. Markel American Insurance Company, 241 F.R.D. 534, 546 (D.Md. May 4, 2007) (citing Perfect 10, and referencing MD5 hash values as an additional element of potential “circumstantial indicia” for authentication of electronic evidence)).

One of the many benefits of X1 Social Discovery is its ability to preserve and display all the available “circumstantial indicia” (to borrow the Perfect 10 court’s term) to the user in order to present the best case possible for the authenticity of Internet-based evidence collected with the software. This includes collecting all available metadata and generating an MD5 checksum or “hash value” of the preserved data.

But HTML web pages pose unique authentication challenges, and merely generating an MD5 checksum of the entire web page, or just the web page source file, provides limited value because web pages are constantly changing due to their very fluid and dynamic nature. In fact, a web page collected twice in immediate succession would very likely yield two different MD5 checksums. This is because web pages typically feature links to many external items that are dynamically loaded upon each page view. These external links point to cascading style sheets (CSS), graphical images, JavaScript files and other supporting files. This linked content can be stored on another server in the same domain, but is often located somewhere else on the Internet.

When the Web browser loads a web page, it consolidates all these items into one viewable page for the user. Since the Web page source file contains only the links to the files to be loaded, the MD5 checksum of the source file can remain unchanged even if the content of the linked files becomes completely different. Therefore, the content of the linked items must be considered in the authenticity of the Web page. X1 Social Discovery addresses these challenges by first generating an MD5 checksum log representing each item that constitutes the Web page, including the main Web page’s source. Then an MD5 representing the content of all the items contained within the web page is generated and preserved.
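The two-step idea above (a per-item checksum log, then one checksum covering everything) can be illustrated with a minimal Python sketch. This is not X1 Social Discovery’s implementation, which this post does not detail; the function names, the example URLs, and the way the per-item hashes are combined (sorted by URL, then hashed together) are all assumptions made for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def build_hash_log(items: dict[str, bytes]) -> dict[str, str]:
    """Map each collected item (keyed by URL) to its own MD5 checksum."""
    return {url: md5_hex(content) for url, content in items.items()}


def combined_hash(items: dict[str, bytes]) -> str:
    """One MD5 covering every item. Assumption: hash the sorted
    (URL, per-item MD5) pairs so the result does not depend on
    the order in which the items were downloaded."""
    digest = hashlib.md5()
    for url, item_hash in sorted(build_hash_log(items).items()):
        digest.update(url.encode("utf-8"))
        digest.update(item_hash.encode("ascii"))
    return digest.hexdigest()


# Hypothetical captures: the page source is identical, but the linked
# style sheet changed between the first and second collection.
PAGE = "https://example.com/index.html"
CSS = "https://example.com/style.css"
capture_1 = {
    PAGE: b'<html><head><link rel="stylesheet" href="style.css"></head>'
          b"<body><p>Hello</p></body></html>",
    CSS: b"p { color: black; }",
}
capture_2 = {**capture_1, CSS: b"p { display: none; }"}

log_1, log_2 = build_hash_log(capture_1), build_hash_log(capture_2)
print("source unchanged:", log_1[PAGE] == log_2[PAGE])
print("style sheet unchanged:", log_1[CSS] == log_2[CSS])
print("combined unchanged:", combined_hash(capture_1) == combined_hash(capture_2))
```

In this sketch the source file’s checksum is the same for both captures, while the style sheet’s checksum and the combined checksum differ, which is the situation a source-only hash would miss.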

To further complicate Web collections, entire sections of a Web page are often not visible to the viewer. These hidden areas serve various purposes, including metatagging for Internet search engine optimization. The servers that host Websites can either store static Web pages or dynamically created pages that usually change each time a user visits the Website, even though the actual content may appear unchanged.

In order to address this additional challenge, X1 Social Discovery utilizes two different MD5 fields for each item that makes up a Web page. The first is the acquisition hash, which is calculated from the actual collected information. The second is the content hash. The content hash is based on the actual “BODY” of a Web page and ignores the hidden metadata. By taking this approach, the content hash will show if the user-viewable content has actually changed, not just a hidden metadata tag provided by the server. To illustrate, below is a screenshot from the metadata view of X1 Social Discovery for website capture evidence, reflecting the generation of MD5 checksums for individual objects on a single webpage:

The time stamp of the capture and URL of the web page are also documented in the case. By generating hash values of all individual objects within the web page, the examiner is better able to pinpoint any changes that may have occurred in subsequent captures. Additionally, if there is a specific item appearing on the web page, such as an incriminating image, then it is important to have an individual MD5 checksum of that key piece of evidence. Finally, any document file found on a captured web page, such as a PDF, PowerPoint, or Word document, will also be individually collected by X1 Social Discovery with corresponding acquisition and content hash values generated.
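To make the distinction between an acquisition hash and a content hash concrete, here is a rough Python sketch. It is an illustration under stated assumptions, not the product’s actual algorithm: the post says only that the content hash is based on the page “BODY” and ignores hidden metadata, so the choice to hash the text found inside the body element, and every name below, is hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the text that appears between <body> and </body>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and text:
            self.chunks.append(text)


def acquisition_hash(raw: bytes) -> str:
    """MD5 of the bytes exactly as collected."""
    return hashlib.md5(raw).hexdigest()


def content_hash(raw: bytes) -> str:
    """MD5 of the body text only (assumed definition of 'content')."""
    parser = BodyTextExtractor()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    return hashlib.md5("\n".join(parser.chunks).encode("utf-8")).hexdigest()


def changed_items(old_log: dict[str, str], new_log: dict[str, str]) -> list[str]:
    """URLs whose hash differs between two captures, or that appear in only one."""
    return sorted(u for u in old_log.keys() | new_log.keys()
                  if old_log.get(u) != new_log.get(u))


# Hypothetical pages: v2 differs from v1 only in a hidden meta tag;
# v3 differs in what the viewer actually sees.
page_v1 = (b'<html><head><meta name="request-id" content="abc123"></head>'
           b"<body><h1>For sale</h1><p>Widget</p></body></html>")
page_v2 = page_v1.replace(b"abc123", b"xyz789")
page_v3 = page_v1.replace(b"Widget", b"Gadget")

for name, page in (("v2", page_v2), ("v3", page_v3)):
    print(name,
          "acquisition changed:", acquisition_hash(page) != acquisition_hash(page_v1),
          "content changed:", content_hash(page) != content_hash(page_v1))
```

Under these assumptions, a change confined to the hidden meta tag alters the acquisition hash but not the content hash, while a change to the visible text alters both. The `changed_items` helper shows how two per-object hash logs from successive captures could be compared to pinpoint which objects differ.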

We believe this approach to authentication of website evidence is unique in its detail and presents a new standard. This authentication process supports the equally innovative automated and integrated web collection capabilities of X1 Social Discovery, which we believe is the only solution of its kind to collect website evidence through either a one-off capture or full crawling, including on a scheduled basis, and have that information instantly reviewable in native file format through a federated search that includes multiple pieces of social media and website evidence in a single case. In all, X1 Social Discovery is a powerful solution to effectively collect from social media and general websites across the web for both relevant content and all available “circumstantial indicia.”

Filed under Authentication, Best Practices, Preservation & Collection

Case Study: The Importance of Integrated Social Media and Website Crawling Collection

One of the benefits of the very strong market adoption of our X1 Social Discovery software is that we receive a significant amount of invaluable and excellent customer feedback from very seasoned eDiscovery and law enforcement professionals. Many of these experts report that a good number of their social media investigation and collection cases also require general website collection. For instance, a person on Facebook promoting infringing technology may also be posting relevant information to industry web bulletin boards or maintaining their own website. It is thus important that a social media eDiscovery and investigation process feature integrated web collection and social media support.

For an effective process, website data should be collected, searched and reviewed alongside social media collections in the same interface. The collected website data should not be a mere image capture or PDF, but a full HTML (native file) collection, to ensure preservation of all metadata and other source information as well as to enable instant and full search and effective evidentiary authentication. All of the evidence should be searched with one pass, reviewed, tagged and, if needed, exported to an attorney review platform from a single workflow.

To illustrate what this looks like in the field, we recorded an 8-minute demonstration based in part upon a real-life example reported to us by one of our customers. This case study, performed by our CTO Brent Botta, involves the collection of social media data as well as message board posts on the web. Importantly, this evidence is consolidated into a unified workflow to be searched in a single pass.

The investigation features X1 Social Discovery as the platform, which now features automated and integrated web crawling capabilities in addition to its renowned functionality for the collection and analysis of Facebook and Twitter content. We believe this is the only solution of its kind to collect website evidence through either a one-off capture or full crawling, including on a scheduled basis, and have that information instantly reviewable in native file format through a federated search that includes multiple pieces of social media and website evidence in a single case. Up to millions of web captures and social media items are searched instantly with the patented X1 search, tagged and exported from a single interface.

Like social media content, web pages bring their own unique but important challenges for evidentiary authentication. In the next week, we will be posting on best practices for the collection and authentication of web pages as evidence, so stay tuned!

Filed under Best Practices, Preservation & Collection