Category Archives: compliance

In-Place Data Analytics For Unstructured Data is No Longer Science Fiction

By John Patzakis

AI-driven analytics supercharges compliance investigations, data security, privacy audits and eDiscovery document review. Machine learning employs mathematical models to assess enormous datasets and "learn" from feedback and exposure, gaining deep insights into key information. This enables the identification of discrete and hidden patterns in millions of emails and other electronic files, so documents can be categorized and clustered by concept, content or topic. The process goes beyond keyword searching to identify anomalies, internal threats and other indicators of relevant behavior. The enormous volume and scope of corporate data being generated have created numerous opportunities for investigators seeking deep information insights in support of internal compliance, civil litigation and regulatory matters.

The most effective uses of AI in investigations couple continuous active learning technology with concept clustering to discover the most relevant data in documents, emails, text and other sources. As AI continues to learn and improve over time, the benefits of an effectively implemented approach will also increase. In-house and outside counsel and compliance teams now rely on AI technology not only in response to government investigations, but also, increasingly, to identify risks before they escalate to that stage.


However, logistical and cost barriers have traditionally stymied organizations from taking advantage of AI on a systematic and proactive basis, especially with regard to unstructured data, which, according to industry studies, constitutes 80 percent or more of all data (and data risk) in the enterprise. Analytics engines ingest the text of documents and emails, so that text must be "mined" from the native originals, and the natives themselves must first be collected and migrated to a centralized processing appliance. This arduous process is expensive and time consuming, particularly in the case of unstructured data, which must be collected from the "wild" and then migrated to a central location, creating a stand-alone "data lake."

Due to these limitations, otherwise effective AI capabilities are typically utilized only on very large matters, and on a reactive basis that limits their benefits to the investigation at hand and the information within the captive data lake. Thus, ongoing active learning is not generally applied across multiple matters or utilized proactively. And because that captive information consists of migrated copies of the originals, there is very limited ability to act on data insights, as the original data remains in its actual location in the enterprise.

So the ideal architecture for the enterprise would move the data analytics "upstream," where all the unstructured data resides. This would not only save up to millions per year in investigation, data audit and eDiscovery costs, but would also enable proactive use for compliance auditing, detection of security and policy breaches, and internal fraud detection. However, analytics engines require considerable computing resources, with the leading AI solutions typically requiring tens of thousands of dollars' worth of high-end hardware for a single server instance. Such computing workloads simply cannot be forward-deployed to laptops and the many file servers where the bulk of unstructured data and associated enterprise risk exists.

But an alternative architecture solves this problem: a process that extracts text from unstructured, distributed data in place and systematically sends that text, at massive scale, to the analytics platform along with the associated metadata and a globally unique identifier for each item. As mentioned, one of the many challenges with traditional workflows is the massive data transfer associated with ongoing migration of electronic files and emails, the latter of which must be sent in whole containers such as PST files. This process alone can take weeks, choke network bandwidth and be highly disruptive to operations. The load associated with text and metadata alone, however, is less than 1 percent of the full native item, so the possibilities here are very compelling. This architecture enables very scalable and proactive compliance, information security, and information governance use cases. The upload to AI engines takes hours instead of weeks, enabling continual machine learning to improve processes and accuracy over time, and enabling immediate action to be taken on identified threats or other relevant information.
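
To make this architecture more concrete, below is a minimal sketch (in Python) of what an in-place extract-and-feed step might look like on a single endpoint. Everything here is illustrative: the ANALYTICS_URL ingest endpoint, the extract_text stand-in and the payload fields are assumptions for the sketch, not X1's actual implementation or API.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

import requests  # third-party HTTP client, assumed available on the endpoint agent

# Hypothetical ingest endpoint; X1's actual pipeline is not shown here.
ANALYTICS_URL = "https://analytics.example.internal/ingest"


def extract_text(path):
    """Stand-in for a real text-extraction library (e.g. Apache Tika)."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="ignore")


def build_payload(path, custodian):
    """Extract text and metadata in place; the native file never leaves the endpoint."""
    stat = os.stat(path)
    with open(path, "rb") as f:
        raw = f.read()
    return {
        "guid": hashlib.sha256(raw).hexdigest(),  # globally unique identifier for the item
        "custodian": custodian,
        "path": os.path.abspath(path),
        "modified": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "text": extract_text(path),  # the only content shipped upstream
    }


def ship_to_analytics(paths, custodian):
    """Send text + metadata for each item to the central analytics platform."""
    for path in paths:
        requests.post(ANALYTICS_URL, json=build_payload(path, custodian), timeout=30)
```

Run against a folder of documents, the only payload crossing the network is the JSON above, which is why an upload measured in hours rather than weeks is plausible.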

The only solution that we are aware of that fulfills this vision is X1 Distributed GRC. X1's unique distributed architecture upends the traditional collection process by indexing at the distributed endpoints, enabling a direct pipeline of extracted text to the analytics platform. This technology and workflow result in far faster and more precise collections and a more informed strategy in any matter.

Deployed at each endpoint or centrally in virtualized environments, X1 Enterprise allows practitioners to query many thousands of devices simultaneously, apply analytics before collecting, and process while collecting directly into myriad review and analytics applications such as RelativityOne and Brainspace. X1 Enterprise empowers corporate eDiscovery, compliance, investigative, cybersecurity and privacy staff to instantly and iteratively find, analyze, collect and/or delete virtually any piece of unstructured user data wherever it resides, all in a legally defensible fashion.

X1 demonstrated these capabilities with ComplianceDS in a recent webinar, featuring a brief but substantive demo of our X1 Distributed GRC solution and highlighting our support for analytics engines through the ability to extract text in place and feed it directly into AI solutions.

Here is a link to the recording, with a direct link to the 5-minute demo portion.


Want Legal to Add A LOT More Value? Stop Over-Collecting Data


The 2019 CLOC (Corporate Legal Operations Consortium) Conference ended last week, and by all accounts it was another great event for an organization that continues to gain relevance and momentum.  A story in Thursday’s Legaltech News entitled “Why E-discovery Savings Is About Department Value for Corporate Legal” summarized a CLOC session focused on “streamlining e-discovery and information governance inside corporate legal departments.”  At the risk of sounding biased, that seems like a perfect topic to me.

The article’s conclusions from the panel session, namely adding value by wresting control of eDiscovery from outside counsel, consolidating hosting vendors and creating a “living data map”, were all spot on and certainly useful.  One way for legal to add enormous value, however, was NOT discussed: collecting far less data as part of the eDiscovery, investigatory and compliance processes.

As we highlighted in a webinar with our partner Compliance Discovery Solutions last Tuesday (which can be viewed here), the way most eDiscovery practitioners conduct ESI collection is remarkably unchanged from a decade ago, as illustrated in the infographic below: consult a data map, image entire drives from each and every custodian (e.g., with EnCase), load those many images into a processing application (e.g., Nuix), process the huge volumes of data (most of which is entirely irrelevant), and only then move the processed data into a review application (e.g., Relativity).

[Infographic: the legacy eDiscovery collection workflow]

This legacy collection process for GRC (Governance, Risk & Compliance) and eDiscovery is wildly inefficient, disruptive to the business and costly, yet many, if not most, practitioners still use it, most likely because it is the status quo and change is always hard in the legal technology world. But change here is a must, as this "image everything → then process it all → and only then begin reviewing" workflow causes myriad issues, not just for legal but for the company as a whole:

  • Increases eDiscovery costs exponentially. The still-seminal Rand study on eDiscovery pegged the overall cost for identification through production at roughly $1,800 per GB.  While some elements of that price have come down in the intervening 6-7 years, especially processing and hosting rates, data volumes and variety have grown by at least as much, negating those reductions.  Imaging entire drives by definition collects far more data than could ever be relevant in any given matter – and the costs of this over-collection multiply at every step thereafter, forcing clients to pay hundreds of thousands if not millions of dollars more than they should (a rough worked example follows this list).
  • Is extremely disruptive to employees. Forensically imaging a drive usually requires gaining physical access to the laptop or desktop for some period of time, often a day or two.  Put yourself in each of those employees' shoes: even if you are given a "loaner" machine, you still don't have all of your local information, settings, bookmarks, etc. – which is a major disruption to your work day and therefore a significant drag on productivity.
  • Takes far too long. Because forensic imaging of drives requires physical access to a device, each custodian's machine must be dealt with individually.  In many collections, custodians are spread across multiple offices, on vacation, or working remotely, which often extends the process to many weeks if not months.  During all of that time, lawyers are unable to access this critical data (e.g., to begin formulating case strategy or negotiating with opposing counsel or a regulator).
  • Creates unnecessary copies of data that could otherwise be remediated. An often-overlooked byproduct of over-collection is that it creates another copy of data that is outside of most (if not all) data remediation programs.  For companies that are regulated and/or encounter litigation regularly, this becomes a major headache and undermines data governance and remediation programs.
  • Forces counsel to "fly blind" for months. Every day the IT and legal teams spend forensically imaging each custodian's drive, processing the data, and only then loading it into a review or analysis application is a day that in-house and outside counsel are flying blind, unable to look at key data to begin constructing case strategy, conduct informed interviews, negotiate with opposing counsel (e.g., on the scope of a matter, including discovery) or interact with regulators.  This is incredibly valuable time lost for no value received in return.
  • Using forensic tools for non-forensic processes is unnecessary overkill. The irony of the "image everything" approach is that it is extreme overkill: it is like a doctor whose only procedure for removing a mole is to amputate the arm.  Forensic images can still be used on a one-off basis in the narrow circumstances where there are concerns about possible spoliation of evidence, but in the vast majority of circumstances a forensic image is completely unnecessary.
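
To put rough numbers on the cost point in the first bullet above, here is a back-of-the-envelope calculation. The custodian count and per-custodian volumes are purely illustrative assumptions; only the $1,800-per-GB figure comes from the Rand estimate cited above.

```python
# Illustrative only: custodian count and volumes are assumptions, not matter data.
CUSTODIANS = 25
FULL_IMAGE_GB_PER_CUSTODIAN = 250   # "image everything" approach
TARGETED_GB_PER_CUSTODIAN = 5       # keyword/date-scoped collection
COST_PER_GB = 1_800                 # Rand estimate, identification through production

full_image_cost = CUSTODIANS * FULL_IMAGE_GB_PER_CUSTODIAN * COST_PER_GB
targeted_cost = CUSTODIANS * TARGETED_GB_PER_CUSTODIAN * COST_PER_GB

print(f"Image everything: ${full_image_cost:,}")  # $11,250,000
print(f"Targeted scope:   ${targeted_cost:,}")    # $225,000
```

Even if the assumed volumes are off by a factor of two in either direction, the gap between the two approaches remains enormous, which is the point the bullet above is making.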

As emphasized at the recent CLOC conference in Las Vegas, corporate legal operations teams are quite correctly focused on showing the value legal brings to the business.  However, there is still a fundamental change they need to make to how they handle the collection of ESI for eDiscovery, GRC and privacy purposes, one that would be an enormous value-add to all parts of the company, including legal: ending the systematic over-collection of data.  How this can be done quickly and cost-effectively has been the subject of previous blog posts and will be addressed in detail over the next few weeks as well.


Government Regulators Reject “Paper” Corporate Compliance Programs Lacking Actual Enforcement

By John Patzakis

Recently, US Government regulators fined Stanley Black & Decker $1.8m after its subsidiary illegally exported finished power tools and spare parts to Iran, in violation of sanctions. The Government found that the tool maker failed to “implement procedures to monitor or audit [its subsidiary] operations to ensure that its Iran-related sales did not recur.”

Notably, the employees of the subsidiary concealed their activities by creating bogus bills of lading that misidentified delivery locations and by telling customers to avoid writing "Iran" on business documents. This conduct underscores the importance of having a diligent internal monitoring and investigation capability that goes beyond mere review of standard transactional records in structured databases such as CRM systems. This type of conduct is best detected on employees' laptops and other sources of unstructured data through effective internal investigation processes.

The Treasury Department stated the Stanley Black & Decker case “highlights the importance of U.S. companies to conduct sanctions-related due diligence both prior and subsequent to mergers and acquisitions, and to take appropriate steps to audit, monitor and verify newly acquired subsidiaries and affiliates for….compliance.”

Further to this point, the US Department of Justice Manual features a dedicated section on assessing the effectiveness of corporate compliance programs in corporate fraud prosecutions, including FCPA matters. This section is a must-read for any corporate compliance professional, as it provides detailed guidance on what the USDOJ looks for in assessing whether a corporation is committed to good-faith self-policing or is merely making hollow pronouncements and going through the motions.

The USDOJ cites United States v. Potter, 463 F.3d 9 (1st Cir. 2006), which provides that a corporation cannot “avoid liability by adopting abstract rules” that forbid its agents from engaging in illegal acts, because “[e]ven a specific directive to an agent or employee or honest efforts to police such rules do not automatically free the company for the wrongful acts of agents.” Id. at 25-26. See also United States v. Hilton Hotels Corp., 467 F.2d 1000, 1007 (9th Cir. 1972) (noting that a corporation “could not gain exculpation by issuing general instructions without undertaking to enforce those instructions by means commensurate with the obvious risks”).

The USDOJ manual advises prosecutors to determine if the corporate compliance program “is adequately designed for maximum effectiveness in preventing and detecting wrongdoing by employees and whether corporate management is enforcing the program or is tacitly encouraging or pressuring employees to engage in misconduct to achieve business objectives,” and that “[p]rosecutors should therefore attempt to determine whether a corporation’s compliance program is merely a ‘paper program’ or whether it was designed, implemented, reviewed, and revised, as appropriate, in an effective manner.”

With these mandates from government regulators for actual and effective monitoring and enforcement through internal investigations, organizations need operational mechanisms to carry them out. In particular, any anti-fraud and internal compliance program must have the ability to search and analyze unstructured electronic data, which is where much of the evidence of fraud and other policy violations is best detected.

To help meet the “actual enforcement” requirements of government regulators, X1 Distributed Discovery (X1DD) enables enterprises to quickly and easily search across up to thousands of distributed endpoints and data servers from a central location.  Legal and compliance teams can easily perform unified complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes, and full results with completed collection in hours, instead of days or weeks. Built on our award-winning and patented X1 Search technology, X1DD is the first product to offer true and massively scalable distributed data discovery across an organization. X1DD replaces expensive, cumbersome and highly disruptive approaches to meet enterprise investigation, compliance, and eDiscovery requirements.

Once the legal team is satisfied with a specific search string, after sufficient iteration, the data can be collected by X1DD by simply hitting the "collect" button. The responsive data is "containerized" at each endpoint and automatically transmitted either to a central location or directly to Relativity, using Relativity's import API, where all data is seamlessly ready for review. Importantly, all results are tied back to a specific custodian, with full chain of custody and preservation of all file metadata. Here is a recording of a live public demo with Relativity, showing the very fast direct upload from X1DD straight into RelativityOne.
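
For readers who think in code, here is a conceptual sketch of the fan-out-then-collect pattern described above. The search_endpoint and collect_item functions are hypothetical stubs standing in for the endpoint agent; X1DD's actual protocol and Relativity's import API are not reproduced here.

```python
import concurrent.futures
import hashlib


def search_endpoint(host: str, query: str) -> list[dict]:
    """Stub: ask the agent on `host` to run `query` against its local index.

    Only hit metadata comes back at this stage; no file content crosses the wire.
    """
    return []  # e.g. [{"host": host, "doc_id": "...", "custodian": "...", "path": "..."}]


def collect_item(hit: dict) -> dict:
    """Stub: retrieve one responsive native file and hash it for chain of custody."""
    data = b""  # agent call to fetch the file bytes would go here
    return {**hit, "sha1": hashlib.sha1(data).hexdigest(), "native": data}


def run_matter(endpoints: list[str], query: str) -> list[dict]:
    # Fan the query out in parallel; each machine searches its own index,
    # so statistical results return in minutes without migrating any data first.
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        hits = [hit for per_host in pool.map(lambda e: search_endpoint(e, query), endpoints)
                for hit in per_host]
    # Once counsel has iterated to a final query, collect only the responsive items,
    # each tied to its custodian and hashed; upload to the review platform is omitted.
    return [collect_item(hit) for hit in hits]
```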

The effort described above — from iterative, distributed search through collection and transmittal straight into Relativity from hundreds of endpoints — can be accomplished in a single day. Using manual consulting services, the same project would require several weeks and hundreds of thousands of dollars in collection costs alone, not to mention significant disruption to business operations. Substantial costs associated with over-collection of data would mount as well, and through unnecessary attorney review time could even dwarf the collection costs.

In addition to saving time and money, these capabilities are important to demonstrate a sincere organizational commitment to compliance, as opposed to maintaining a mere "paper program."


GDPR Fines Issued for Failure to Essentially Perform Enterprise eDiscovery

By John Patzakis

The European General Data Protection Regulation (GDPR) came into full force in May 2018. Prior to that date, what I consistently heard from most of the compliance community was general fear and doubt about massive fines, with the proposed solution being to re-purpose existing compliance templates and web-based dashboards. However, many organizations have learned the hard way that "paper programs" alone fall far short of the requirements under the GDPR. This is because the GDPR requires that an organization know where all EU personal data is stored across the enterprise and be able to search for, identify and remove it when required.

Frequent readers of this blog may recall that we banged the Subject Access Request drum well before May 2018. We noted that an operational enterprise search and eDiscovery capability was required to comply effectively with many of the core data discovery-focused requirements of the GDPR. Under the GDPR, a European resident can request — potentially on a whim — that all data an enterprise holds on them be identified and removed. Organizations are required to establish a capability to respond to these Subject Access Requests (SARs). Forrester Research notes that "Data Discovery and classification are the foundation of GDPR compliance." This is because, according to Forrester, the GDPR effectively requires that an organization be able to identify and actually locate, with precision, the personal data of EU data subjects across the organization.

Failure to respond to SARs has already led to fines and enforcement actions against several companies, including Google and the successor entity to Cambridge Analytica. This shows that many organizations are failing to understand the operational reality of GDPR compliance. This point is effectively articulated by a recent practice update from the law firm of DLA Piper on the GDPR, which states: “The scale of fines and risk of follow-on private claims under GDPR means that actual compliance is a must. GDPR is not a legal and compliance challenge – it is much broader than that, requiring organizations to completely transform the way that they collect, process, securely store, share and securely wipe personal data (emphasis added).”

These GDPR requirements can only be complied with through an effective enterprise eDiscovery search capability:

To achieve GDPR compliance, organizations must ensure that explicit policies and procedures are in place for handling personal information and, just as importantly, that they can prove those policies and procedures are being followed and operationally enforced. What has always been needed is immediate visibility into unstructured, distributed data across the enterprise: the ability to search and report across several thousand endpoints and other unstructured data sources, returning results within minutes instead of days or weeks. The urgency of GDPR compliance only heightens the need for such an operational capability.
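
As an illustration of what such an operational capability implies, here is a minimal sketch of a Subject Access Request workflow running over distributed endpoint indexes. The DataSubject model and the agent calls (search_endpoint, erase_item) are hypothetical stubs for the sketch, not X1's API.

```python
from dataclasses import dataclass


@dataclass
class DataSubject:
    """The identifiers an organization actually holds for an EU data subject."""
    name: str
    email: str
    customer_id: str


def search_endpoint(host: str, query: str) -> list[dict]:
    """Stub: the agent on `host` searches its local index and returns hit metadata."""
    return []


def erase_item(hit: dict) -> None:
    """Stub: the agent deletes the item in place and logs the action for the audit trail."""


def handle_sar(subject: DataSubject, endpoints: list[str], erase: bool = False) -> list[dict]:
    """Locate every item referencing the data subject, then report (and optionally erase) it."""
    query = f'"{subject.name}" OR "{subject.email}" OR "{subject.customer_id}"'
    hits = [hit for host in endpoints for hit in search_endpoint(host, query)]
    if erase:
        for hit in hits:
            erase_item(hit)
    return hits  # the report of where the personal data resides
```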

X1 Distributed GRC represents a unique approach, enabling enterprises to quickly and easily search across multiple distributed endpoints and data servers from a central location. Legal and compliance teams can easily perform unified, complex searches across both unstructured content and metadata, obtaining statistical insight into the data in minutes instead of days or weeks. With X1, organizations can also automatically migrate, collect, delete, or take other action on the data as a result of the search parameters. Built on our award-winning and patented X1 Search technology, X1 Distributed GRC is the first product to offer true, massively scalable distributed searching, executed in its entirety on the end-node computers, for data audits across an organization. This capability vastly reduces costs while delivering that all-too-elusive goal: actual compliance with information governance programs, including the GDPR.


In addition to TAR, CAR Can Dramatically Reduce Attorney Review Costs

eDiscovery efforts are often costly, time consuming and burdensome. The volume of electronically stored information (ESI) is growing exponentially and will only continue to do so. Even with the advent of technology assisted review (TAR), the costs associated with collecting, processing, reviewing, and producing documents in litigation are a source of considerable pain for litigants. The only way to reduce that pain to its minimum is to use every available tool, in all appropriate circumstances and within the bounds of reasonableness and proportionality, to control the volume of data that enters the discovery pipeline.

Litigators and commentators often pine for a systemized, uniform and defensible process for custodian self-collection. Conceptually, such an ideal process would be one in which custodians are automatically presented with the set of their documents and emails identified as potentially relevant to a given matter through keywords and other search parameters applied uniformly across all custodians. This set of ESI would be presented to each custodian in a controlled interface with no ability to delete documents or emails, only the ability to review and apply tags and annotations. The custodian would be required to comply, and all documents responsive to the initial unified search would be collected by default as a control mechanism.
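
Before turning to how X1 implements this, here is a small illustrative sketch of that conceptual model: one uniform query defined centrally and applied to every custodian, a tag-only interface, and collection of everything responsive by default. The names used (ReviewItem, MATTER_QUERY, search_endpoint) are hypothetical, not X1's implementation.

```python
from dataclasses import dataclass, field

# One query, defined centrally, applied uniformly to every custodian.
MATTER_QUERY = '"project falcon" AND (invoice OR shipment) AND date:[2018-01-01 TO 2019-06-30]'


def search_endpoint(custodian: str, query: str) -> list[dict]:
    """Stub: run the uniform query against the custodian's local index."""
    return []  # e.g. [{"doc_id": "...", "path": "..."}, ...]


@dataclass
class ReviewItem:
    doc_id: str
    custodian: str
    path: str
    tags: list[str] = field(default_factory=list)  # custodians may tag and annotate...
    collected: bool = True                          # ...but everything responsive is collected by default


def build_review_set(custodians: list[str]) -> list[ReviewItem]:
    """No custodian chooses their own scope; the same query defines the set for everyone."""
    return [ReviewItem(hit["doc_id"], custodian, hit["path"])
            for custodian in custodians
            for hit in search_endpoint(custodian, MATTER_QUERY)]


def apply_custodian_tag(item: ReviewItem, tag: str) -> None:
    """Custodians can only add tags; there is no operation to delete or exclude an item."""
    item.tags.append(tag)
```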

With X1 Data Audit and Compliance (XDAC), a defensible custodian assisted review (CAR) is now a reality. At a high level, XDAC lets organizations perform targeted search and collection of the ESI on thousands of endpoints over the internal network without disrupting operations. Search results are returned in minutes, not weeks, and can therefore be highly granular and iterative, based upon multiple keywords, date ranges, file types, or other parameters. This approach typically reduces eDiscovery collection and processing costs by at least one order of magnitude (90 percent), bringing much-needed feasibility to enterprise-wide eDiscovery collection and saving organizations millions while improving compliance. XDAC includes X1 Insight and Collection for pure eDiscovery use cases.

As a key optional feature, XDAC provides custodian assisted review, where custodians are presented with a listing of their potentially relevant ESI in a controlled, systemized and uniform identification process for their review and tagging. Instead of essentially asking the custodians to “please rummage through your entire email account and all your documents to look for what you might think is relevant to this matter,” the custodians are presented with a narrow and organized subset of potentially relevant ESI for their review.

[Screenshot: custodian assisted review interface]

While the custodians are able to assist with the review, they cannot impact or control what ESI is identified and preserved; this is controlled and managed centrally by the eDiscovery practitioner. This way, custodians can apply their own insight to the information and even flag personal private data, all while effectuating very cost-effective and systematic ESI collection.

Powerful Analytics Engine

TAR features powerful algorithms that cluster documents and otherwise work their magic. CAR also relies on a powerful analytics engine — the human brain. Custodians know a lot about their own documents and emails. This is particularly true in technical or other complex matters where the custodians are engineers or other professionals who simply better understand the dynamics and nuances of their information. With the X1 process, the custodians provide a key data point: their input is used to inform the secondary review.

The process is very defensible, as the exercise is logged and documented, with all metadata kept intact and a clear chain of custody established. Best of all, the custodian-applied tags and annotations are preserved and retained through the review process via X1's integration with Relativity. I could describe this very important feature a lot further, but candidly the best way to get a full picture is to see it for yourself. I recommend that you view this recorded 9-minute demonstration of X1's custodian self-review feature here.

We believe X1’s functionality provides the optimal means for enterprise eDiscovery preservation, collection and early data assessment, especially with the key additional (and optional) feature of custodian assisted review. But please see for yourself and let us know what you think!

 
