Category Archives: Cloud Data

De-NISTing in eDiscovery: A Costly Provision That Shouldn’t Be in Model Orders in the First Place

By John Patzakis

A model eDiscovery order I recently came across from a federal district court issued by a respected judge included a provision requiring parties to de-NIST their files in the course of eDiscovery production. On its face, this may seem like a reasonable technical requirement to some practitioners. But this provision reflects a fundamental misunderstanding of how proportional, targeted eDiscovery collection should work — and it points to a broader problem in our industry that deserves some attention.

For those unfamiliar with the term, de-NISTing refers to the process of filtering out known, irrelevant system files from a forensic collection using the National Institute of Standards and Technology’s reference database of known file signatures. The NIST database catalogs hundreds of thousands of known operating system files, executables, DLL files, and other system-generated data that have no evidentiary value whatsoever. De-NISTing removes these files from a collection so that reviewers are not burdened with wading through mountains of irrelevant system data. The reason you need to de-NIST in the first place is that you collected a full-disk image, capturing everything on the drive, relevant or not.
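To make the mechanics concrete, here is a minimal Python sketch of the filtering step described above. It assumes the reference list has already been loaded as a set of uppercase SHA-1 strings; the file names and contents are hypothetical stand-ins, and real processing tools work against the actual NIST reference data rather than a hand-built set.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the uppercase hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest().upper()


def de_nist(files: dict, known_hashes: set) -> dict:
    """Keep only the files whose hash is NOT in the known-file set."""
    return {
        name: data
        for name, data in files.items()
        if sha1_hex(data) not in known_hashes
    }


# Hypothetical sample data: one system file, one user document.
collected = {
    "kernel32.dll": b"stand-in bytes for a known system file",
    "contract_draft.docx": b"stand-in bytes for a user document",
}

# Assumption: the known-file set would normally come from the reference list.
known = {sha1_hex(collected["kernel32.dll"])}

remaining = de_nist(collected, known)
print(sorted(remaining))
```

The match is on file content, not file name, which is why the step only makes sense after every file on the drive has already been collected and hashed.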

And that is precisely the problem with requiring de-NISTing in a model eDiscovery order. As I have written extensively, including in our recent white paper on proportionality in eDiscovery, courts have consistently held that full-disk imaging is not the appropriate default for civil litigation collections. Going all the way back to Diepenhorst v. City of Battle Creek in 2006, courts have warned that imaging a hard drive results in the production of massive amounts of irrelevant, and potentially privileged, information. More recently, in Motorola Solutions v. Hytera Communications Corp., the court emphasized that forensic examination of a party’s computers “is no routine matter” and that courts must use caution to avoid unduly impinging on privacy interests. A model order that presupposes full-disk imaging by requiring de-NISTing is, at minimum, inconsistent with this well-established body of case law.

The 2015 amendments to Federal Rule of Civil Procedure 26(b)(1) established a clear six-pronged proportionality framework for eDiscovery, requiring parties and courts to weigh factors including the importance of the issues at stake, the amount in controversy, the parties’ resources, and whether the burden or expense of proposed discovery outweighs its likely benefits. Courts have taken these amendments seriously and have consistently limited overbroad discovery requests on proportionality grounds. A blanket model order requirement to de-NIST implicitly endorses a collect-everything methodology that runs counter to the proportionality principles embedded in Rule 26(b)(1) and the extensive case law that has developed around it.

So how does a provision like this end up in a model court order? The answer, I believe, lies in the undue influence that certain eDiscovery service providers have had on collection practices and, ultimately, on the drafting of court orders and guidelines. Some service providers have a clear financial incentive to collect as much data as possible, since their fees are calculated on a per-gigabyte basis — meaning the more data collected, processed, and hosted, the higher the bill. This volume-based business model has shaped industry “best practices” in ways that favor over-collection, and that mindset has quietly seeped into the thinking of some federal judges and the model orders they issue. What gets dressed up as technical diligence is, in many cases, simply an artifact of a business model that profits from excess.

If you are conducting a properly scoped, targeted eDiscovery collection that is consistent with the principles of proportionality — as the Federal Rules and overwhelming case law require — there is simply no reason to de-NIST. A targeted collection does not reach system files, executables, DLLs, or other non-user-generated data in the first place. You are collecting potentially relevant ESI from identified custodians, scoped by search terms, date ranges, file types, and data sources. You never touch the data that de-NISTing is designed to filter out, which means the entire de-NISTing step — and its associated cost and processing time — is unnecessary overhead born entirely of an overbroad collection methodology.
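A short Python sketch shows why a scoped collection never reaches system files. The `Item` record, the sample paths, and the scoping values are all hypothetical; this illustrates the filtering logic only and is not a description of how any particular product implements it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    custodian: str
    path: str
    modified: date
    text: str


def in_scope(item, custodians, extensions, start, end, terms):
    """True only when the item satisfies every scoping criterion."""
    lowered = item.text.lower()
    return (
        item.custodian in custodians
        and item.path.lower().endswith(tuple(extensions))
        and start <= item.modified <= end
        and any(term.lower() in lowered for term in terms)
    )


# Hypothetical items on two custodians' machines.
items = [
    Item("jdoe", "C:/Users/jdoe/Documents/pricing_memo.docx",
         date(2023, 3, 14), "Draft pricing agreement for review"),
    Item("jdoe", "C:/Windows/System32/kernel32.dll", date(2023, 3, 1), ""),
    Item("asmith", "C:/Users/asmith/Documents/pricing.xlsx",
         date(2023, 4, 2), "pricing agreement figures"),
    Item("jdoe", "C:/Users/jdoe/Documents/old_notes.docx",
         date(2019, 1, 5), "pricing agreement notes"),
]

collected = [
    item.path
    for item in items
    if in_scope(
        item,
        custodians={"jdoe"},
        extensions={".docx", ".xlsx", ".pdf"},
        start=date(2023, 1, 1),
        end=date(2023, 12, 31),
        terms=["pricing agreement"],
    )
]
print(collected)
```

The DLL fails the file-type and search-term tests before it is ever copied, so there is nothing left for a de-NIST pass to remove.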

This is precisely the approach built into X1 Enterprise, which enables legal and IT teams to conduct targeted, remote collections across large numbers of custodians without ever capturing the system-level data that necessitates de-NISTing. X1 Enterprise collects only the user-generated, potentially relevant ESI within defined parameters, preserving full metadata integrity and maintaining a documented chain of custody — satisfying every requirement for forensic soundness without the bloat, expense, and proportionality concerns of full-disk imaging. In an era where courts are increasingly scrutinizing eDiscovery costs and demanding proportionality, practitioners and judges alike should be asking not how to manage the mess created by over-collection, but how to avoid creating that mess in the first place.

Filed under Best Practices, Case Law, Cloud Data, Cybersecurity, Data Audit, Data Governance, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, GDPR, Information Governance, Information Management

Why Most SaaS Architectures Fall Short for Enterprise-Grade AI

By John Patzakis and Chas Meier

As organizations accelerate adoption of AI to support legal, compliance, security, and business operations, one principle is becoming clear: the underlying deployment architecture matters as much as the model itself. Many enterprise AI initiatives fail not because the technology is immature, but because the environment in which it operates was never designed for high-volume, sensitive, or tightly regulated use cases.

Traditional multi-tenant SaaS architectures, in which numerous customers share the same provider-controlled environment, excel at delivering standardized, lower-risk business applications. But applying that same model to AI workloads involving privileged, regulated, or company-sensitive data introduces material limitations in governance, security, performance, and operational feasibility.

Below are the core architectural constraints that legal, IT, and security leaders consistently raise as they evaluate AI strategies.

  1. Data Governance, Privacy, and Regulatory Control
    Most commercial SaaS AI platforms require customer data—or derivative artifacts such as embeddings, logs, or temporary working sets—to be processed within the provider’s environment. Even with strong encryption and contractual controls, this shift of data outside the enterprise’s controlled boundary introduces challenges that many legal and security teams cannot accept.

    Key concerns include:
    • Loss of direct data sovereignty. Once data is inside a vendor’s multi-tenant environment, the organization no longer controls how it is stored, moved, or isolated.
    • Jurisdiction and residency risks. Multi-tenant SaaS services often replicate or route data across regions for load or resilience purposes, complicating GDPR, HIPAA, ITAR, or sector-specific compliance requirements.
    • Governance of secondary artifacts. AI systems often generate embeddings, caches, metadata, and diagnostic logs. Ensuring these artifacts adhere to the same retention, destruction, and legal hold rules becomes significantly more complex in a shared environment.

    For legal departments, eDiscovery teams, and CISOs, these factors create an expanded compliance burden that is often disproportionate to the value of outsourcing AI workloads.
  2. Assurance of Isolation and Auditability
    Large enterprises increasingly demand verifiable guarantees—not merely assurances—that:
    • Their data is isolated from other tenants
    • Their information is not used for model training unless explicitly authorized
    • Every transaction is auditable and traceable
    • No shared services introduce inadvertent cross-tenant visibility

    While reputable AI providers enforce strong separation controls, multi-tenant architecture inherently increases the assurance burden. The organization must rely on the vendor’s internal controls, certifications, and change management practices—none of which it can independently verify.

    For regulated entities, this can be an unacceptable dependency, particularly where privileged legal data, sensitive communications, or proprietary research is involved.
  3. Performance and Scalability Under AI Workloads
    AI inference and large-scale analysis require sustained compute performance. Multi-tenant environments, by design, pool capacity across customers. Even when quotas or isolation tiers exist, resource contention and dynamic scaling can introduce variability.

    For enterprise workloads—such as legal investigations, regulatory responses, internal audits, or global compliance monitoring—performance variability translates directly into operational delays and risk.

    Organizations routinely raise:
    • Deterministic performance requirements for time-sensitive matters
    • Workload isolation needs when running tens of thousands of queries or document classifications
    • The high cost of dedicated capacity tiers in third-party SaaS models

    These are structural limitations, not configuration issues.
  4. Data Movement, Transfer Overhead, and Operational Disruption
    Before any SaaS-based AI workflow begins, enterprises must stage or transfer large volumes of data—including emails, documents, chat messages, or historical repositories—into the vendor’s cloud environment.

    This poses several obstacles:
    • Time and bandwidth constraints when transferring terabytes or petabytes
    • Chain-of-custody and legal hold considerations during data movement
    • Jurisdictional restrictions when data cannot transit or be stored outside specific regions
    • Ongoing synchronization challenges as new data is generated

    For legal, compliance, and security teams, these issues often make multi-tenant SaaS unsuitable for high-value unstructured data.
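A back-of-the-envelope calculation gives a feel for the time and bandwidth constraint. The Python sketch below assumes decimal units (1 TB = 10^12 bytes) and a link that is fully available for the transfer; the data volumes and link speeds are illustrative figures, not measurements, and real transfers are usually slower because of throttling, protocol overhead, and shared links.

```python
def transfer_hours(terabytes: float, gigabits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move a data set over a link.

    efficiency is the fraction of nominal bandwidth actually achieved
    (an assumed figure supplied by the caller).
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9 * efficiency)
    return seconds / 3600


# Illustrative scenarios only.
ten_tb_hours = transfer_hours(10, 1)            # 10 TB over 1 Gbps
one_pb_days = transfer_hours(1000, 10) / 24     # 1 PB over 10 Gbps

print(f"10 TB at 1 Gbps: about {ten_tb_hours:.1f} hours")
print(f"1 PB at 10 Gbps: about {one_pb_days:.1f} days")
```

Under these idealized assumptions, 10 TB takes roughly 22 hours on a 1 Gbps link and a petabyte takes more than nine days at 10 Gbps, before any new data generated in the meantime is accounted for.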
  5. Limited Customization and Restricted Model Control
    Most multi-tenant AI SaaS offerings operate within a shared, standardized stack. This limits an enterprise’s ability to:
    • Tailor models to domain-specific content or workflows
    • Implement custom inference pipelines
    • Integrate internal security, monitoring, or policy engines
    • Maintain visibility into how models process and route sensitive information

    For departments handling privileged, confidential, or regulated data, this lack of deep configurability hampers both innovation and risk mitigation.

The Industry Shift Toward AI-in-Place Architectures
To address these concerns, organizations are increasingly adopting AI-in-Place models—deploying AI capabilities directly onto systems, repositories, and environments they already control.

AI-in-place allows enterprises to:
• Keep all source data behind the firewall or within their private cloud tenancy
• Maintain full sovereignty over models, embeddings, logs, and derived artifacts
• Enforce internal security, retention, and access policies without exception
• Optimize performance around their own infrastructure and workflows
• Reduce compliance complexity by avoiding data egress entirely

This architectural shift reflects a maturing understanding: the value of AI is maximized only when it can operate where sensitive data already resides.

X1 Enterprise: A Modern Foundation for AI-in-Place
X1 Enterprise—with its patented distributed micro-indexing architecture—has emerged as a leading platform for organizations adopting AI-in-Place strategies.

X1 enables:
• In-place analysis without data movement: Deploy LLMs, embeddings, and AI pipelines directly to endpoints, repositories, and cloud data sources, without exporting or copying sensitive content.
• Enterprise-wide visibility across unstructured data: Email, documents, chat, archives, and cloud sources can be searched, tagged, classified, and analyzed at scale from a single federated index.
• High-assurance governance: All data remains within the enterprise’s security boundary or isolated single-tenant cloud, supporting legal holds, audits, discovery, and regulatory requirements.
• Scalable performance tailored to the enterprise’s environment: Micro-indexing distributes compute to where data lives, eliminating bottlenecks inherent in centralized SaaS architectures.
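For readers who want a mental model of indexing where the data lives, here is a deliberately simplified Python sketch of a federated query over several local indexes. It is a generic illustration of the idea, not X1's patented architecture or API; the `LocalIndex` class, `federated_search` function, source names, and documents are all hypothetical.

```python
from collections import defaultdict


class LocalIndex:
    """A tiny inverted index that lives alongside one data source."""

    def __init__(self, source):
        self.source = source
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index each lowercase token against the document identifier.
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, term):
        # Only document identifiers are returned, never the content itself.
        return sorted(self.postings.get(term.lower(), set()))


def federated_search(indexes, term):
    """Fan a query out to every local index and merge the hit lists."""
    hits = []
    for index in indexes:
        hits.extend((index.source, doc_id) for doc_id in index.search(term))
    return hits


# Hypothetical sources and documents.
laptop = LocalIndex("laptop-jdoe")
laptop.add("memo.docx", "Pricing agreement draft")

share = LocalIndex("fileshare-01")
share.add("minutes.txt", "Board minutes")
share.add("terms.pdf", "Final pricing terms")

results = federated_search([laptop, share], "pricing")
print(results)
```

The point of the design is that the query travels to the data and only a small list of hits travels back, so the central console never needs a bulk copy of the underlying content.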

For legal, IT, and security leaders seeking to implement AI responsibly, X1 provides a practical and compliant path forward.

See AI-in-Place in Action
We invite you to join our upcoming webinar on Wednesday, December 10, where our team will present:
• A detailed look at X1’s new AI-in-Place capabilities
• Architectural considerations for legal, IT, and CISO stakeholders
• A live demonstration of enterprise-scale AI applied directly to live data sources

Register here to secure your spot.

Filed under Best Practices, Cloud Data, Corporations, Cybersecurity, Data Audit, Data Governance, eDiscovery, eDiscovery & Compliance, Enterprise AI, Enterprise eDiscovery, Information Governance, SaaS

X1 Brings “AI In-Place” to the Enterprise—A Major Breakthrough for Secure, Scalable AI Deployment

By John Patzakis

Our latest announcement represents a true inflection point in enterprise AI. With X1 Enterprise’s newly introduced capability for AI in-place, organizations and their service providers will, for the first time, be able to deploy and execute large language models (LLMs) directly where enterprise data lives—without moving or copying that data.

This is more than a product enhancement; it is a fundamental shift in how AI is applied across the enterprise.

The Foundation: Efficient Text Extraction Is Critical for AI
Large language models (LLMs) are the core engines that power today’s AI revolution. These models rely entirely on textual input to perform reasoning, summarization, search, and analysis. That is why text extraction is the critical first step. LLMs can only operate once another process extracts the text from emails, documents and chats. Traditionally, that meant copying or exporting data to external systems hosted by third-party vendors, a process fraught with risk, cost, and compliance challenges.

Solving the “Data Movement Problem” for Enterprise AI
The key barrier to enterprise AI adoption has been the reluctance to move sensitive corporate data to external AI platforms. Whether for security, governance, or cost reasons, most enterprises simply cannot send their data outside their environment.

X1’s innovation solves that problem head-on. Instead of shipping sensitive data out to an AI system, X1 brings the AI to the data. Enterprises can now deploy their own proprietary models or open-source LLMs within the secure perimeter of their existing infrastructure, whether on premises or in the cloud. X1’s index-in-place architecture performs the text extraction and indexing where the data resides. By extending that same principle to AI—forward-deploying LLMs directly to enterprise data sources—X1 now enables AI in-place. The result: organizations can apply the analytical power of LLMs across their data without ever moving it.

Once the LLMs are deployed into the X1 micro-indexes, X1 will then auto-apply AI-informed tags, which a user can query globally from a central console and act upon through targeted data collection or remediation. Imagine petabytes of data on file servers, laptops, M365, and other sources, all AI-classified and then queried and collected on a highly targeted basis.

This means enterprises can now unlock powerful new use cases no matter the scale—AI-assisted compliance, risk monitoring, GRC audits, eDiscovery, and more—while maintaining full control of their data and eliminating the need for costly, risky data transfers.

Enabling Collaboration Between Enterprises and Their Advisors
William Belt, Managing Director and Consulting Practice Leader at Complete Discovery Source, described the impact succinctly:

“Enabling AI in-place where our corporate client’s data lives is game-changing. We look forward to working with our clients to deploy AI models that are either pre-trained or customized for a specific matter or compliance requirement utilizing the X1 Enterprise platform.”

This capability creates a new bridge between corporations and their professional advisors—consulting firms, law firms, and service providers—who can now collaborate directly with their clients to develop, fine-tune, and deploy customized AI models for specific business or legal needs.

Rather than relying on generic cloud-based AI tools, organizations can now build targeted, matter-specific LLMs that are tuned to their unique data and compliance requirements, all executed securely in-place through the X1 Enterprise Platform.

A New Era for Enterprise AI
With this release, X1 is redefining the architecture of enterprise AI. Its ability to perform distributed micro-indexing and in-place AI analysis across global data sources enables secure, scalable, and cost-effective intelligence—without ever duplicating or relocating sensitive data.

For enterprises and their partners, this represents a new era of possibility: true AI at enterprise scale, in-place.

X1 will host a webinar on Wednesday, December 10, featuring a detailed overview of this new capability and a live demonstration. You can register here.

Filed under Cloud Data, Corporations, Cybersecurity, eDiscovery, eDiscovery & Compliance, Enterprise AI, Enterprise eDiscovery, Information Governance, m365

The Business Case for In-House eDiscovery: Lessons from Two Prominent Corporate eDiscovery Counsel

By John Patzakis

In a recent webinar hosted by Ad Idem, a non-profit legal education provider for in-house counsel, attorneys Kelly Twigger and Eric Stansell offered a compelling roadmap for corporate legal departments looking to bring eDiscovery and information governance (InfoGov) in-house. Their insights cut through the complexity of traditional discovery models and emphasized the strategic, operational, and legal advantages of internalizing these processes. For legal professionals navigating mounting data volumes and rising litigation costs, their discussion provided both practical guidance and a call to rethink legacy workflows.

Eric Stansell, Senior Counsel for Discovery at Tyson Foods, opened with a candid reflection on how his role was created to address the company’s need for a more efficient eDiscovery program. He emphasized that building a business case for in-house capabilities starts with understanding the “why,” whether that is cost savings, risk reduction, or process defensibility. He added that standardizing internal processes not only improves consistency but also enhances defensibility and reduces exposure by limiting data sprawl across external vendors.

Kelly Twigger, who in my opinion is one of the top eDiscovery lawyers in the field, if not the top, built on Stansell’s narrative by stressing the importance of conducting a thorough assessment before launching any in-house initiative. She encouraged legal teams to break down business cases into manageable chunks, identifying quick wins such as revising email retention policies. Twigger noted that internal culture shifts and stakeholder alignment are just as critical as technology adoption. Her approach favors incremental change backed by measurable ROI, rather than sweeping transformations that risk overwhelming legal and IT teams.

Both speakers underscored the importance of engaging multiple stakeholders. Stansell shared Tyson’s experience with cross-functional collaboration, highlighting how legal, IT, audit, and compliance teams must be involved from the outset. As one example of such collaboration, Stansell noted that eDiscovery enterprise search and collection software procured by the legal team can also address key IT security priorities such as PII data audits and internal investigations.

Twigger also delivered a deep dive into the proportionality principles now codified under the Federal Rules of Civil Procedure, urging legal teams to build factual arguments early in the discovery process. She explained that proportionality isn’t just about cost—it’s about narrowing scope through targeted custodians, refined date ranges, and iterative search terms. Stansell added that understanding custodians’ roles and historical relevance can help avoid unnecessary data collection, further supporting proportionality claims in court.

One of the most pressing issues Twigger addressed was the evolving case law around hyperlinked files. She traced the trajectory from Nichols v. Noom, Inc.—where hyperlinks were deemed not attachments—to more recent rulings that treat them as discoverable content depending on technological capabilities. Twigger cited In re Uber and Young v. Salesforce to illustrate how courts are increasingly expecting parties to preserve and produce hyperlinked documents, especially when shared via chat platforms or collaborative tools.

Twigger warned that failing to understand your organization’s tech stack could lead to costly missteps. She recommended that in-house counsel proactively assess their systems, especially Microsoft 365 environments, to determine what’s feasible when it comes to hyperlink preservation and production. She also highlighted X1 Discovery’s capabilities, noting that X1’s software can automate the collection of contemporaneous versions of hyperlinked documents in M365 and support targeted collection of Teams chats as well as many other data sources, making X1 a valuable solution for defensible in-house eDiscovery.

In closing, both Twigger and Stansell made it clear that bringing eDiscovery and InfoGov in-house isn’t just a cost-cutting measure—it’s a strategic imperative. With the right mix of technology, process, and cross-functional collaboration, legal departments can gain control, reduce risk, and improve outcomes. Their insights serve as a blueprint for legal teams ready to evolve beyond reactive discovery and toward a proactive, integrated approach.

The recording of the webinar can be accessed here.

Filed under Best Practices, Case Law, Cloud Data, Corporations, ECA, eDiscovery & Compliance, Enterprise eDiscovery, ESI, Information Governance, m365, MS Teams, Preservation & Collection

X1 Expands Its Leadership in Microsoft Teams eDiscovery Collection

By John Patzakis and Chas Meier

The rapid growth of Microsoft 365 has fundamentally changed the eDiscovery landscape. Among its most prominent data sources, Microsoft Teams now generates vast volumes of business-critical communications that must be identified, collected, and reviewed in litigation, regulatory, and compliance matters.

Yet most eDiscovery tools still rely on outdated methods: bulk copying massive amounts of sensitive data and transferring it to proprietary processing or review platforms. This approach is slow, costly, and disruptive. Bulk transfers frequently trigger Microsoft’s throttling controls, adding significant delays. More importantly, organizations that have invested heavily in Microsoft 365 do not want their data routinely exported out of its secure, native environment every time an eDiscovery matter or compliance investigation arises.

Recognizing these challenges, X1 has built upon its industry-leading Microsoft 365 collection capabilities to deliver unmatched support for Microsoft Teams—alongside OneDrive, Exchange, and SharePoint.

Key Benefits of X1’s Teams Collection Capabilities
Precision targeting of Channels at scale – Quickly search all available channels, select, and target specific Teams channels, even in organizations with tens of thousands of them. This feature is not even available in Microsoft Purview!
Granular control – Target individual custodians and message threads, avoiding unnecessary mass downloads.
Contextual collections – Automatically include a designated number of preceding and subsequent messages, preserving conversational context.
Seamless review integration – One-click upload of fully formatted in-context results directly into review platforms—no manual processing required.
Unified approach – Search and collect across Teams, OneDrive, SharePoint, Exchange, laptops, and file shares from a single interface.
In-place indexing – Leverage X1’s patented technology to index, search, and process data where it resides, eliminating reliance on expensive third-party processing.
True automation – A software-based solution that reduces dependency on manual, service-heavy workflows.
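The contextual collection idea in the list above can be sketched in a few lines of Python. This is a generic illustration under simple assumptions (messages held as an ordered list of strings, a keyword test for responsiveness); the function name, window sizes, and sample thread are hypothetical and do not describe X1's implementation.

```python
def with_context(messages, is_hit, before=2, after=2):
    """Return matching messages plus their neighbours, in original order."""
    keep = set()
    for i, message in enumerate(messages):
        if is_hit(message):
            first = max(0, i - before)
            last = min(len(messages), i + after + 1)
            keep.update(range(first, last))
    return [messages[i] for i in sorted(keep)]


# Hypothetical channel thread.
thread = [
    "Morning all",
    "Any update on the audit?",
    "Sending the pricing sheet now",
    "Got it, thanks",
    "Lunch?",
    "Sure",
    "See you at noon",
]

collected = with_context(
    thread,
    is_hit=lambda m: "pricing" in m.lower(),
    before=1,
    after=1,
)
print(collected)
```

Overlapping windows merge automatically because indexes are gathered in a set, so two nearby hits produce one continuous excerpt rather than duplicated messages.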

No other independent software provider matches the speed, precision, and scalability of X1’s Microsoft Teams eDiscovery collection. Our customers consistently report significant gains in efficiency, cost savings, and defensibility compared to legacy approaches.

As Teams usage continues to surge, legal and compliance professionals need solutions that deliver targeted, defensible collections without the inefficiencies of bulk exports. X1’s enhanced Teams support ensures organizations can meet these demands with speed, accuracy, and minimal disruption.

Seeing is believing—watch our short demo video to experience X1’s Teams capabilities in action.

Filed under Best Practices, Cloud Data, Corporations, ECA, eDiscovery, eDiscovery & Compliance, Enterprise eDiscovery, Enterprise Search, ESI, Hybrid Search, Information Governance, m365, MS Teams, OneDrive