
Why Most Tools Fall Short for Large-Scale Information Governance and What Actually Works

By John Patzakis

For more than a decade, enterprise organizations have struggled with a persistent and costly challenge: how to effectively search, collect, manage, and analyze large volumes of unstructured on-premises data for information governance, eDiscovery, and enterprise search use cases. We are talking about environments with many terabytes of data distributed across file servers, email archives, endpoints, and Microsoft 365 repositories that must be rapidly interrogated, precisely analyzed, and in many cases urgently remediated in response to a regulatory inquiry, a data breach, or an M&A transaction. Despite the proliferation of tools claiming to address this challenge, none has ever truly solved it at scale. The core reason is architectural. Most of these tools are built on a flawed foundation from the start.

The gravitational pull toward Elasticsearch as the search foundation for enterprise data tools is easy to understand. It is open source, it is widely documented, and it is written in Java, a language familiar to a large pool of developers. For these reasons, a basic centralized search and analysis tool can be assembled relatively quickly, and hundreds of vendors and in-house development teams have taken exactly this path. The problem is not that Elasticsearch lacks capability for general-purpose search. The problem is that general-purpose search and large-scale enterprise information governance are fundamentally different problems, and what works for one fails badly at the other. What is rarely discussed openly but what practitioners learn the hard way is that Elasticsearch’s architectural limitations are not configuration issues that can be engineered around. They are structural constraints baked into the platform’s design, and they surface precisely at the scale and complexity that serious information governance work demands.

The result is a graveyard of failed or severely limited information governance deployments: tools that work impressively in demos on curated datasets of a few hundred gigabytes, but that buckle, stall, or simply break when asked to operate on the multi-terabyte, distributed, live data environments that characterize real enterprise compliance projects.

The Structural Limitations of Elasticsearch for Information Governance
The memory problem with Elasticsearch begins with Java itself, which demands significantly more compute than lower-level runtimes when addressing large volumes of data. The Java Virtual Machine (JVM) requires a heap to manage object allocation, and as data volumes grow, the memory demands scale dramatically. Substantial portions of each Elasticsearch index must be held in memory to be searched, and in a multi-terabyte environment with complex query patterns — the kind that information governance work consistently requires — the JVM heap pressure becomes severe and unmanageable. Organizations that have attempted to deploy Elasticsearch-based platforms against more than 10 terabytes of enterprise data consistently encounter the same outcome: massive hardware requirements, constant tuning, and performance that degrades as the dataset grows rather than holding steady. The compute overhead is not a solvable problem; it is an inherent consequence of building a memory-intensive centralized index on a Java runtime, and it places a practical ceiling on what Elasticsearch-based governance tools can realistically accomplish.
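
A back-of-envelope sketch makes the scaling dynamic concrete. The only vendor-documented figure used below is the practical JVM heap ceiling of roughly 32 GB per node (above which compressed object pointers are disabled); the index-size ratio and heap-per-terabyte figure are illustrative assumptions, not measurements, and real sizing depends on mappings, shard counts, and query patterns.

```python
import math

# Illustrative cluster sizing for a centralized Elasticsearch deployment.
# The ratios are assumptions for illustration, not vendor guidance.
def nodes_required(source_tb: float,
                   index_ratio: float = 0.5,           # assumed index size as fraction of source data
                   heap_gb_per_index_tb: float = 20.0, # assumed heap demand per TB of index
                   max_heap_gb: float = 30.0) -> int:  # stay under the ~32 GB compressed-oops ceiling
    index_tb = source_tb * index_ratio
    total_heap_gb = index_tb * heap_gb_per_index_tb
    return math.ceil(total_heap_gb / max_heap_gb)

for tb in (1, 10, 50):
    print(tb, "TB source ->", nodes_required(tb), "node(s)")
# 1 TB -> 1 node, 10 TB -> 4 nodes, 50 TB -> 17 nodes
```

Even under these charitable assumptions, node count grows linearly with data volume, which is the hardware-escalation pattern described above.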

Beyond the memory constraints, the workflow required to use Elasticsearch for information governance introduces a second, equally serious problem: it requires a full copy of the data under governance to be made and migrated into the centralized index. For a 50-terabyte dataset, this means creating 50 additional terabytes of sensitive material — often including personally identifiable information, privileged communications, and confidential business records — and transferring it outside its original, controlled location. The entire point of information governance is to reduce the footprint and exposure of sensitive data; requiring the wholesale copying and centralization of that same data in order to govern it is a fundamental contradiction, one that legal, security, and compliance stakeholders increasingly and rightly reject.

The timeline problem compounds the data duplication problem. Copying, transferring, and indexing 50 terabytes of enterprise data into a centralized Elasticsearch platform is not a weekend project. In real-world deployments, this process can take months, even under favorable conditions. And information governance use cases are rarely patient ones. Data breach impact assessments operate under regulatory notification deadlines measured in days. M&A-related data audits run on compressed timelines driven by transaction closing schedules. By the time the data has been staged and indexed into a centralized Elasticsearch platform, the underlying data has changed, and the copied index set is already stale.
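
The timeline math is easy to verify. The sketch below estimates raw network transfer time alone, before any extraction, staging, or indexing overhead; the throughput figures are illustrative assumptions.

```python
# Rough copy-time estimate for staging data into a centralized index.
# Real pipelines also pay extraction and indexing costs on top of the
# raw transfer, so these figures are a floor, not a forecast.
def transfer_days(terabytes: float, sustained_gbps: float) -> float:
    bits = terabytes * 1e12 * 8                # decimal terabytes to bits
    seconds = bits / (sustained_gbps * 1e9)
    return seconds / 86400

print(round(transfer_days(50, 1.0), 1))   # days at a full, uncontended 1 Gbps
print(round(transfer_days(50, 0.2), 1))   # days at a more realistic sustained 200 Mbps
```

At a sustained 200 Mbps, the raw copy alone consumes over three weeks, before a single document is indexed, which is why regulatory deadlines measured in days are incompatible with this workflow.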

Finally, even if an organization tolerates the data duplication, survives the timeline, and manages the memory overhead, there is a “last mile” problem that the centralized Elasticsearch architecture cannot solve: remediation. Information governance is not just about finding sensitive or problematic data; it is about acting on it: deleting records past their retention period, quarantining compromised PII, tagging and separating data in support of a corporate divestiture. When the discovery and analysis workflow is built on a centralized copy of the data, the organization is operating on clones, not originals. The identified data still exists in its original locations distributed across file servers, Microsoft 365 environments, laptops, and cloud storage. Tracing back from a finding in a centralized index to the live source, and then executing a remediation action on that source, is a manual, error-prone, and operationally disruptive process.

How X1 Enterprise’s Micro-Indexing Architecture Solves What Elasticsearch-Based Tools Cannot
X1 Enterprise is built on a fundamentally different architectural premise: rather than requiring data to be copied and centralized, X1’s patented micro-indexing technology indexes, searches, analyzes, and remediates data entirely in place where it lives, within the corporate environment, without ever moving it. This architectural difference is consequential at every stage of a large-scale governance project. The micro-indexing engine is written in C++, which delivers dramatically more efficient memory utilization than a Java-based runtime. Individual micro-indexes do not need to be loaded into memory simultaneously; the architecture is genuinely distributed and parallelized, enabling X1 Enterprise to operate effectively at multi-terabyte scale, including at hundreds of terabytes, without the memory walls and hardware escalation that make Elasticsearch-based platforms impractical for serious enterprise deployments.
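
To illustrate the distributed, parallelized model conceptually, the toy sketch below (not X1's implementation) gives each data source its own small inverted index and fans a query out to all of them in parallel. Only hit references come back; no document content ever leaves its source.

```python
from concurrent.futures import ThreadPoolExecutor

class MicroIndex:
    """Toy per-source index: the data stays where it lives."""
    def __init__(self, location, docs):
        self.location = location
        self.inverted = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.inverted.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Returns (source, doc_id) references, never document content.
        return [(self.location, d) for d in self.inverted.get(term.lower(), ())]

def federated_search(indexes, term):
    # Fan the query out to every source in parallel and merge the hits.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda ix: ix.search(term), indexes)
    return [hit for partial in partials for hit in partial]

sources = [
    MicroIndex("fileserver-01", {"f1": "quarterly PII audit report"}),
    MicroIndex("laptop-legal-07", {"m9": "PII breach notification draft"}),
]
print(federated_search(sources, "PII"))
```

The design point is that query cost is spread across many small indexes rather than concentrated in one memory-bound cluster, so adding sources adds parallelism instead of heap pressure.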

Because X1 Enterprise operates in place, the data duplication problem is eliminated entirely. There is no second copy of your sensitive data to govern, secure, or explain to regulators. The indexed data remains in its original location, under the organization’s existing controls, throughout the entire governance workflow. This means that X1 Enterprise not only avoids compounding compliance risk, it actively reduces it, by ensuring that sensitive data never leaves its controlled environment. For organizations subject to GDPR, HIPAA, CCPA, or sector-specific data residency requirements, the ability to conduct large-scale information governance analysis entirely within the corporate firewall is not a luxury. It is a hard requirement. X1 Enterprise is the only platform in the market that can meet this requirement at multi-terabyte scale without architectural compromise.

Perhaps most powerfully, the in-place architecture closes the remediation loop that Elasticsearch-based tools leave permanently open. When X1 Enterprise identifies data that must be deleted, preserved, tagged, or acted upon, it can execute that remediation directly on the source data in Microsoft 365, on file servers, on endpoints, wherever the data resides. There is no manual tracing back from a centralized index to a distributed original. The finding and the action occur in the same environment, with full auditability and chain-of-custody documentation.

X1 Enterprise delivers the architecture that the industry has needed for years.

To learn more, schedule a briefing today at sales@x1.com or visit x1.com/solutions/x1-enterprise-platform.


Filed under Best Practices, Business Productivity Search, Data Governance, eDiscovery & Compliance, Enterprise AI, Enterprise eDiscovery, Enterprise Search, ESI, Information Governance, Information Management

Bringing AI to the Data: How X1 Search v11 Redefines Secure Enterprise Search

By John Patzakis

At X1, we believe the future of enterprise AI depends on a simple but often overlooked principle: data should not have to move in order to become intelligent. With the launch of X1 Search v11, we are introducing a fundamentally different approach—one that embeds AI directly into our index-in-place architecture. Rather than forcing organizations to centralize and copy their data into external platforms, we enable AI to operate exactly where that data already lives. You can read the full press release here: https://www.x1.com/x1-introduces-ai-powered-x1-search-delivering-secure-ai-in-place-for-individual-and-enterprise-users/

This release represents an important milestone for us and for our customers. As Chas Meier noted, “X1 Search v11 marks an important milestone in how organizations can safely apply AI…without compromising the security controls enterprise environments demand.” That statement reflects our core design philosophy: AI must adapt to enterprise security, compliance, and governance requirements—not the other way around.

With X1 Search v11, we are delivering AI capabilities directly within our micro-index. That means organizations can apply advanced intelligence—classification, categorization, and contextual analysis—across emails, files, and collaboration data without ever relocating that information. Everything happens in place, within existing security boundaries, whether on endpoints or across enterprise systems.

For large enterprises, this architecture unlocks an even more powerful capability: the ability to deploy their own trained and curated large language models directly into the X1 index. Instead of relying solely on generic, hosted AI services, organizations can operationalize models tailored to their data that reflect their internal policies, regulatory requirements, and business workflows. These models run directly against their data, in place, delivering highly relevant and controlled outcomes.

This approach stands in sharp contrast to traditional hosted AI platforms. In those models, organizations must copy and transfer massive amounts of sensitive data to the provider before any meaningful analysis can occur. That process introduces serious risks. Moving data to outside providers complicates compliance, potentially compromises IP, and creates new attack surfaces that most enterprises simply cannot accept.

Beyond security concerns, the traditional model also breaks down operationally at scale. Enterprises are not dealing with small data sets; they are managing dozens of terabytes of distributed, unstructured data. Attempting to duplicate and transfer that volume is not just costly; it is infeasible. The result is delays, fragmentation, and incomplete analysis—undermining the very promise of AI.

We have taken a different path. By bringing AI to the data through our distributed micro-indexing technology, we eliminate the need for data movement entirely. Models can be deployed directly to where data resides, enabling real-time analysis while preserving security, reducing infrastructure overhead, and scaling seamlessly across the enterprise.

We see X1 Search v11 as more than a product release—it is a shift in how enterprise AI is deployed. Organizations no longer have to choose between innovation and control. With AI in place, they can achieve both.

To see this in action, we invite you to join our upcoming live product tour on Thursday, April 23, which will provide a guided walkthrough of the new AI-enriched capabilities and flexible model deployment features.


Filed under Best Practices, Business Productivity Search, Desktop Search, Enterprise AI, Enterprise eDiscovery, Enterprise Search, ESI, Google Workspace, Information Access, Information Management, m365, MS Teams, X1 Search 11

Navigating Legal and Compliance Risks When Corporations Expose Sensitive Data to AI

By Kelly Twigger and John Patzakis

Implementing AI within a corporate environment is no longer a matter of “if” but “how.” We recently addressed these challenges in our webinar, “Navigating Legal and Compliance Risks in AI,” where our panel of experts discussed the strategic transition required to build a robust risk mitigation framework. While the efficiency gains of AI—such as automating workflows and surfacing deep insights—are compelling, introducing sensitive enterprise data into these models without a tactical plan can lead to unintended consequences. These risks range from the dilution of trade secrets to complex eDiscovery obligations and substantial regulatory exposure under the GDPR.

To leverage AI safely, counsel should focus on the following grounded strategies for risk management.

Protect Trade Secrets
Under federal law, trade secret status is contingent upon the owner taking “reasonable measures” to maintain secrecy. This is a rigorous standard; if proprietary information—such as source code or high-value technical data—is fed into an unsecured AI model without strict access controls, a company risks losing its legal protections entirely.

  • Review the Judicial Standard: In Snyder v. Beam Technologies, Inc., the 10th Circuit affirmed that failing to use confidentiality protections or allowing information to reside on unsecured devices can defeat trade secret status.
  • Maintain Active Safeguards: Courts emphasize that consistent and active safeguards are required to maintain secrecy. Lax internal controls during AI interactions can be cited as evidence that “reasonable measures” were not maintained.
  • Implement No-Prompt Zones: Establish “No-Prompt Zones” for your organization’s most sensitive intellectual property. By isolating core IP from third-party cloud models, you maintain a defensible record of “reasonable measures” that can withstand scrutiny in litigation.

Manage the eDiscovery Paper Trail
AI interactions—both the prompts submitted by employees and the responses generated by the tools—are considered discoverable Electronically Stored Information (ESI). These records are part of the corporate record and are subject to subpoena and legal holds.

  • Understand the Technical Reality: Microsoft has confirmed that Microsoft 365 Copilot interactions are logged through the Purview unified audit log, making them searchable, preservable, and producible via eDiscovery tools.
  • Assess Scope of Exposure: Because these chats are treated no differently than emails, they may inadvertently expose privileged or damaging material if not managed properly.
  • Map Information Logs: Update your legal hold workflows to specifically include AI conversation logs and audit trails. Mapping where these logs live before litigation arises ensures a more controlled and cost-effective discovery process.
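
As a hedged illustration of the log-mapping step, the sketch below filters an exported unified audit log for AI interaction records belonging to hold custodians. The field names ("RecordType", "UserId", "CreationDate") and the "CopilotInteraction" record type are assumptions for illustration; confirm the actual export schema in your tenant before relying on it.

```python
import json

# Illustrative filter over an exported Microsoft 365 unified audit log,
# pulling AI interaction records for custodians on legal hold.
def copilot_interactions(audit_export_json: str, custodians: set) -> list:
    records = json.loads(audit_export_json)
    return [
        r for r in records
        if r.get("RecordType") == "CopilotInteraction"   # assumed record type
        and r.get("UserId") in custodians
    ]

# Hypothetical two-record export: one AI interaction, one ordinary email event.
sample = json.dumps([
    {"RecordType": "CopilotInteraction", "UserId": "a@corp.example",
     "CreationDate": "2025-06-01T10:00:00Z"},
    {"RecordType": "ExchangeItem", "UserId": "a@corp.example",
     "CreationDate": "2025-06-01T10:05:00Z"},
])
hits = copilot_interactions(sample, {"a@corp.example"})
print(len(hits))  # 1
```

Running a filter like this before litigation arises tells you exactly where AI conversation records live and how they surface in an export, which is the mapping exercise the bullet above recommends.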

Navigate GDPR and Data Privacy
Processing customer or employee data through AI models requires strict adherence to the GDPR principles of data minimization, purpose limitation, and lawfulness. Feeding sensitive data into AI models without a clearly articulated lawful basis—such as consent or legitimate interest—can result in significant administrative fines.

  • Meet Compliance Requirements: European authorities require organizations to demonstrate compliance by documenting purposes, limiting data inputs, and ensuring appropriate safeguards are in place.
  • Identify Special Categories: The GDPR is particularly restrictive regarding health information or data revealing racial or ethnic origin, requiring specific exemptions for processing.
  • Conduct Privacy Impact Assessments: Perform mandatory Privacy Impact Assessments (PIAs) for any AI tool that touches personal data. Documenting the purpose and necessity of the processing is critical for maintaining regulatory standing during an audit.

Leverage In-Place AI Functionality
A critical strategy for reducing risk is shifting where the AI processing occurs. Rather than routing data through external, third-party cloud-hosted AI services, organizations should consider prioritizing workflows where AI is applied in-place within the corporate network or controlled enterprise environment.

  • Secure the Data Perimeter: By keeping data and AI processing behind the organization’s own security firewall, you materially reduce the risk of trade secret leakage and data exfiltration.
  • Minimize Third-Party Footprint: Applying AI in-place narrows the scope of discoverable third-party records, as the interactions remain within your internal infrastructure rather than residing on a vendor’s servers.
  • Establish Full Governance Control: This model provides counsel with direct control over privacy, retention, and audit obligations—essentially giving you the “kill switch” for data that you simply do not have with external cloud vendors.

Tactical Governance and Ethical Oversight
Counsel must navigate the professional and technical nuances of AI deployment to ensure long-term stability.

  • Ensure Professional Competence: The ethical duty of technological competence requires attorneys to understand the limitations of the tools they use. AI should be treated as a “junior associate”—capable of great speed but requiring diligent human verification of all output.
  • Apply Risk-Based Tiering: Not all AI use cases carry the same weight. We recommend a tiered approach:
    o Tier 1 (Administrative): Low-risk tasks involving non-sensitive data.
    o Tier 2 (Internal/Marketing): Standard communications requiring routine oversight.
    o Tier 3 (High-Value/Restricted): High-stakes processing involving PII, health data, or proprietary IP, requiring senior legal sign-off and strict data handling protocols.
  • Execute Proactive Vendor Vetting: Move from consumer-grade tools to enterprise solutions that offer SOC 2 Type 2 attestations. Ensure contracts explicitly prohibit the vendor from using your data to train their global models.
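
The tiered approach above can be encoded as a small policy lookup that a prompt gateway or DLP layer might consult before any data reaches a model. This is an illustrative sketch: the trigger labels are hypothetical and would come from your organization's own data classification system.

```python
# Hypothetical mapping from data-classification labels to the risk tiers
# described above. Most restrictive tier wins.
TIER_RULES = [
    ("Tier 3 (High-Value/Restricted)", {"pii", "health", "source_code", "trade_secret"}),
    ("Tier 2 (Internal/Marketing)",    {"internal_memo", "marketing"}),
]

def classify_request(data_labels: set) -> str:
    for tier, triggers in TIER_RULES:       # checked in descending-risk order
        if data_labels & triggers:
            return tier
    return "Tier 1 (Administrative)"        # default: low-risk, non-sensitive

print(classify_request({"marketing"}))           # Tier 2
print(classify_request({"pii", "marketing"}))    # Tier 3: PII dominates
print(classify_request(set()))                   # Tier 1
```

Tier 3 matches would then be routed for senior legal sign-off rather than sent to any model automatically.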

In light of these risks, corporate counsel should take a proactive, structured approach to AI governance. This includes implementing data classification and usage controls to prevent sensitive trade secrets from being exposed to AI systems without safeguards; establishing clear policies governing AI prompts, outputs, retention, and eDiscovery treatment; and conducting privacy impact assessments to ensure personal data processing complies with GDPR and similar regulations. In addition, counsel should carefully evaluate AI deployment models and consider workflows in which AI models are deployed in-place within the corporate network or controlled enterprise environment, rather than routed through third-party cloud-hosted AI services. Keeping data and AI processing inside the organization’s security perimeter can materially reduce trade secret leakage risk, narrow the scope of discoverable third-party records, and provide greater control over privacy, retention, and audit obligations—while still allowing the enterprise to realize the benefits of advanced AI capabilities.

For a deeper dive into these strategies and more case studies, you can watch the full session here.


Filed under Best Practices, compliance, Corporations, Cybersecurity, Data Governance, ECA, eDiscovery & Compliance, Enterprise AI, Enterprise eDiscovery, ESI, GDPR, Information Governance, Records Management

AI Without Data Movement: X1’s Webinar Reveals the Future of Secure Enterprise AI

By John Patzakis

X1’s recent webinar announcing the availability of true “AI in-place” for the enterprise was both highly attended and strongly validated by the audience response. The session did more than introduce a new feature; it articulated a fundamentally different architectural approach to enterprise AI—one designed explicitly for security, compliance, and scalability in complex, distributed environments. Our central message was simple: enterprise AI adoption has been constrained not by lack of interest, but by architectural and security requirements that existing platforms have failed to address.

That reality was most powerfully captured in a quote shared on the opening slide from a Fortune 100 Chief Information Security Officer, which set the tone for the entire discussion:

“Normally AI for infosec and compliance use cases is a non-starter for security reasons, but your workflow and architecture is completely different. This allows us – all behind our firewall — to develop our own models that are trained on our own data and customized to our specific security and compliance use cases and deployed in-place across our enterprise.”

This endorsement crystallized the webinar’s core insight: AI becomes viable for the most sensitive enterprise use cases only when it is deployed where the data already lives, rather than forcing data into external or centralized systems.

The technical foundation that makes this possible is X1’s micro-indexing architecture. Unlike traditional platforms built on centralized, resource-intensive indexing technologies, X1 deploys lightweight, distributed micro-indexes directly at the data source. This allows enterprises to index, search, and now apply AI analysis without mass data movement. As emphasized during the webinar, centralized indexing is not just expensive and slow—it is fundamentally misaligned with how modern enterprise data is distributed across file systems, endpoints, cloud platforms, and collaboration tools.

The session then highlighted how this architectural distinction resolves a long-standing problem in discovery, compliance, and security workflows. Legacy platforms require organizations to collect and centralize data before they can analyze it, introducing delays, high costs, and significant risk exposure. X1 reverses that workflow. By enabling visibility and AI-driven classification before collection, organizations can make informed, targeted decisions—collecting only what is necessary, remediating issues in-place, and dramatically reducing both risk and operational overhead.

The discussion also demystified large language models (LLMs), explaining that while model training is compute-intensive, models themselves are increasingly commoditized and portable. Critically, LLMs require extracted text and metadata — processed from native files — to function. This aligns perfectly with X1’s existing capability, as text and metadata extraction are already integral to our micro-indexing process. AI models can therefore be deployed alongside these indexes, operating in parallel across thousands of data sources with massive scalability.
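
As a conceptual sketch of that point, the pipeline below reduces a native file to extracted text plus metadata and hands only that extracted layer to a locally deployed model. Both stages are trivial stand-ins: the extraction is a stub for real native-file processing, the keyword "model" is a stub for an actual local LLM, and the function names are hypothetical.

```python
import datetime

def extract(path: str, raw_bytes: bytes) -> dict:
    # Stand-in for real text and metadata extraction from a native file.
    return {
        "path": path,
        "text": raw_bytes.decode("utf-8", errors="ignore"),
        "extracted_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

def local_model_classify(record: dict) -> str:
    # A locally deployed LLM would run here, consuming the same extracted
    # text the index already produces; no native file leaves the source.
    return "sensitive" if "ssn" in record["text"].lower() else "routine"

doc = extract("/share/hr/note.txt", b"Employee SSN on file")
print(local_model_classify(doc))  # sensitive
```

Because the model consumes only the extracted layer the indexing process already generates, adding AI analysis requires no new data movement, which is the scalability claim made above.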

The conversation then connected this architecture to concrete, high-value use cases. In eDiscovery, AI in-place enables faster early case assessment and proportionality by analyzing data where it resides. In incident response and breach investigations, security teams can immediately scope exposure across distributed systems without waiting months for data exports. For compliance and governance, AI models can continuously identify sensitive data, enforce retention policies, and surface risk conditions that were previously impractical to monitor at scale.

In addition to a live product demo showcasing this new capability, we concluded the webinar with several clarifying points and announcements. First, we emphasized that X1 does not access, monetize, or host customer data. Also, AI in-place is not an experimental add-on but an enhancement to a proven, production-grade platform. And notably, there is no additional licensing cost for the AI capability itself—customers simply deploy models within their own environment. With proof-of-concept testing beginning shortly and production deployments targeted for April 2026, the webinar made clear that AI in-place is not a future vision, but an imminent reality for the enterprise.

You can access a recording of the webinar here, and to learn more about X1 Enterprise, please visit us at X1.com.


Filed under Best Practices, Corporations, Cybersecurity, Data Audit, Data Governance, ECA, eDiscovery, eDiscovery & Compliance, Enterprise AI, Enterprise eDiscovery, ESI, Information Governance

The Business Case for In-House eDiscovery: Lessons from Two Prominent Corporate eDiscovery Counsel

By John Patzakis

Building a Business and Legal Case for In-House eDiscovery

In a recent webinar hosted by Ad Idem, a non-profit legal education provider for in-house counsel, attorneys Kelly Twigger and Eric Stansell offered a compelling roadmap for corporate legal departments looking to bring eDiscovery and information governance (InfoGov) in-house. Their insights cut through the complexity of traditional discovery models and emphasized the strategic, operational, and legal advantages of internalizing these processes. For legal professionals navigating mounting data volumes and rising litigation costs, their discussion provided both practical guidance and a call to rethink legacy workflows.

Eric Stansell, Senior Counsel for Discovery at Tyson Foods, opened with a candid reflection on how his role was created to address the company’s need for a more efficient eDiscovery program. He emphasized that building a business case for in-house capabilities starts with understanding the “why”—whether it’s cost savings, risk reduction, or process defensibility. Stansell added that standardizing internal processes not only improves consistency but also enhances defensibility and reduces exposure by limiting data sprawl across external vendors.

Kelly Twigger — who, in my opinion, is one of the top eDiscovery lawyers in the field, if not the very best — built on Stansell’s narrative by stressing the importance of conducting a thorough assessment before launching any in-house initiative. She encouraged legal teams to break down business cases into manageable chunks, identifying quick wins such as revising email retention policies. Twigger noted that internal culture shifts and stakeholder alignment are just as critical as technology adoption. Her approach favors incremental change backed by measurable ROI, rather than sweeping transformations that risk overwhelming legal and IT teams.

Both speakers underscored the importance of engaging multiple stakeholders. Stansell shared Tyson’s experience with cross-functional collaboration, highlighting how legal, IT, audit, and compliance teams must be involved from the outset. As one example of such collaboration, Stansell noted that eDiscovery enterprise search and collection software procured by the legal team can also address key IT security priorities such as PII data audits and internal investigations.

Twigger also delivered a deep dive into the proportionality principles now codified under the Federal Rules of Civil Procedure, urging legal teams to build factual arguments early in the discovery process. She explained that proportionality isn’t just about cost—it’s about narrowing scope through targeted custodians, refined date ranges, and iterative search terms. Stansell added that understanding custodians’ roles and historical relevance can help avoid unnecessary data collection, further supporting proportionality claims in court.

One of the most pressing issues Twigger addressed was the evolving case law around hyperlinked files. She traced the trajectory from Nichols v. Noom, Inc.—where hyperlinks were deemed not attachments—to more recent rulings that treat them as discoverable content depending on technological capabilities. Twigger cited In re Uber and Young v. Salesforce to illustrate how courts are increasingly expecting parties to preserve and produce hyperlinked documents, especially when shared via chat platforms or collaborative tools.

Twigger warned that failing to understand your organization’s tech stack could lead to costly missteps. She recommended that in-house counsel proactively assess their systems—especially Microsoft 365 environments—to determine what’s feasible when it comes to hyperlink preservation and production. She also highlighted X1 Discovery’s capabilities, noting that X1’s software can automate the collection of contemporaneous versions of hyperlinked documents in M365 and support targeted Teams chat collection, along with many other data sources, making X1 a valuable solution for defensible in-house eDiscovery.

In closing, both Twigger and Stansell made it clear that bringing eDiscovery and InfoGov in-house isn’t just a cost-cutting measure—it’s a strategic imperative. With the right mix of technology, process, and cross-functional collaboration, legal departments can gain control, reduce risk, and improve outcomes. Their insights serve as a blueprint for legal teams ready to evolve beyond reactive discovery and toward a proactive, integrated approach.

The recording of the webinar can be accessed here.


Filed under Best Practices, Case Law, Cloud Data, Corporations, ECA, eDiscovery & Compliance, Enterprise eDiscovery, ESI, Information Governance, m365, MS Teams, Preservation & Collection