Why X1’s AI In-Place Architecture Is a Genuine Departure from Legal AI’s Status Quo

By John Patzakis

X1 AI In-Place Architecture — AI hub connecting to distributed enterprise data sources including Microsoft 365, email, cloud, and endpoints

The legal technology market has a buzzword problem. Terms like “AI-powered,” “intelligent review,” and “automated analysis” have been applied so broadly, and so inconsistently, that they have largely lost their ability to signal anything meaningful about how a product actually works. Against that backdrop, X1’s announcement last week of AI In-Place for X1 Enterprise represents a genuinely different approach to applying AI within enterprise legal and compliance workflows. The basis for that difference is X1’s unique architecture.

To understand why, it helps to start with the dominant model that most legal AI tools share. The overwhelming majority of AI-enabled eDiscovery and governance platforms are built on a collect-first assumption: data must be moved out of its native environment—copied, ingested, centralized in a vendor-controlled repository—before any AI model can be applied to it. This is not an incidental design choice; it reflects the fundamental architecture of how most of these platforms were built, long before AI became part of the product story. The result is what practitioners have come to call the “prompt wrapper” problem: an AI interface sits in front of a conventional data pipeline, and the underlying mechanics—the cost, the risk, the latency—remain largely unchanged. A large language model with a “middleware” workflow does not solve the structural problem of what happens to sensitive data before the AI touches it.

X1’s AI In-Place architecture inverts that assumption. Rather than requiring data to travel to an AI system, X1’s patented distributed micro-indexing technology deploys AI models directly into lightweight micro-indexes at the data source itself, across Microsoft 365 environments, file shares, cloud repositories, and endpoints. The AI executes where the data lives, and the data does not move. The implications run across multiple dimensions: data never leaves the enterprise perimeter, security policies and endpoint controls remain intact throughout the process, and the computational overhead and massive AI token costs associated with large-scale data ingestion are avoided entirely. For matters involving a terabyte of data or more, where centralized collection is not merely expensive but operationally infeasible, this architectural distinction is not incremental. It changes what is actually possible.

The workflow mechanics reinforce the point. AI models are deployed into X1’s distributed micro-indexes behind the firewall, execute against enterprise data in place, and surface AI-enriched insights—tags, classifications, risk scores—into a central console without the underlying data ever being collected or copied. That means targeted collection decisions, early case assessment, and information governance actions can be driven by AI-informed analysis conducted across the full enterprise data landscape, not just against a subset of data that has already been moved. The distinction matters because the scope of analysis in the collect-first model is constrained by collection costs; in the in-place model, analysis scope is no longer tethered to collection volume. Investigations and governance programs can, in principle, cast a much wider net analytically while actually reducing the volume of data that requires review.
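To make the data flow concrete, here is a minimal Python sketch of the pattern described above. It is an illustration under stated assumptions, not X1's implementation or API: `MicroIndex`, `Insight`, `keyword_risk_model`, and `central_console` are hypothetical names, and the "model" is a toy keyword scorer standing in for a deployed AI model. The point it shows is that the model runs next to each data source and only metadata (tags and a risk score) travels to the central console.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """Metadata sent to the central console; it carries no document content."""
    source: str
    doc_id: str
    tags: list[str]
    risk_score: float


@dataclass
class MicroIndex:
    """Hypothetical stand-in for an index that lives beside one data source."""
    source: str
    documents: dict[str, str] = field(default_factory=dict)  # doc_id -> text, stays local

    def run_model(self, model) -> list[Insight]:
        # The model executes here, at the source; only Insight records leave.
        results = []
        for doc_id, text in self.documents.items():
            tags, score = model(text)
            results.append(Insight(self.source, doc_id, tags, score))
        return results


def keyword_risk_model(text: str) -> tuple[list[str], float]:
    """Toy scorer standing in for a deployed AI model (an assumption)."""
    watch_terms = {"confidential": 0.5, "wire transfer": 0.4}
    lowered = text.lower()
    hits = [term for term in watch_terms if term in lowered]
    score = min(1.0, sum((watch_terms[t] for t in hits), 0.0))
    return hits, score


def central_console(indexes: list[MicroIndex], model, threshold: float) -> list[Insight]:
    """Gather insights from every source; keep those that merit targeted collection."""
    insights = [item for idx in indexes for item in idx.run_model(model)]
    return [item for item in insights if item.risk_score >= threshold]


if __name__ == "__main__":
    indexes = [
        MicroIndex("laptop-042", {"a1": "Lunch menu",
                                  "a2": "CONFIDENTIAL: wire transfer details"}),
        MicroIndex("sharepoint-legal", {"b1": "Confidential draft agreement"}),
    ]
    for item in central_console(indexes, keyword_risk_model, threshold=0.5):
        print(item.source, item.doc_id, item.tags, item.risk_score)
```

In this sketch the console ends up with a short list of flagged document identifiers, which is the input a targeted collection decision needs; the document text itself never appears in an `Insight`.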

Mandi Ross, CEO of Insight Optix, offered a perspective that cuts to the core of what makes this architecture commercially significant: “Enabling AI directly where the clients’ data resides fundamentally changes the economics, speed, and risk profile of enterprise data discovery, investigations and compliance workflows. With X1 Enterprise AI In-Place, we can deploy AI models, pre-trained or customized for specific matters, data queries, or compliance requirements—securely within client environments, dramatically accelerating time to insight without sensitive information being collected, duplicated, or centralized outside their control.”

Ross identifies three dimensions the in-place approach changes: economics, speed, and risk. On economics, a significant lever is the reduction in review population size—AI-informed pre-collection filtering means fewer documents proceed to human review. Additionally, costs associated with collection and processing, including expensive AI token utilization, are all but eliminated. On speed, running analysis in situ, without waiting for collection and ingestion cycles, compresses time to first insight—critical in time-sensitive investigations and regulatory responses. On risk, data that does not move cannot be breached in transit, does not reside in vendor infrastructure outside the client’s control, and does not generate the compliance exposure of large-scale cross-boundary transfers. Her comment reflects what experienced practitioners understand but marketing language tends to obscure: the most consequential question about any legal AI tool is not what the AI does, but what happens to the data before and during its operation.

The enterprise deployment model reflects design discipline that distinguishes AI In-Place from retrofitted solutions. Organizations retain centralized governance over AI usage while processing remains local under existing security policies and endpoint controls. AI capabilities are fully optional and configurable at the data source level—important for organizations operating across multiple jurisdictions with differing regulatory requirements—and customer data is never used to train, fine-tune, or enrich underlying AI models, addressing a standard due diligence concern in enterprise AI procurement.

The practical use case implications are significant across several domains. In legal and eDiscovery contexts, in-place TAR and pre-collection analytics allow AI-informed decisions about what to collect before collection begins, directly reducing review volumes and costs. In information governance, AI-driven classification and policy enforcement can operate continuously across the full enterprise data estate rather than against periodic snapshots, enabling more responsive and defensible governance programs. In security and investigations, real-time insider risk detection at petabyte scale—across endpoint and cloud environments simultaneously—becomes feasible where centralized architectures make it impractical. In each case, analytical scope is no longer constrained by collection logistics.

Most legal AI products apply AI to data after it has already moved through the conventional collection pipeline. AI In-Place asks a more fundamental question: whether the pipeline itself should be reconceived. We will demonstrate it live on Wednesday, June 24—for those evaluating enterprise AI in legal, compliance, or governance contexts, it is worth seeing what a genuinely different architecture looks like in practice.

Register for the June 24 AI In-Place™ Product Tour →


The Three Different eDiscovery Approaches to Address Microsoft 365 Data

By John Patzakis

Microsoft reports 345 million paid users of its Microsoft 365 platform (“M365”) worldwide, spanning over two million companies, with more than one million of them based in the United States. M365’s cloud-based data sources such as OneDrive, Outlook mail, Teams and SharePoint Online arguably represent the majority of ESI being produced in litigation going forward. However, M365 presents significant eDiscovery challenges and costs, requiring legal and eDiscovery professionals to be aware of the various methods to address this critical data source.

This article briefly addresses the benefits and challenges of each of the three main approaches to addressing eDiscovery and information governance in M365: 1) Utilizing Microsoft Purview; 2) Outsourced Services; or 3) Relying on a Third-Party Purpose-Built eDiscovery Solution.

Microsoft Purview
Microsoft Purview is the built-in M365 eDiscovery tool. It comes in different licensing tiers, the highest and most useful being Premium, also known as E5 licensing. A key benefit of utilizing Purview Premium is that it’s integrated with M365, which is convenient for both workflow and budgeting. Purview features a good legal hold process that allows legal holds to be applied in place for key M365 data sources.

There is also a good consultant ecosystem to provide training and add-on services, which are often needed, at extra cost, to address larger projects. A Premium license also provides other functionality unrelated to eDiscovery, such as business data analytics and a number of security functions.

Among the challenges with MS Purview Premium that we hear from users, a common complaint is that it can be very expensive, with licenses costing about $600 per employee annually. For large cases, licenses for several thousand custodians run into the millions of dollars, and well into the tens of millions for a company with about 40,000 employees.
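The licensing arithmetic can be checked with a few lines of Python. This is a rough sketch that assumes every custodian or employee is licensed at the approximate $600 annual figure quoted above; the 3,000-user case is an illustrative value for "several thousand custodians," and actual pricing will vary by agreement.

```python
ANNUAL_COST_PER_LICENSE_USD = 600  # approximate per-employee figure quoted above


def annual_license_cost(licensed_users: int,
                        per_user_usd: int = ANNUAL_COST_PER_LICENSE_USD) -> int:
    """Yearly licensing spend if every listed user holds a Premium license."""
    return licensed_users * per_user_usd


if __name__ == "__main__":
    # 3,000 is an illustrative custodian count; 40,000 is the company size cited above.
    for users in (3_000, 40_000):
        print(f"{users:>6,} users -> ${annual_license_cost(users):,} per year")
```

Under those assumptions, 3,000 licensed custodians come to $1,800,000 per year and 40,000 employees come to $24,000,000 per year, which is consistent with the ranges described above.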

But the biggest complaint that we hear is that it is not suited for large cases. M365 is built for user productivity, and its shared architecture is designed to support hundreds of millions of global users with normal individual workloads. eDiscovery and information governance projects are very large and aberrant workloads, so the system is designed to throttle large data throughputs. For instance, when you start a case in Purview, a separate new index is created to allow eDiscovery and compliance searches, but according to Microsoft’s own documentation there is a 2 GB hourly limit when creating this index, which limits your ability to address larger cases in a timely manner. There are also many documented concerns about the accuracy and transparency of search results and data exports, especially as cases get bigger and there are more custodians with higher volumes. In addition, large attachments over 150 MB are not supported, nor are many file types such as engineering files like CAD drawings. Microsoft supports only 50 file types, while the right eDiscovery software will support over 500.
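To see why an hourly indexing cap matters at scale, the short Python sketch below converts data volume into elapsed time. It assumes the 2 GB hourly limit cited above applies as a flat, sustained ceiling on a single case and uses decimal units (1 TB = 1,000 GB); both are simplifying assumptions, and real-world behavior may differ.

```python
HOURLY_INDEX_LIMIT_GB = 2  # hourly limit cited above


def indexing_days(data_gb: float,
                  limit_gb_per_hour: float = HOURLY_INDEX_LIMIT_GB) -> float:
    """Days needed to index data_gb when throughput is capped at a flat hourly rate."""
    hours = data_gb / limit_gb_per_hour
    return hours / 24


if __name__ == "__main__":
    # Illustrative case sizes; 1 TB is treated as 1,000 GB.
    for label, size_gb in (("100 GB", 100), ("1 TB", 1_000)):
        print(f"{label}: about {indexing_days(size_gb):.1f} days")
```

On these assumptions a 100 GB case takes roughly two days to index and a 1 TB case roughly three weeks, before any search or export begins.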

These search accuracy and throughput limitations were called out by Special Master Phillip Favro in the case of Deal Genius, LLC v. O2COOL, LLC, No. 21-C-2046, 2022 WL 17418933, at *1–2 (N.D. Ill. Oct. 24, 2022), and further expounded upon by Favro in his recent technical whitepaper:

“Purview eDiscovery does not provide the advanced features offered by a full service e-discovery platform needed to support discovery efforts in complex cases such as multidistrict litigation and class actions or regulatory investigations like Hart-Scott-Rodino Second Requests. Even small lawsuits that involve high volumes of ESI can present difficulties for organizations that wish to manage much of their discovery process with Purview eDiscovery. Responding parties that rely on Purview eDiscovery may not be able to perform a comprehensive search to reasonably identify relevant information. Responding parties who wish to incorporate Purview eDiscovery functionality into their discovery workflows must understand its search limitations and take steps to address them so they can establish the defensibility of their discovery process.” “Microsoft Purview eDiscovery: Key Features and Limitations,” Practical Law (July 2024).

Finally, Purview only addresses data within M365. It does not address data sources such as Slack, or on-premises sources including laptops, file shares, and even on-premises Exchange or on-premises SharePoint.

Outsourced Services
The second approach to addressing M365 for eDiscovery is to retain an outsourced service provider. There are well over 100 consulting firms that perform such services, and the main benefit is that the right consultants can get the job done. The consultants know how to export M365 data into a standard eDiscovery workflow, are very good at project management, and are well versed in working with attorneys and their litigation deadlines. For smaller companies that lack the internal resources or expertise, or that have backlogs, this can be a good approach.

The main drawback is that it can be very expensive, because what we generally see is that service providers parachute in and run very basic scripts to conduct a mass data export from M365. After that, the project defaults to a traditional eDiscovery workflow with processing tools, a lot of manual services, and then an upload to a standard review platform. This reactive approach results in a large amount of expensive data overcollection. Additionally, outsourced service providers typically require very high-level, super-admin privileges in order to run their bulk data download scripts, which can be a significant concern from a security standpoint. These privileges are sometimes delegated without the company’s knowledge, so it is important to be aware of and audit the privileges that are being granted.

Also, we have seen that for large eDiscovery collection projects in Europe, EU-based companies are required to perform a data protection impact assessment (DPIA), and mass bulk collections that involve copying all of the employees’ emails and other sensitive files and taking that data offsite are frowned upon by privacy auditors. That approach runs afoul of the GDPR’s proportionality and data minimization requirements.

Third Party eDiscovery Software Solution
Finally, a third approach is utilizing a non-Microsoft eDiscovery solution that is purpose-built to conduct eDiscovery, including by connecting to M365. A benefit of this approach is that the right solution can scale for larger data sets. This is particularly important for information governance projects such as data compliance audits. The good solutions will not require expensive Premium Purview licensing for every custodian and will enable you to employ them as an established and repeatable process. They can also address the indexing throughput and completeness challenges in Purview. And a platform like this should be able to support data outside of M365, such as on-premises sources or Slack.

One of the challenges of an in-house system is that internal IT resources or tech-savvy paralegals are needed to run the process. Some technology platforms still require the most expensive Purview Premium licensing to support essential functionality, such as collection of hyperlinked documents, and other key features. Further, many of these vendors are simply providing repurposed email archiving platforms, which function by a mass copy and transfer of all the organization’s data in M365. This poses significant logistical challenges in terms of scalability, not to mention unnecessary cost. M365 does not easily allow for mass data download, and attempting it can lead to errors and data corruption, as in the recent case of FTC v. Match Group, No. 3:19-CV-2281-K, 2025 WL 46024, at *4 (N.D. Tex. Jan. 7, 2025), where MS Purview exports to an email archival system failed, resulting in court-imposed discovery sanctions. So, if the solution does not offer index-in-place functionality and instead relies on a bulk download, copy, and data transfer, there can be significant challenges with that approach.

The X1 Enterprise platform for M365 and on-premises sources takes a unique approach with a micro-indexing architecture in which each data source and each custodian is associated with its own index. This enables a true index-in-place capability for targeted search and analytics at the point of collection, which bypasses most of the M365 throttling issues so that hundreds of custodians can be addressed in hours, not weeks. Our customers have successfully addressed matters involving thousands of custodians and upwards of 80 terabytes of M365 data, indexed in a very short period of time. X1 Enterprise does not require Purview Premium licensing to deliver the required functionality, such as the search and collection of hyperlinked files, archived email, and inactive mailboxes, as well as many other detailed requirements.
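The general idea of one index per custodian, queried in parallel, can be sketched in a few lines of Python. This is a toy illustration of the concept only, not X1's implementation: `CustodianIndex` and `search_all` are hypothetical names, the index is a simple in-memory inverted index, and a thread pool stands in for work that would actually run at each distributed data source.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class CustodianIndex:
    """Hypothetical per-custodian inverted index: term -> set of document ids."""

    def __init__(self, custodian: str):
        self.custodian = custodian
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Naive whitespace tokenization; enough to show the structure.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term: str) -> set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get(term.lower(), set()))


def search_all(indexes: list[CustodianIndex], term: str) -> dict[str, set[str]]:
    """Query every custodian's index concurrently and return only the hits."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda idx: (idx.custodian, idx.search(term)), indexes)
        return {custodian: hits for custodian, hits in results if hits}


if __name__ == "__main__":
    alice = CustodianIndex("alice")
    alice.add("msg-1", "Draft merger agreement attached")
    bob = CustodianIndex("bob")
    bob.add("msg-2", "Quarterly budget review")
    print(search_all([alice, bob], "merger"))
```

Because each index is independent, adding custodians adds parallel work rather than lengthening a single central indexing queue, and only the custodians with hits need to be considered for collection.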

Simply put, we believe X1 Enterprise is the best solution available to address M365 data for eDiscovery and information governance requirements.

Ready to Learn More?
Organizations navigating complex information governance and eDiscovery requirements, including those involving M365, that rely on the X1 Enterprise Platform not only reduce costs and save valuable time but also gain a strategic advantage in managing those needs. For a demonstration of the X1 Enterprise Platform, contact us at sales@x1.com. For more details on this innovative solution, please visit www.x1.com/solutions/x1-enterprise-platform.
