Before June 2023, the legal profession's relationship with artificial intelligence was defined by cautious optimism. Law firms experimented with document review tools, contract analysis platforms, and research assistants. The prevailing assumption was that AI would reduce costs and improve efficiency in legal work, and that the primary risk was displacement of junior associates.

Then came Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y.), and the assumption shattered.

What Happened in Mata v. Avianca

The facts are by now well known. Attorney Steven Schwartz used ChatGPT to research case law for a brief opposing the dismissal of a personal injury claim. The model supplied six judicial opinions, complete with docket numbers, reporter citations, and quotations from their purported holdings. Schwartz included these citations in a filing with the Southern District of New York without independently verifying any of them.

None of the cases existed.

When opposing counsel flagged the fabricated citations, Schwartz went back to ChatGPT and asked whether the cases were real. The model assured him they were. He submitted an affirmation to the court attesting to their validity. Judge P. Kevin Castel ultimately sanctioned Schwartz and his colleague Peter LoDuca, finding that they had "abandoned their responsibilities" by submitting "bogus judicial decisions with bogus quotes and bogus internal citations" and failing to undertake "the most basic follow-up research."

The sanctions were relatively modest -- $5,000 in fines. The reputational damage was not. Schwartz and LoDuca became the cautionary tale that every legal technology conference would reference for years to come.

Why General-Purpose LLMs Are Dangerous for Legal Work

The Schwartz incident revealed a fundamental mismatch between what general-purpose language models are designed to do and what legal work requires. Language models are trained to produce statistically likely continuations of text. When asked for a legal citation, a model produces text that looks like a legal citation -- correct reporter abbreviations, plausible volume and page numbers, realistic-sounding case names. The model is not consulting a database of actual cases. It is generating patterns that match the statistical distribution of legal text in its training data.

This is not a flaw in the model. It is the model working exactly as designed. The problem is that legal practice requires not just plausible text but verifiable truth. When a lawyer cites Smith v. Jones, 500 F.3d 200 (2d Cir. 2007), they are making a representation to the court that this case exists, that it says what the brief claims it says, and that it supports the proposition for which it is cited. Each of these representations must be independently verifiable. A language model's confidence in its output provides no verification whatsoever.

The danger compounds because the failure mode is invisible to the untrained eye. A hallucinated citation looks exactly like a real one. There is no formatting error, no grammatical tell, no visual indicator that distinguishes fabricated authority from genuine authority. The only way to detect the fabrication is to look up the case -- which is precisely the step that Schwartz skipped.

The Standard Legal AI Must Meet

The post-Schwartz era has clarified, with brutal precision, the standard that any AI system used in legal practice must meet. This standard is not aspirational. It is the minimum required to avoid professional sanctions, malpractice liability, and harm to clients. Each requirement below can also be read as a machine-checkable condition, as the sketch after the list illustrates.

  1. No hallucinated citations. Every authority cited in the output must be a real case, statute, or regulation that exists in the legal system. The system must distinguish between "I found this authority" and "I generated text that looks like an authority."
  2. Full provenance tracking. Every factual assertion must be traceable to a specific source document. Every legal conclusion must be traceable to identified authorities and factual predicates.
  3. Jurisdiction awareness. The system must distinguish between binding and persuasive authority and apply jurisdiction-specific rules to every analysis.
  4. Adverse authority disclosure. The system must identify and disclose authorities that cut against the client's position. Under Rule 3.3 of the Model Rules of Professional Conduct, a lawyer has an obligation to disclose directly adverse controlling authority. An AI system that suppresses unfavorable precedent creates an ethical trap for the attorney who relies on its output.
  5. Human review as the final gate. No system output should be filed, sent, or relied upon without review by a licensed attorney who takes professional responsibility for the work product.
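
The sketch below is illustrative only: the record types, field names, and check wording are assumptions made for this post, not any system's actual schema. The point is that each of the five requirements can be expressed as an explicit pass or fail condition that runs before anything is filed or sent.

```python
from dataclasses import dataclass


@dataclass
class CitedAuthority:
    """One authority as cited in a draft (illustrative fields only)."""
    citation: str              # full citation as it appears in the draft
    verified: bool             # resolved against a real reporter or docket entry
    source_doc_ids: list[str]  # provenance for the assertions it supports
    weight: str                # "binding" or "persuasive" in this forum


@dataclass
class DraftForReview:
    citations: list[CitedAuthority]
    known_adverse: list[str]   # adverse authorities surfaced during research
    adverse_disclosed: bool    # whether the draft discloses them
    attorney_sign_off: bool    # the human-review gate


def pre_filing_checks(draft: DraftForReview) -> list[str]:
    """Return the requirements the draft fails; an empty list clears the gate."""
    failures: list[str] = []
    if any(not c.verified for c in draft.citations):
        failures.append("1: unverified citation present")
    if any(not c.source_doc_ids for c in draft.citations):
        failures.append("2: assertion without source provenance")
    if any(c.weight not in ("binding", "persuasive") for c in draft.citations):
        failures.append("3: authority not classified for this forum")
    if draft.known_adverse and not draft.adverse_disclosed:
        failures.append("4: adverse authority not disclosed")
    if not draft.attorney_sign_off:
        failures.append("5: no attorney review recorded")
    return failures
```

A draft that returns an empty list has cleared the automated gate; it still goes to a licensed attorney, because the fifth requirement is a floor, not a substitute for judgment.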

How Themis Prevents Hallucinated Citations

Themis's architecture addresses the hallucination problem at a structural level, not through prompt engineering or output filtering. The key insight is that citation hallucination occurs because drafting and research are conflated in a single generation pass. When a model is simultaneously composing an argument and generating citations, it has no mechanism to distinguish between citations it has verified and citations it has invented to support the argument it is constructing.

Themis separates research from drafting into distinct phases, executed by distinct agents with distinct quality gates.

The Doctrinal Expert Agent (DEA) performs legal research during the Research and Retrieval phase. Its sole task is to identify relevant authorities -- both supporting and adverse -- for each legal issue identified during issue framing. The DEA's output is a structured set of authority records, each with a full Bluebook citation, the relevant holding, and a classification of the authority's relationship to the client's position.
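
In code, such an authority record is a small structured object rather than free text. The field names and stance categories below are assumptions for illustration (the DEA's actual output format is not published here); what matters is that the citation, the holding, and the classification travel together as one verifiable unit.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """The authority's relationship to the client's position (assumed categories)."""
    SUPPORTING = "supporting"
    ADVERSE = "adverse"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class AuthorityRecord:
    """One authority found during the Research and Retrieval phase."""
    citation: str    # full Bluebook citation, e.g. "Smith v. Jones, 500 F.3d 200 (2d Cir. 2007)"
    holding: str     # the relevant holding, stated by the researcher
    stance: Stance   # supporting, adverse, or neutral
    issue_id: str    # the framed legal issue this authority addresses


def citation_set(records: list[AuthorityRecord]) -> set[str]:
    """The set of citation strings that drafting will later be validated against."""
    return {r.citation for r in records}
```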

The Document Drafting Agent (DDA) operates during the Draft and Review phase. When it needs to cite authority, it draws exclusively from the DEA's research output. It cannot generate new citations during drafting. The orchestrator enforces this constraint by validating that every citation in the DDA's output appears in the DEA's authority set.
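
A minimal version of that validation is a set-difference check. The function below is a sketch under assumed names; it presumes the drafter reports its citations as a structured list alongside the text, which keeps the check mechanical rather than interpretive.

```python
def _normalize(citation: str) -> str:
    """Collapse whitespace so trivially different renderings compare equal."""
    return " ".join(citation.split())


def validate_citations(draft_citations: list[str], allowed: set[str]) -> set[str]:
    """Return every citation the draft uses that research never produced.

    An empty result means every citation traces back to a verified authority
    record; anything else blocks the draft from leaving the Draft and Review
    phase until the gap is resolved.
    """
    return {_normalize(c) for c in draft_citations} - {_normalize(c) for c in allowed}


# Example with one verified citation and one invented one (the second case is
# deliberately fictitious, standing in for a hallucinated authority):
violations = validate_citations(
    ["Smith v. Jones, 500 F.3d 200 (2d Cir. 2007)",
     "Doe v. Acme Air, 901 F.3d 1 (2d Cir. 2018)"],
    {"Smith v. Jones, 500 F.3d 200 (2d Cir. 2007)"},
)
assert violations == {"Doe v. Acme Air, 901 F.3d 1 (2d Cir. 2018)"}
```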

If the DDA needs an authority that the DEA has not identified, the orchestrator does not allow the DDA to fabricate one. Instead, it routes a new research task back to the DEA, which performs targeted research to find the needed authority. If no authority exists to support the proposition, the system flags it as an unsupported assertion rather than generating a fictitious citation.

When the system is uncertain about a legal principle, it flags the question as an unresolved issue for the reviewing attorney rather than guessing. No hallucinations.
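
A sketch of that fallback loop, under the same kind of assumptions as the earlier sketches (hypothetical names, not Themis's API), might look like the following: citation gaps are routed back to research, and anything research cannot support becomes an explicit flag for the reviewing attorney instead of a citation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    text: str
    citations: list[str]
    unsupported_assertions: list[str] = field(default_factory=list)
    unresolved_issues: list[str] = field(default_factory=list)


def close_citation_gaps(
    draft: Draft,
    allowed: set[str],
    needed: dict[str, str],                             # missing citation -> proposition it was meant to support
    targeted_research: Callable[[str], Optional[str]],  # returns a verified citation, or None if none exists
) -> Draft:
    """Resolve citations the drafter wanted but research never verified.

    Each gap becomes a new research task. A real authority, if found, joins the
    allowed set; if none exists, the proposition is flagged as unsupported
    rather than dressed up with an invented citation.
    """
    for wanted, proposition in needed.items():
        found = targeted_research(proposition)   # route the task back to the research agent
        if found is not None:
            allowed.add(found)
        else:
            draft.citations = [c for c in draft.citations if c != wanted]
            draft.unsupported_assertions.append(proposition)
    return draft


def flag_unresolved(draft: Draft, principle: str) -> None:
    """Record a legal principle the system could not resolve with confidence,
    so the reviewing attorney sees an open question instead of a guess."""
    draft.unresolved_issues.append(principle)
```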

Human Review as the Final Gate

Structural safeguards reduce the risk of hallucination, but they cannot eliminate it entirely. No AI system should be trusted to produce filing-ready documents without human review. This is not a limitation of Themis's architecture -- it is a design principle.

Every artifact that Themis produces is explicitly designated as a draft for attorney review. The system does not file motions, send demand letters, or advise clients. It produces work product that a licensed attorney evaluates, refines, and takes professional responsibility for before it reaches any audience outside the firm.

This is the appropriate division of labor between AI and human expertise. The system handles the bandwidth-intensive tasks -- reading thousands of pages, identifying relevant authorities across multiple bodies of law, structuring analysis, drafting initial documents. The attorney brings judgment, strategic intuition, client knowledge, and professional accountability that no system can replicate.

The Future of AI in Litigation

The Schwartz debacle was a setback for legal AI, but it was also a clarifying moment. It established, for the entire profession, what the minimum acceptable standard for legal AI looks like. Any system that cannot track provenance, verify citations, and flag uncertainty is not ready for use in legal practice.

We believe the future belongs to systems that meet this standard -- systems that augment attorney capacity without replacing attorney judgment, that produce verifiable work product rather than plausible text, and that are transparent about their limitations rather than confident in their errors.

The post-Schwartz era is not a retreat from AI in legal practice. It is the beginning of responsible AI in legal practice. The attorneys who were sanctioned in Mata v. Avianca were not punished for using AI. They were punished for using it without verification. The lesson is not that lawyers should avoid AI tools. The lesson is that AI tools must be built, from the ground up, to make verification possible.

That is what Themis is built to do.