Contracts are the operational backbone of every law firm and every union. Collective bargaining agreements, service agreements, NDAs, vendor contracts, settlement documents — legal organisations spend an enormous amount of time reading, comparing, and extracting meaning from dense, clause-heavy text. AI-powered contract review tools promise to compress that work dramatically. Some of them deliver. Many do not.
The gap between marketing claims and actual capability in this space is significant. "AI contract review" can mean anything from a glorified keyword search to a genuinely capable clause extraction and risk analysis system built on large language models. Understanding the difference matters whether you are evaluating a vendor product or deciding to build a purpose-built solution for your firm or union.
This guide focuses on the practical picture — what modern contract AI actually does, where it performs well, where it falls short, and what specific considerations apply to union environments where the stakes of a missed clause can affect thousands of members.
Key Takeaways
- Modern AI contract review uses large language models and NLP to extract clauses, flag risks, and compare documents — it is not keyword matching.
- Union and labour law use cases are distinct from commercial contract review and require models trained or prompted on collective bargaining agreement structures.
- Off-the-shelf tools perform well on standard commercial contracts; they underperform on specialised language, industry-specific standards, and jurisdiction-specific requirements.
- Attorney-client privilege and work product doctrine create real data handling risks when sending contracts to third-party AI platforms — these must be evaluated before deployment.
- Purpose-built AI review systems outperform general tools when the clause taxonomy and risk criteria are specific to a practice area or client type.
- AI review augments legal judgment — it does not replace the lawyer's role in interpreting ambiguous language or assessing strategic risk.
What AI Contract Review Actually Does
The term "AI contract review" covers a range of technical approaches that differ substantially in capability. Understanding what is happening under the hood helps you evaluate whether a given tool can handle your actual workload.
Clause Identification and Extraction
The foundational capability of any contract review system is identifying and extracting specific clause types from unstructured text. Earlier systems used rule-based pattern matching: regular expressions, keyword proximity rules, and classifiers trained on labelled clause examples. These systems work reasonably well on standard commercial contracts with predictable structure, but degrade on non-standard drafting, dense legal language, or documents that use unconventional terminology for familiar concepts.
Modern systems built on large language models (LLMs) approach this differently. Instead of matching patterns, the model understands the semantic content of a clause — it can identify a limitation of liability provision even if the section heading says nothing of the kind, or locate an indemnification obligation buried inside a broader warranty section. This is a meaningful capability improvement for legal documents, where the same concept can be expressed in dozens of ways across different drafting traditions.
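The limitation of the rule-based approach is easy to demonstrate. The sketch below is a minimal keyword-pattern clause detector of the kind described above; the patterns and clause names are illustrative, not from any real product. It finds clauses that use the expected vocabulary and misses the same concepts expressed differently, which is exactly the gap that semantic, LLM-based classification closes.

```python
import re

# A minimal rule-based clause detector: it matches keyword patterns,
# so it only finds clauses that use the expected vocabulary.
# Patterns and clause names are illustrative assumptions.
CLAUSE_PATTERNS = {
    "limitation_of_liability": re.compile(
        r"limitation of liability|liability (?:is|shall be) limited", re.I
    ),
    "indemnification": re.compile(r"indemnif(?:y|ies|ication)", re.I),
}

def match_clauses(text: str) -> list[str]:
    """Return the clause types whose keyword pattern appears in the text."""
    return [name for name, pat in CLAUSE_PATTERNS.items() if pat.search(text)]

standard = "Each party shall indemnify the other. Liability is limited to fees paid."
rephrased = "Neither party's aggregate exposure shall exceed the fees paid hereunder."

print(match_clauses(standard))   # finds both clause types
print(match_clauses(rephrased))  # finds neither: the concept is there, the keywords are not
```

The second sentence caps liability just as clearly as the first, but no keyword pattern fires, which is why pattern-matching systems degrade on unconventional drafting.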
Risk Flagging
Beyond extraction, capable systems assess extracted clauses against a risk framework. This can mean flagging clauses that are missing (no limitation of liability where one is expected), flagging clauses that deviate from a standard position (uncapped indemnification when your firm's standard is a capped mutual indemnity), or flagging language that is unusual or ambiguous.
Risk flagging quality depends entirely on the quality of the risk framework the system is working against. A well-configured system, trained on your firm's standard positions and past negotiated outcomes, will produce flags worth acting on. A generic system applying generic commercial standards may generate noise that is more burden than help: experienced reviewers will already recognise most of what it flags and will be irritated by what it misses.
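Playbook-driven flagging of the kind described above can be sketched as follows. It assumes clause classification has already produced a mapping from clause type to text; the playbook structure, rules, and clause text here are hypothetical illustrations, not any firm's actual standards.

```python
# A sketch of playbook-driven risk flagging. The playbook encodes two kinds
# of rules from the text above: clauses that must be present, and language
# that deviates from the standard position. All rules here are illustrative.
PLAYBOOK = {
    "limitation_of_liability": {"required": True, "forbidden_terms": ["uncapped", "unlimited"]},
    "indemnification": {"required": True, "forbidden_terms": ["sole discretion"]},
    "governing_law": {"required": False, "forbidden_terms": []},
}

def flag_risks(clauses: dict[str, str]) -> list[str]:
    """Flag missing required clauses and forbidden language in present ones."""
    flags = []
    for clause_type, rule in PLAYBOOK.items():
        text = clauses.get(clause_type)
        if text is None:
            if rule["required"]:
                flags.append(f"MISSING: {clause_type}")
            continue
        for term in rule["forbidden_terms"]:
            if term in text.lower():
                flags.append(f"DEVIATION: {clause_type} contains '{term}'")
    return flags

extracted = {"indemnification": "Supplier provides uncapped indemnity at its sole discretion."}
print(flag_risks(extracted))
```

The value of the output tracks the quality of `PLAYBOOK` exactly, which is the point made above: the same engine with generic rules produces generic noise.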
Cross-Document Comparison
One of the most practically valuable use cases is comparing a received contract against a template or a prior version. AI systems can identify deviations — additions, deletions, changed language — and present them in a structured way with the associated risk assessment. This is particularly useful in high-volume environments where the same agreement type is received repeatedly with incremental variations.
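A minimal version of this comparison can be built with Python's standard `difflib`; real systems align and diff at the clause level after classification, but even a line-level diff against a template surfaces the additions and deletions that matter. The template and received text below are invented for illustration.

```python
import difflib

# Compare a received contract against a template line-by-line and keep
# only the added/removed lines. Real systems diff aligned clauses, not
# raw lines; this is the simplest useful sketch of the idea.
template = [
    "Term: 12 months, renewing annually.",
    "Liability capped at fees paid in the prior 12 months.",
    "Either party may terminate on 60 days notice.",
]
received = [
    "Term: 12 months, renewing annually.",
    "Liability of Supplier is uncapped.",
    "Either party may terminate on 60 days notice.",
    "Customer grants Supplier a license to aggregated usage data.",
]

changes = [
    line
    for line in difflib.unified_diff(template, received, lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changes)
```

The changed liability clause and the silently added data-license clause both surface immediately, which is the incremental-variation case described above.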
The Union and Labour Law Context
Union contract review has specific characteristics that distinguish it from commercial contract review. Collective bargaining agreements (CBAs), memoranda of understanding (MOUs), letters of understanding (LOUs), and grievance procedure documents have a clause taxonomy and a set of risk criteria that differ substantially from commercial or transactional contracts. Most off-the-shelf contract AI tools are trained primarily on commercial and corporate documents. Their performance on CBA-specific language is typically poor without significant customisation.
CBA-Specific Clause Types
The clause types that matter most in union environments are not the same as those in a vendor agreement. Systems designed for commercial contract review often have no concept of:
- Recognition clauses — defining which employees are in the bargaining unit and the scope of union representation
- Management rights clauses — what the employer retains the right to do unilaterally, and what requires bargaining
- Grievance and arbitration procedures — timelines, steps, filing requirements, arbitrator selection, and finality provisions
- Just cause standards — what standard governs discipline and discharge
- Seniority provisions — how seniority is calculated, applied to layoffs, bumping rights, and job postings
- Union security clauses — dues checkoff, agency fee, or union shop provisions (jurisdiction-dependent)
- No-strike / no-lockout provisions — scope, exceptions, and remedies
- Wage and benefit schedules — rate tables, progressions, benefit eligibility triggers
A contract review system for a union legal department needs to understand all of these clause types, know what deviations from standard language are meaningful, and flag issues that are specific to labour law — not commercial contract law.
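The clause types listed above can be encoded as the taxonomy a purpose-built system classifies against. The structure below is an illustrative assumption (real taxonomies carry sub-types and jurisdiction variants), but it shows the basic mechanic: any taxonomy entry a review fails to locate is itself a finding.

```python
# The CBA clause taxonomy from the list above, as a classification target.
# Keys and descriptions follow the text; the structure is illustrative.
CBA_TAXONOMY = {
    "recognition": "Bargaining unit definition and scope of representation",
    "management_rights": "Employer's unilateral rights vs. matters requiring bargaining",
    "grievance_arbitration": "Timelines, steps, filing, arbitrator selection, finality",
    "just_cause": "Standard governing discipline and discharge",
    "seniority": "Calculation and application to layoffs, bumping, postings",
    "union_security": "Dues checkoff, agency fee, union shop provisions",
    "no_strike_no_lockout": "Scope, exceptions, and remedies",
    "wages_benefits": "Rate tables, progressions, benefit eligibility triggers",
}

def taxonomy_gaps(found_types: set[str]) -> set[str]:
    """Clause types in the taxonomy that a review did not locate."""
    return set(CBA_TAXONOMY) - found_types

print(sorted(taxonomy_gaps({"recognition", "seniority", "wages_benefits"})))
```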
Grievance Management and Pattern Analysis
Beyond review of the CBA itself, AI tools for union environments have a distinct and highly valuable use case: analysing grievance history against contract language to identify systemic violations, track management compliance patterns, and build stronger arbitration cases.
This requires connecting contract intelligence to case management data — a workflow that no general-purpose contract review tool supports out of the box. Purpose-built systems for union legal departments can extract the relevant contract provisions cited in each grievance, track outcomes, and surface patterns: which supervisors generate the most grievances, which clauses are most frequently in dispute, which arbitrators have ruled in which direction on similar language.
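Once grievance records are linked to the CBA articles they cite, the pattern analysis described above reduces to straightforward aggregation. The record shape and data below are hypothetical; a real system would pull this from the case management database.

```python
from collections import Counter

# A sketch of grievance pattern analysis, assuming each record has already
# been linked to the supervisor involved, the CBA article cited, and the
# outcome. Field names and data are hypothetical.
grievances = [
    {"supervisor": "Ops-3", "article": "Art. 12 (Seniority)", "outcome": "sustained"},
    {"supervisor": "Ops-3", "article": "Art. 12 (Seniority)", "outcome": "sustained"},
    {"supervisor": "Ops-1", "article": "Art. 8 (Just Cause)", "outcome": "denied"},
    {"supervisor": "Ops-3", "article": "Art. 12 (Seniority)", "outcome": "sustained"},
]

by_supervisor = Counter(g["supervisor"] for g in grievances)
by_article = Counter(g["article"] for g in grievances)
sustain_rate = sum(g["outcome"] == "sustained" for g in grievances) / len(grievances)

print(by_supervisor.most_common(1))  # which supervisor generates the most grievances
print(by_article.most_common(1))     # which clause is most frequently in dispute
print(sustain_rate)                  # overall rate of sustained grievances
```

The same aggregation, keyed on arbitrator and clause language instead, supports the arbitration-history analysis mentioned above.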
Core Capabilities to Evaluate (or Build)
Whether you are assessing a vendor product or scoping a custom build, the same core capability questions apply. The answers will tell you whether a system will genuinely reduce review time and improve accuracy for your specific workload.
| Capability | What to Look For | Why It Matters |
| --- | --- | --- |
| Clause taxonomy coverage | Does the system's taxonomy include the clause types you actually review? | Gaps in the taxonomy mean gaps in the review — clauses the system doesn't know about won't be flagged |
| Custom playbook support | Can you define your own standard positions and acceptable deviations? | Generic risk standards generate noise; your standards drive value |
| Training data provenance | What documents was the model trained on, and are they relevant to your practice area? | A model trained on Silicon Valley commercial contracts will underperform on CBA language |
| Confidence scoring | Does the system indicate confidence levels on extractions and risk flags? | Unconfident flags need human review; confident flags can be triaged faster |
| Citation and traceability | Does every output link back to the specific contract language that produced it? | Lawyers must be able to verify every flag — black-box outputs are unusable in practice |
| Comparison and redline support | Can the system compare received contracts against templates or prior versions? | Version comparison is the highest-ROI use case for many firms |
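The confidence-scoring capability in the table above translates directly into a triage workflow: high-confidence findings move fast, low-confidence findings queue for human review. The sketch below assumes each finding carries a model-reported confidence score; the 0.85 threshold is an illustrative assumption that should be calibrated against reviewed output.

```python
# Confidence-based triage of AI findings. The threshold is an assumption;
# in practice it is calibrated on a sample of human-reviewed output.
def triage(findings: list[dict], threshold: float = 0.85) -> tuple[list[dict], list[dict]]:
    """Split findings into a fast-track queue and a human-review queue."""
    fast_track = [f for f in findings if f["confidence"] >= threshold]
    needs_review = [f for f in findings if f["confidence"] < threshold]
    return fast_track, needs_review

findings = [
    {"clause": "limitation_of_liability", "confidence": 0.97},
    {"clause": "union_security", "confidence": 0.54},
]
fast, review = triage(findings)
print([f["clause"] for f in fast])    # high confidence, triaged faster
print([f["clause"] for f in review])  # low confidence, needs human review
```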
Privilege and Data Handling
Sending client contracts to a third-party AI platform raises real professional responsibility questions that many firms have not fully worked through before deploying these tools. The concerns are not hypothetical: they have direct implications for privilege, confidentiality obligations, and client consent requirements.
Attorney-Client Privilege and Third-Party Disclosure
Attorney-client privilege generally protects confidential communications between a lawyer and client made for the purpose of legal advice. The privilege can be waived by voluntary disclosure to third parties. Whether sending a client's contract to a cloud-based AI review platform constitutes a waiver of privilege is not fully settled law, but the risk is not zero — particularly if the platform's terms of service include provisions allowing the vendor to use uploaded documents to improve its models.
Before deploying any third-party contract AI tool, your firm should:
- Review the vendor's data processing terms carefully — specifically whether uploaded documents are used for model training
- Confirm whether a data processing agreement is available and what it covers
- Assess whether your jurisdiction's ethics rules require client disclosure or consent before sending their documents to a third-party processing system
- Determine whether you need a data retention and deletion policy that requires the vendor to delete documents after processing
Work Product Doctrine
Attorney work product — documents and mental impressions prepared in anticipation of litigation — receives separate protection. AI-generated analysis of a contract prepared in anticipation of a dispute may itself be work product. The implications of storing that analysis on a third-party platform, and what happens to it if the vendor is subject to a subpoena or breach, are worth working through with your risk committee before deployment.
Self-Hosted vs. API-Based Architecture
For firms or unions handling a high volume of sensitive matters, the data handling question often resolves in favour of self-hosted or private-deployment architecture. Running an open-source or commercially licensed LLM within your own infrastructure eliminates the third-party disclosure risk; running within a cloud environment under a robust data processing agreement substantially reduces it. The tradeoff is infrastructure cost and operational overhead. For high-sensitivity practice areas, it is often the right tradeoff.
Where Off-the-Shelf Tools Underperform
Several well-funded contract AI platforms — Kira Systems, Luminance, Ironclad, Spellbook, and others — perform well in the use cases they were designed for. Those use cases are typically commercial and corporate transactional work: M&A due diligence, vendor agreements, real estate, financial contracts. Their clause taxonomies, training data, and risk frameworks reflect that focus.
The situations where these tools consistently underperform are predictable:
- Specialised practice areas — labour law, healthcare regulatory, Indigenous land claims, environmental law, and other domains where the clause vocabulary and risk criteria are highly specific
- Non-standard or bespoke drafting — contracts drafted outside standard commercial templates, particularly in smaller markets or older institutional documents
- Non-English documents — most tools are English-centric; French, Spanish, and multilingual document review capabilities vary significantly
- Historical document analysis — scanning and reviewing archives of legacy contracts, particularly those that exist only in scanned paper form, introduces OCR quality problems that degrade AI performance
- Jurisdiction-specific requirements — what constitutes a legally required notice period, a compliant non-compete clause, or an enforceable arbitration agreement varies by province, state, and country; generic tools do not know your jurisdiction's rules
In these situations, the choice is between accepting degraded performance, building a custom system, or heavily customising a flexible platform. The right answer depends on volume, stakes, and the degree to which your workload diverges from the tool's designed use case.
Building a Purpose-Built Contract Review System
When off-the-shelf tools cannot meet the requirements of a specific practice area or organisation, a purpose-built system becomes the correct investment. The architecture of a modern AI contract review system is well-established. The complexity lies in the domain-specific layer — the clause taxonomy, the risk framework, and the prompt engineering that translates legal knowledge into reliable AI output.
The Core Pipeline
A purpose-built contract review system typically involves five components working in sequence:
1. Document ingestion — accepting contracts in PDF, DOCX, and scanned formats; OCR for scanned documents; text normalisation and structure extraction (identifying sections, headings, numbered clauses)
2. Clause segmentation — dividing the document into meaningful units for analysis; this can be sentence-level, clause-level, or section-level depending on the use case
3. Clause classification — identifying what type of clause each segment represents, using the organisation's clause taxonomy
4. Risk assessment — evaluating each classified clause against the organisation's playbook, flagging deviations, missing provisions, and unusual language
5. Report generation — producing a structured review output that presents findings in a format lawyers can act on, with citations back to the source language
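The five pipeline stages above can be sketched as composable functions. Every stage here is a deliberately naive stub standing in for a real implementation (an OCR engine, a classifier or LLM call, playbook rules); the interfaces are assumptions, but the data flow is the one described.

```python
# The five-stage pipeline as composable stubs. Each function stands in for
# a real component; only the stage boundaries and data flow are the point.
def ingest(raw: str) -> str:
    """Normalise whitespace; a real system also runs OCR and structure extraction."""
    return " ".join(raw.split())

def segment(text: str) -> list[str]:
    """Split into clause-level units; here, naively on sentence boundaries."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def classify(segments: list[str]) -> list[tuple[str, str]]:
    """Assign a clause type to each segment; a stub for a classifier or LLM call."""
    return [("indemnification" if "indemnif" in s.lower() else "other", s) for s in segments]

def assess(classified: list[tuple[str, str]]) -> list[dict]:
    """Flag classified clauses against playbook rules (stubbed to one rule)."""
    return [{"type": t, "text": s, "flag": "uncapped" in s.lower()} for t, s in classified]

def report(assessed: list[dict]) -> list[str]:
    """Render flagged findings with citations back to the source language."""
    return [f"[{a['type']}] {a['text']}" for a in assessed if a["flag"]]

doc = "Supplier shall   indemnify Customer on an uncapped basis. Notices go by email."
print(report(assess(classify(segment(ingest(doc))))))
```

In a production system each stage is independently testable and replaceable, which is what lets you later swap a prompted LLM for a fine-tuned classifier at stage 3 without touching the rest.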
The Role of LLMs vs. Fine-Tuned Models
Current LLMs like Claude and GPT-4 perform well at steps 3, 4, and 5 when given well-designed prompts and sufficient context. Their general language understanding handles novel phrasings and complex clause structures that would defeat a traditional classifier. The tradeoffs relative to fine-tuned models are real — general LLMs are less predictable on edge cases, more expensive per document at scale, and require careful prompt engineering to maintain consistent output structure.
Fine-tuned models trained on your specific document corpus can outperform general LLMs on routine clause classification once you have sufficient labelled data. A practical architecture for most organisations starts with LLM-based prompting for rapid deployment and flexibility, then identifies the highest-volume, highest-confidence cases as candidates for fine-tuning once you have accumulated enough reviewed output to create training data.
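The prompt engineering mentioned above largely comes down to pinning the model to a fixed taxonomy and a fixed response format. The sketch below builds such a classification prompt; the taxonomy, wording, and response shape are illustrative assumptions, and no specific model API is assumed.

```python
# Building a clause-classification prompt that constrains the model to a
# fixed taxonomy and a JSON response. Taxonomy and wording are illustrative.
TAXONOMY = ["recognition", "management_rights", "grievance_arbitration", "just_cause", "other"]

def build_classification_prompt(clause_text: str) -> str:
    return (
        "You are reviewing a collective bargaining agreement.\n"
        f"Classify the clause below into exactly one of: {', '.join(TAXONOMY)}.\n"
        'Respond with JSON only: {"clause_type": ..., "quote": the exact language '
        "that justifies the classification}.\n\n"
        f"Clause:\n{clause_text}"
    )

prompt = build_classification_prompt("No employee shall be discharged except for just cause.")
print(prompt)
```

Constraining the output to a closed taxonomy and requiring a justifying quote are the two prompt-level controls that most directly improve consistency; the quote requirement also feeds the citation checking discussed below.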
Structured Output and Citation
Every extraction and risk assessment in a legal AI system must cite the specific contract language that produced it. Lawyers cannot rely on outputs they cannot verify, and any AI system deployed in a legal context must be designed from the beginning to surface its reasoning and its sources. This means requiring structured output from your LLM calls — JSON responses that include the extracted text, its location in the document, the clause type assigned, the risk assessment, and the rule or standard that produced the flag.
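Enforcing that structure can be sketched as a validation step on every model response: parse the JSON and reject any finding that does not carry all required fields. The field names below are assumptions standing in for the elements listed above (extracted text, location, clause type, risk assessment, triggering rule).

```python
import json

# Validate that an LLM response is well-formed structured output carrying
# every element a reviewable finding needs. Field names are illustrative.
REQUIRED_FIELDS = {"extracted_text", "location", "clause_type", "risk", "rule"}

def parse_finding(raw: str) -> dict:
    """Parse a JSON finding, rejecting any response missing required fields."""
    finding = json.loads(raw)
    missing = REQUIRED_FIELDS - finding.keys()
    if missing:
        raise ValueError(f"incomplete finding, missing: {sorted(missing)}")
    return finding

good = json.dumps({
    "extracted_text": "Liability of Supplier is uncapped.",
    "location": "s. 9.1",
    "clause_type": "limitation_of_liability",
    "risk": "high",
    "rule": "liability-must-be-capped",
})
print(parse_finding(good)["risk"])
```

Rejecting malformed output at this boundary, rather than passing partial findings downstream, is what keeps every flag in the final report traceable to a rule and a location.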
FAQ
Can AI contract review replace junior associate document review work?
For well-defined, high-volume review tasks — identifying specific clause types, flagging missing provisions, comparing received contracts against templates — AI review can handle the bulk of the work that would otherwise fall to junior associates. This does not mean those hours disappear; it means they shift toward higher-value analysis, exception handling, and the strategic judgment that AI cannot reliably provide. Firms that have deployed contract AI at scale consistently find that it changes the composition of associate work rather than eliminating it.
How accurate is AI contract review?
Accuracy is clause-type-specific and context-dependent. For standard commercial clause types in well-formatted documents, mature platforms report extraction accuracy rates above 90% on their designed use cases. Accuracy degrades for specialised clause types outside the training distribution, unusual or archaic drafting styles, poor-quality scanned documents, and clauses whose meaning depends heavily on context outside the four corners of the document. Accuracy claims from vendors should always be tested on a representative sample of your actual documents before deployment.
What is the risk of AI hallucination in contract review?
Hallucination — generating plausible but incorrect output — is a real risk in any LLM-based system. In contract review, the primary hallucination risk is in risk assessment and summary generation rather than in direct extraction. A well-designed system mitigates this by requiring the model to cite source language for every finding and by separating the extraction step (lower hallucination risk) from the assessment step. Any production legal AI system must have a human-in-the-loop review process for its outputs — AI findings should be treated as a first-pass triage, not a final review.
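The citation requirement described above can be enforced mechanically: a finding is accepted only if its quoted language appears verbatim in the source document, which catches the most common extraction-stage hallucinations before a human ever sees them. The finding shape and data below are illustrative.

```python
# A grounding check: accept a finding only if its cited quote actually
# appears in the source document. Finding fields are illustrative.
def is_grounded(finding: dict, document: str) -> bool:
    """True only when the finding quotes language present verbatim in the source."""
    quote = finding.get("quote", "")
    return bool(quote) and quote in document

document = "12.4 Grievances must be filed within ten (10) working days."
real = {"quote": "filed within ten (10) working days", "risk": "short filing deadline"}
invented = {"quote": "filed within thirty (30) days", "risk": "standard deadline"}

print(is_grounded(real, document))      # grounded in the source text
print(is_grounded(invented, document))  # not in the source: route to human review
```

A verbatim-match check is deliberately strict; systems that normalise whitespace or OCR artefacts before matching trade a little strictness for fewer false rejections.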
How do we handle contracts in French or other languages?
Most major LLMs have strong French and Spanish language capability, though performance is generally somewhat below their English-language performance. For Canadian legal organisations reviewing French-language contracts or bilingual CBAs, this is a tractable problem but requires testing. Jurisdiction-specific legal terminology — particularly Quebec civil law concepts — can be problematic for models trained primarily on common law documents. Evaluating language performance on a representative sample from your actual document corpus is essential before relying on any tool for multilingual review.
What should a union look for when evaluating contract AI vendors?
Start with the clause taxonomy — does it cover CBA-specific clause types, or is it designed for commercial contracts? Ask the vendor for a demonstration on an actual collective bargaining agreement from your sector. Evaluate confidence scoring and citation support. Assess data handling terms carefully — union contracts contain sensitive member information and strategic negotiating positions that should not be used to train a vendor's general model. Finally, assess the customisation path: can you define your own standard positions and risk rules, or are you locked into generic commercial standards?
Build vs. Buy: Where to Start
For most law firms and unions, the starting point should be evaluating existing platforms with a rigorous test on your actual document types. If those platforms perform adequately on your core use cases, the build vs. buy calculus typically favours buying — maintenance cost and required ML expertise make custom builds expensive to sustain. The cases where custom builds are warranted are: highly specialised practice areas where no existing tool has adequate training data, organisations with very high review volume where a purpose-built system pays back at scale, and situations where data sovereignty or privilege concerns rule out third-party platforms entirely.
Last updated: April 2026