Audit vs. First-Pass Review: Two Workflows, Two Mindsets
"Contract review" is one phrase that covers two very different workflows. One is the work you do on a clean third-party draft sitting on your desk for the first time. The other is the work you do on a marked-up version of your own draft that just came back from the counterparty. They look superficially similar (a lawyer reading a contract, producing edits), but the inputs, the outputs, and the kinds of attention each one demands are different enough that conflating them produces bad tooling and bad workflow advice.
This post is a precise statement of the difference. The shorthand we use: first-pass review for the clean-paper workflow, redline audit for the marked-up-paper workflow. Both deserve their own discipline. Neither should pretend to be the other.
First-pass review: clean paper, playbook out
First-pass review starts with a single document. Usually the counterparty's standard form, sometimes your own template, sometimes a fresh draft from another lawyer. The document has no markup. The reviewer's job is to identify problems against a standard: either an explicit playbook ("our DPA must include X, must not include Y") or implicit drafting expectations ("the indemnity should be capped, the cap should be in line with the fees, the carve-outs should be enumerated").
The output is the markup the reviewer produces. Tracked changes, comments, redlines.
The mental model is essentially:
- Input: one document, clean (or carrying markup you have not seen before).
- Reference: a playbook or set of drafting standards.
- Process: read the document, compare it against the reference, mark up the deltas.
- Output: a tracked-change document going outbound.
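To make the shape concrete, here is a minimal sketch of that model in Python. Every name in it (PlaybookRule, Finding, first_pass) is illustrative, not any vendor's API: one document, a playbook, findings that become outbound markup.

```python
from dataclasses import dataclass

@dataclass
class PlaybookRule:
    clause_type: str          # e.g. "indemnity"
    must_contain: list[str]   # phrases the clause is expected to include

@dataclass
class Finding:
    clause_type: str
    missing: str              # what the outbound markup should add

def first_pass(clauses: dict[str, str], playbook: list[PlaybookRule]) -> list[Finding]:
    """Walk the document clause by clause against the playbook reference."""
    findings = []
    for rule in playbook:
        text = clauses.get(rule.clause_type, "").lower()
        findings += [Finding(rule.clause_type, phrase)
                     for phrase in rule.must_contain
                     if phrase.lower() not in text]
    return findings
```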
Most contract-AI tools sold today are built for this workflow. Spellbook, LegalOn, Ivo, Gavel Exec, BlackBoiler, LexCheck, DocJuris, LegalSifter all start from the assumption that a clean (or mostly clean) document arrives, and the AI's job is to produce the markup. Some of them differentiate on playbook customization, some on precedent grounding, some on chat versus auto-redline, but the workflow shape is the same: one document in, markup out.
Redline audit: markup in, verdict out
Redline audit starts with two documents. The version you sent and the version that came back. The markup already exists. The reviewer's job is not to identify problems against a playbook; it is to form a verdict on a specific set of edits already proposed by a specific counterparty.
The output is a structured analysis of the edits, which then becomes the basis for the reply going back.
The mental model:
- Input: two documents (sent and received).
- Reference: the diff itself, plus context about the deal and the negotiation history.
- Process: compute the diff, group it into reviewable units, classify by substantiveness, surface untracked edits, rank by importance.
- Output: a verdict on each change (accept, reject, counter), feeding the reply.
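The audit data model, sketched the same way. The diff sits at the center and every change carries a verdict slot. Python's stdlib difflib stands in for a real diff engine here; the Change shape is illustrative.

```python
import difflib
from dataclasses import dataclass

@dataclass
class Change:
    kind: str                    # "insert" | "delete" | "replace"
    sent: str                    # your language
    received: str                # the counterparty's language
    verdict: str = "undecided"   # later: "accept" | "reject" | "counter"

def diff_versions(sent: str, received: str) -> list[Change]:
    """Compute a word-level diff between the sent and received versions."""
    a, b = sent.split(), received.split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    return [Change(op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]
```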
Almost no contract-AI tool on the market is built primarily for this workflow. Some can be adapted to it (Spellbook's compare-to-market, Ivo's intelligence layer, the various tools that read counterparty redlines as input), but the optimization is for first-pass review, with audit as a secondary mode. The workflow has been vendor-neutral territory, covered by Word's built-in Compare and a handful of legacy comparison tools (Litera, Workshare, Draftable).
Why the data model is different
The reason the same tool rarely does both jobs well comes down to what is at the center of the data model.
For first-pass review, the document is at the center. The reviewer (or AI) holds the playbook in one hand and walks the document clause by clause looking for matches and mismatches. The data model is "document + playbook" and the engine is a pattern-matcher: does this clause match the playbook entry for indemnity? If it matches, flag the deltas; if no clause matches, ask what should be there.
For redline audit, the diff is at the center. The reviewer (or AI) holds two documents and asks: what is different, and how should each difference be classified? The data model is "diff + classifier" and the engine is a difference analyzer: which changes are substantive, which are noise, which were tracked, which were not, which are connected to which.
The downstream consequences:
- A first-pass tool needs a strong playbook engine. Customizable rules, clause-type recognition, reference precedents. Spellbook's "Compare to Market," LegalOn's playbook customization, Ivo's playbook agents are all expressions of this.
- An audit tool needs a strong diff engine. Deterministic, noise-suppressing, move-detecting, anchor-aligned, capable of grouping related changes across the document and surfacing untracked edits (noise suppression is sketched in code after this list). The diff engine is the thing the workflow stands on.
- The interaction layer differs. A first-pass tool wants a playbook configurator and a clause-by-clause review pane. An audit tool wants a ranked queue of changes, an issue grouping, a side-by-side view, and a clean path from "I made this decision" to "this is in the reply."
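To pick out one of those diff-engine properties: noise suppression. A minimal sketch, assuming the noise you care about is typographic (smart quotes, non-breaking spaces, whitespace churn); a real engine would suppress considerably more.

```python
import re

# Typographic variants that produce diff noise without changing meaning.
NOISE = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",
         "\u00a0": " "}

def normalize(text: str) -> str:
    """Map typographic variants to canonical forms, collapse whitespace."""
    for fancy, plain in NOISE.items():
        text = text.replace(fancy, plain)
    return re.sub(r"\s+", " ", text).strip()

def is_noise(sent: str, received: str) -> bool:
    """A change is noise if both sides are identical after normalization."""
    return normalize(sent) == normalize(received)
```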
Clausul's runtime is built around the audit data model: a deterministic diff produces atoms with positions, atoms group one-to-one into changelets, changelets become review cards, related cards group into thematic issues. The first-pass review workflow is not the workflow the runtime is optimized for, and we say so explicitly.
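For readers who want that pipeline as a skeleton, here are the four stages as plain types. The stage names come from the sentence above; the field names are illustrative, not the runtime's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:                     # one positioned unit of difference
    position: int
    deleted: str
    inserted: str

@dataclass
class Changelet:                # atoms grouped into one reviewable edit
    atoms: list[Atom]

@dataclass
class ReviewCard:               # the unit that carries a verdict
    changelet: Changelet
    verdict: str = "undecided"  # "accept" | "reject" | "counter"

@dataclass
class Issue:                    # related cards grouped by theme
    theme: str
    cards: list[ReviewCard] = field(default_factory=list)
```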
Why the failure modes are different
The two workflows fail in different ways, which is why the same QA discipline does not protect both.
First-pass review fails by missing what the playbook would have caught. A clause that should have triggered the indemnity rule but did not. A missing confidentiality carve-out. A jurisdiction the playbook flags as high-risk that did not get flagged this time. The QA work is about playbook coverage: are the rules complete, are they correctly applied, are the edge cases handled.
Redline audit fails by missing what the counterparty changed. An untracked edit. A subtle word swap inside a long sentence. A formatting move that actually changed the operative text. A defined-term rename whose effects in section 14 are easy to overlook because the visible markup is in section 2. The QA work is about diff completeness and classification accuracy: are all the differences surfaced, are they grouped sensibly, is the substantiveness call right.
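The discipline that catches the untracked-edit failure mode is an independent comparison: diff the received text against what you sent yourself, then subtract what the counterparty's revision marks admit to. A sketch, with `tracked` as a hypothetical input that a real tool would read from the DOCX markup:

```python
import difflib

def untracked_edits(sent: str, received: str, tracked: set[str]) -> list[str]:
    """Spans the independent diff finds that the markup does not disclose."""
    a, b = sent.split("\n"), received.split("\n")
    sm = difflib.SequenceMatcher(a=a, b=b)
    # Everything the independent comparison says was inserted or rewritten.
    inserted = [" ".join(b[j1:j2])
                for op, _i1, _i2, j1, j2 in sm.get_opcodes()
                if op in ("replace", "insert")]
    # Whatever the revision marks do not account for was changed silently.
    return [span for span in inserted if span not in tracked]
```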
The two failure modes share almost nothing operationally. A team that has its first-pass review tooling under control can still be missing untracked edits in every round of every negotiation, because the discipline that catches playbook gaps is not the discipline that catches counterparty drift. The reverse is also true.
Why the same tool rarely does both well
Tooling tends to optimize for what it markets. The "AI contract review" category as it stands today markets first-pass review almost exclusively. Compare-to-market features are the closest thing to an audit verb in the category leaders, but they are framed as "compare your draft against market practice," not "audit the markup the counterparty just sent."
The result is that buyers who do most of their work auditing counterparty markups (a large share of transactional and in-house practitioners, especially after the first round of any deal) are buying tools whose primary optimization is for the smaller half of their job.
The practical signs that a tool is first-pass-first:
- The product takes one document as input and asks for a playbook or instruction.
- The output is markup, not a verdict on existing markup.
- The "compare to a previous version" feature is a sub-mode, not the main mode.
- The marketing language is "review your contracts," not "audit the markup you got back."
The practical signs that a tool is audit-first:
- The product takes two documents as input by default.
- The output is a structured analysis of the diff: classification, ranking, issue grouping.
- Untracked edits are surfaced as a distinct category, not blended with tracked edits.
- The reply step (producing a tracked-change document going back) is part of the same workflow.
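That last sign, the reply step, is mechanical enough to sketch: fold each verdict back into the outbound language. The Decision shape and its fields are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    sent: str                 # your original language
    received: str             # the counterparty's language
    verdict: str              # "accept" | "reject" | "counter"
    counter_text: str = ""    # only used when verdict == "counter"

def reply_language(decisions: list[Decision]) -> list[str]:
    """Resolve each decided change into the language going back."""
    resolved = []
    for d in decisions:
        if d.verdict == "accept":
            resolved.append(d.received)      # keep their edit
        elif d.verdict == "reject":
            resolved.append(d.sent)          # restore your language
        else:
            resolved.append(d.counter_text)  # propose something new
    return resolved
```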
Which one are you actually doing?
For most lawyers, the honest answer is "both, in different proportions." A few things shift the proportion:
- Stage of negotiation. First round is usually first-pass review. Every round after is usually audit. Deals that close after multiple rounds spend most of their lawyer-time in audit mode.
- Whose paper. If the counterparty's paper is the starting point, round one is first-pass review on a clean third-party document. If your paper is the starting point, round one is the markup you produce, and round two onward is audit.
- Practice area. Transactional lawyers (M&A, commercial, finance, real estate) spend disproportionate time in audit. Regulatory and compliance lawyers spend disproportionate time in first-pass review.
- Volume. High-volume in-house teams (procurement, sales-ops, vendor management) live in audit, because the inbound flow is mostly counterparty markups on standard agreements.
The point of being precise about which workflow you are running is not academic. It determines what kind of tool helps, what kind of discipline matters, and where the failure modes are. A first-pass review tool used in audit mode is doing a job it was not optimized for; the discipline of running an independent comparison and surfacing untracked edits is no less important just because the tool does not encourage it.
Frequently asked questions
Are first-pass review and redline audit really different workflows?
Yes. First-pass review takes one document and produces markup. Redline audit takes two documents and produces a verdict on the difference between them. The inputs are different, the output is different, and the failure modes are different. A tool that is excellent at one is often mediocre at the other because the data model that makes one workflow fast (a document and a playbook) is not the data model that makes the other fast (a diff and a classifier).
Can the same tool do both?
Some tools try. Most do one well and the other as an afterthought. The category leaders in AI contract review (Spellbook, LegalOn, Ivo, Gavel Exec, BlackBoiler, LexCheck) are first-pass review tools that some users adapt for audit-style workflows by feeding them the counterparty document and asking for a review. That works for some cases. It does not work as well as a tool whose primary job is auditing the markup that already exists.
Which one am I doing when I review a third-party draft for the first time?
First-pass review. There is no prior version to compare against; the document is what it is, and the work is to identify problems against your standards or playbook. The output is the markup you produce, going outbound.
Which one am I doing when the counterparty sends back my draft with their changes?
Redline audit. The prior version exists (your draft); the new version is theirs; the work is to figure out what they changed and form a verdict on each change. The output is your reply.
Why does this distinction matter to me as a buyer?
Because tools optimize for what they market. If you spend most of your review time on counterparty markups (which most transactional and in-house lawyers do, after the first round), a tool optimized for first-pass review is helping with the smaller half of your job. Picking a tool that is built for the workflow you actually run is the practical reason to care.