What CONSORT-AI and SPIRIT-AI Are

CONSORT-AI and SPIRIT-AI are the two international reporting standards that together govern how interventional clinical trials of artificial intelligence (AI) systems are designed, documented, and published. They were developed in parallel and published simultaneously on 9 September 2020 in three journals (Nature Medicine, The BMJ, and Lancet Digital Health), marking the first time a coordinated international standard for AI clinical trial reporting existed.

SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials – Artificial Intelligence) extends the SPIRIT 2013 framework with 15 AI-specific items — 12 new extensions and 3 elaborations to existing SPIRIT 2013 guidance — that must be addressed in clinical trial protocols involving an AI component. It governs the planning and ethics review stage of the trial lifecycle.

CONSORT-AI extends CONSORT 2010 with 14 AI-specific items for trial reports — the manuscripts submitted for peer review and publication after trial completion. Full detail on CONSORT-AI's 14 items is covered in the companion reference entry CONSORT-AI: The Reporting Standard for AI Clinical Trials. This entry focuses on SPIRIT-AI and the lifecycle division between the two standards.

Why Existing Trial Reporting Guidelines Were Insufficient for AI

SPIRIT 2013 was designed for conventional pharmaceutical and device trials. When AI-based interventions began entering randomized controlled trials, journal editors and methodologists identified a consistent pattern: protocols submitted for AI trials routinely omitted information that was critical to understanding whether the trial could be reproduced, interpreted, or ethically reviewed.

The core gaps SPIRIT 2013 did not address for AI trials include:

  • Algorithm versioning: AI models can be updated between protocol registration and trial completion. SPIRIT 2013 provided no mechanism for specifying which version of an algorithm was being studied or how version changes would be handled.
  • Input data eligibility: AI interventions often depend on specific data types, image acquisition parameters, or sensor configurations. Conventional eligibility criteria describe participant characteristics but not the data quality or format requirements that determine whether an AI system can function.
  • Integration requirements: The clinical workflow context in which an AI tool operates — which systems it connects to, what hardware it requires, which staff interact with it — affects both implementation fidelity and generalizability. SPIRIT 2013 had no provision for specifying these.
  • Human-AI interaction: Whether and how a clinician reviews, overrides, or acts on AI output is a key determinant of trial outcomes. Protocols were not required to pre-specify this interaction structure.
  • Output specification: AI systems produce outputs that vary in format — scores, classifications, highlighted regions, ranked lists. Without pre-specifying what the AI output is and how it is used in clinical decision-making, trial results are difficult to interpret or replicate.
  • Performance error planning: AI systems can produce systematic errors tied to specific input subgroups or edge cases. SPIRIT 2013 did not require protocols to pre-specify how performance errors would be monitored or analyzed during the trial.

These omissions meant that ethics committees reviewing AI trial protocols often lacked the information needed to assess risks, and that peer reviewers evaluating published results could not trace the intervention back to a fully specified protocol. SPIRIT-AI was developed to close these gaps at the point where they can most effectively be addressed: before the trial begins.

How CONSORT-AI and SPIRIT-AI Divide Reporting Responsibility Across the Trial Lifecycle

The two guidelines divide reporting responsibility by lifecycle stage. Each standard applies at a distinct stage of the AI interventional trial, and the two stages are sequential rather than overlapping.

[Figure: The AI clinical trial reporting lifecycle. SPIRIT-AI (15 items) governs the protocol stage; CONSORT-AI (14 items) governs the trial report stage. Together they cover the full lifecycle of an AI interventional trial.]

Table: The lifecycle division between SPIRIT-AI and CONSORT-AI. Neither standard substitutes for the other.

Stage | Guideline | Base Standard Extended | Items Added | Primary Audience | When Applied
Protocol / Planning | SPIRIT-AI | SPIRIT 2013 | 15 (12 new + 3 elaborations) | Protocol authors, ethics committees, trial registries | Before trial begins; at ethics submission and protocol registration
Trial Report / Publication | CONSORT-AI | CONSORT 2010 | 14 | Manuscript authors, peer reviewers, journal editors | At manuscript submission and peer review

SPIRIT-AI governs what must be documented before a trial begins — the protocol that ethics committees review, that trial registries record, and that defines the pre-specified commitments investigators make about how the AI intervention will be described, deployed, and monitored.

CONSORT-AI governs what must appear in the published trial report — the manuscript that peer reviewers evaluate and that the research community uses to assess whether the trial was conducted as planned and whether its findings are valid and reproducible. The CONSORT-AI reporting standard provides the detailed requirements for that stage.

Together, the two guidelines create a continuous accountability chain: what investigators commit to in the SPIRIT-AI-compliant protocol can be checked against what they report in the CONSORT-AI-compliant manuscript. This linkage is the primary mechanism for detecting protocol deviations, selective reporting, and post-hoc algorithm modifications in AI trials.
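
The accountability chain can be pictured as a field-by-field comparison between what the protocol committed to and what the manuscript reports. The sketch below is illustrative only: the `InterventionSpec` class, its three fields, and the version strings are hypothetical and are not part of either guideline.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InterventionSpec:
    """Hypothetical subset of AI-specific fields shared by protocol and report."""
    algorithm_version: str
    output_type: str
    decision_role: str  # e.g. "advisory" or "determinative"


def find_deviations(protocol: InterventionSpec, report: InterventionSpec) -> dict:
    """Return {field: (planned, reported)} for every field that differs."""
    planned, reported = asdict(protocol), asdict(report)
    return {
        name: (planned[name], reported[name])
        for name in planned
        if planned[name] != reported[name]
    }


# Made-up example: the algorithm version changed between protocol and report.
protocol = InterventionSpec("2.1.0", "probability score", "advisory")
report = InterventionSpec("2.3.0", "probability score", "advisory")
deviations = find_deviations(protocol, report)
print(deviations)  # {'algorithm_version': ('2.1.0', '2.3.0')}
```

A non-empty result does not by itself indicate misconduct; it marks the points where a reader would expect the manuscript to explain the change.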

The 15 SPIRIT-AI Items: What Protocol Authors Must Report

The 15 SPIRIT-AI items are organized across five domains of the SPIRIT framework. Rather than a flat checklist, they represent a structured expansion of existing SPIRIT sections to accommodate the specific documentation requirements of AI interventions. The full SPIRIT-AI checklist is published in The BMJ and should be used alongside the core SPIRIT 2013 checklist.

Administrative Items (SPIRIT-AI 1i and 1ii)

The administrative section requires that the trial title or abstract identify the intervention as involving an AI component, and that the protocol document the intended use of the AI system — the specific clinical task it is designed to perform, the target population, and the clinical context. These items ensure that the AI nature of the trial is immediately apparent to ethics reviewers and trial registry readers, not buried in technical appendices.

Introduction Items (SPIRIT-AI 6a-i and 6a-ii)

The introduction section requires protocol authors to describe the role the AI system is intended to play in the clinical pathway — whether it will be used for screening, diagnosis, treatment planning, monitoring, or another function — and to summarize the pre-existing evidence base for the AI intervention. This includes any prior validation studies and a description of the AI's known performance characteristics before the trial begins. The rationale for conducting the trial must be grounded in what is already known about the system's capabilities and limitations.

Participants and Interventions (SPIRIT-AI 9, 10i, 10ii, and 11a-i through 11a-vi)

This is the most substantive domain in SPIRIT-AI, containing nine items (item 9, items 10i and 10ii, and the six sub-items of 11a) that address the AI intervention itself in operational detail.

Integration requirements (item 9) require the protocol to describe the setting in which the AI intervention will be integrated — the clinical environment, the technical infrastructure, the workflow position, and the instructions and skills required for those who will interact with the system. This item recognizes that an AI tool's performance is inseparable from its deployment context.

Eligibility criteria operate at two levels in AI trials. Item 10i requires participant-level eligibility criteria to specify any characteristics that affect whether the AI system can be applied to a given patient. Item 10ii introduces a distinct concept — input-data-level eligibility — requiring the protocol to define what constitutes acceptable input data for the AI system, including acquisition parameters, format requirements, and quality thresholds. A trial participant may meet clinical eligibility criteria while producing data that is technically ineligible for AI processing; SPIRIT-AI requires this distinction to be pre-specified.
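
The two eligibility levels can be kept separate in screening logic. The following is a minimal sketch under stated assumptions: the imaging scenario, the class names, and every threshold (`MIN_AGE`, `MIN_SIDE_PX`, `MIN_QUALITY`) are hypothetical placeholders, not values taken from SPIRIT-AI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    age: int
    has_target_condition: bool


@dataclass(frozen=True)
class InputImage:
    width_px: int
    height_px: int
    quality_score: float  # hypothetical 0-1 score from an image-quality check


# Illustrative thresholds only; a real protocol would pre-specify its own.
MIN_AGE = 18
MIN_SIDE_PX = 512
MIN_QUALITY = 0.7


def participant_eligible(p: Participant) -> bool:
    """Participant-level criteria (item 10i)."""
    return p.age >= MIN_AGE and p.has_target_condition


def input_eligible(img: InputImage) -> bool:
    """Input-data-level criteria (item 10ii)."""
    return (
        min(img.width_px, img.height_px) >= MIN_SIDE_PX
        and img.quality_score >= MIN_QUALITY
    )


def screen(p: Participant, img: InputImage) -> str:
    """Report which eligibility level, if any, fails."""
    if not participant_eligible(p):
        return "participant ineligible"
    if not input_eligible(img):
        return "input data ineligible"
    return "eligible"


adult = Participant(age=54, has_target_condition=True)
blurry = InputImage(width_px=1024, height_px=1024, quality_score=0.4)
print(screen(adult, blurry))  # input data ineligible
```

Returning a distinct label for each failure level makes it possible to count clinically eligible participants whose data could not be processed, which is the case the guideline asks protocols to anticipate.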

The six sub-items under intervention description (11a-i through 11a-vi) cover:

  • Algorithm version (11a-i): The specific version of the AI algorithm being studied must be identified and documented. Any planned or permissible updates to the algorithm during the trial must be described.
  • Input data acquisition (11a-ii): The protocol must specify how input data will be acquired, processed, and prepared before being passed to the AI system — including any preprocessing steps that are part of the intervention.
  • Handling poor-quality inputs (11a-iii): The protocol must describe what happens when input data does not meet the AI system's requirements — whether the data is excluded, flagged, reacquired, or processed with a fallback procedure. This pre-specification prevents ad hoc decisions during the trial that could bias results.
  • Human-AI interaction in input handling (11a-iv): If a human operator is involved in preparing, reviewing, or selecting input data before it reaches the AI system, the protocol must describe the nature and extent of that involvement.
  • AI output specification (11a-v): The form and content of the AI system's output must be described — whether it produces a binary classification, a probability score, a highlighted region, a ranked list, or another output type — along with any thresholds or decision boundaries used.
  • Output contribution to decision-making (11a-vi): The protocol must specify how the AI output is intended to influence clinical decision-making — whether it is advisory, confirmatory, or determinative — and what role the clinician plays in acting on it. This item directly addresses the human-AI interaction at the point of clinical use.
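
As one illustration of item 11a-iii, a pre-specified rule for poor-quality inputs might look like the sketch below. The policy itself (reacquire up to a fixed number of times, then exclude) and the constants `MIN_QUALITY` and `MAX_REACQUISITIONS` are assumptions made for this example; SPIRIT-AI asks that the rule be stated in advance but does not prescribe what it should be.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"      # pass the input to the AI system
    REACQUIRE = "reacquire"  # capture the input again
    EXCLUDE = "exclude"      # record the input as unusable


# Hypothetical, pre-specified values fixed before the trial starts.
MIN_QUALITY = 0.7
MAX_REACQUISITIONS = 2


def handle_input(quality_score: float, reacquisitions_so_far: int) -> Action:
    """Apply the pre-specified rule to one input."""
    if quality_score >= MIN_QUALITY:
        return Action.PROCESS
    if reacquisitions_so_far < MAX_REACQUISITIONS:
        return Action.REACQUIRE
    return Action.EXCLUDE


print(handle_input(0.9, 0))  # Action.PROCESS
print(handle_input(0.5, 1))  # Action.REACQUIRE
print(handle_input(0.5, 2))  # Action.EXCLUDE
```

Because the rule is a pure function of pre-specified constants, the same decision is reached for the same input regardless of who is running the trial site or what the interim results look like.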

Monitoring (SPIRIT-AI 22)

Item 22 requires the protocol to include a pre-specified plan for analyzing performance errors during the trial. This includes defining what constitutes an AI performance error, how errors will be detected and recorded, and what analysis will be conducted to understand their frequency, pattern, and clinical consequences. Pre-specifying this plan prevents post-hoc rationalization of error patterns and enables meaningful safety monitoring of the AI intervention.
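
One simple form such an analysis could take is an error rate per pre-specified subgroup. This is a sketch, not the method the guideline mandates: it assumes an error is defined as disagreement between the AI output and a reference label, and the subgroup names and records are made up for illustration.

```python
from collections import defaultdict


def error_rates_by_subgroup(records):
    """Compute the AI error rate for each subgroup.

    records: iterable of (subgroup, ai_output, reference) tuples, where an
    error is counted whenever ai_output differs from reference.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subgroup, ai_output, reference in records:
        totals[subgroup] += 1
        if ai_output != reference:
            errors[subgroup] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up records: (subgroup, AI output, reference label).
records = [
    ("scanner_A", 1, 1),
    ("scanner_A", 0, 0),
    ("scanner_A", 1, 0),
    ("scanner_A", 0, 0),
    ("scanner_B", 1, 0),
    ("scanner_B", 0, 1),
]
rates = error_rates_by_subgroup(records)
print(rates)  # {'scanner_A': 0.25, 'scanner_B': 1.0}
```

A marked difference between subgroups, as in this toy data, is the kind of systematic pattern that a pre-specified plan is meant to surface during the trial.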

Ethics and Dissemination (SPIRIT-AI 29)

Item 29 addresses accessibility of the AI intervention and code. The protocol must describe plans — or explain constraints — regarding whether the AI system, its code, or its model weights will be made accessible to other researchers following the trial. This item applies the open science principle to AI trials while acknowledging that commercial or intellectual property constraints may limit full disclosure; the key requirement is that the protocol addresses the question explicitly rather than omitting it.

The 14 CONSORT-AI Items: What Trial Reports Must Include

While SPIRIT-AI governs the protocol stage, CONSORT-AI governs what the published trial report must contain. Its 14 items address the same AI-specific dimensions — algorithm description, input data handling, human-AI interaction, output specification, and performance analysis — but applied to the completed trial rather than the planned one.

Key reporting requirements at the manuscript stage include identifying the intervention as AI-based in the title or abstract, describing the algorithm version used, reporting how input data was handled and what happened to poor-quality inputs, specifying the human-AI interaction structure as implemented, and reporting subgroup analyses of AI performance across relevant participant groups.

Development Process: International Consensus Basis

SPIRIT-AI and CONSORT-AI were developed through a single shared process, ensuring methodological coherence between the two standards. The development followed the EQUATOR Network's established approach for reporting guideline development.

The process began with a systematic literature review and expert consultation that generated 29 candidate items for CONSORT-AI and 26 for SPIRIT-AI: potential additions or modifications to the base checklists that might be needed for AI trials. These candidate items were then assessed through a two-stage Delphi survey involving 169 invited international stakeholders, of whom 103 responded. Participants included clinicians, AI researchers, methodologists, journal editors, patient advocates, and regulatory representatives.

  1. Literature review and expert consultation generating 29 candidate items for CONSORT-AI and 26 for SPIRIT-AI.
  2. Two-stage online Delphi survey with 103 responding stakeholders from 169 invited, rating candidate items for importance and feasibility.
  3. Two-day in-person consensus meeting in January 2020 at the University of Birmingham, attended by 31 stakeholders, to resolve items where Delphi ratings were borderline or contested.
  4. Checklist pilot with 34 participants to test usability and clarity of the draft checklists before finalization.
  5. Pre-specified 80% agreement threshold for item inclusion, applied consistently across both guidelines.

The shared development process means that the two guidelines are conceptually aligned: items that appear in SPIRIT-AI at the protocol stage have corresponding reporting requirements in CONSORT-AI at the manuscript stage. This alignment is intentional — it enables direct comparison between what was planned and what was reported.

CONSORT 2025 and SPIRIT 2025: Why the AI Extensions Remain Operative

Both base guidelines were substantially updated in April 2025. CONSORT 2025 was published on 14 April 2025, adding seven new checklist items, revising three existing items, and restructuring the checklist to 30 items. SPIRIT 2025 followed on 28 April 2025, expanding its checklist to 34 items and incorporating new requirements around patient and public engagement.

Neither update incorporated AI-specific guidance. A 2025 correspondence in The Lancet explicitly identified this as an omission, noting that CONSORT 2025 and SPIRIT 2025 do not require authors to disclose AI involvement in trial design, conduct, or analysis, and calling for CONSORT-AI and SPIRIT-AI to be revisited in alignment with the 2025 base statements to provide unified and enforceable standards.

The EQUATOR Network now lists CONSORT 2025 as the updated generic base guideline for CONSORT-AI, but neither AI extension has been formally revised to incorporate the 2025 base statement changes. Protocol authors and journal editors should be aware of this structural gap and not assume the 2025 updates have resolved it.

Adoption, Endorsement, and Compliance Evidence

Direct measurement of SPIRIT-AI adoption in trial protocols is limited in the published literature. The compliance data that does exist is largely derived from studies measuring CONSORT-AI concordance in published trial reports, not from audits of protocol registries. Readers should interpret the figures below with that methodological constraint in mind.

A 2024 systematic review published in Nature Communications identified 65 AI RCTs and assessed their concordance with CONSORT-AI. Median concordance was 90%, but only 10 of the 65 trials explicitly reported using CONSORT-AI, and only 3 of 52 journals in the sample explicitly endorsed or mandated it: Lancet Digital Health and The Lancet Gastroenterology & Hepatology as mandates, and Ophthalmology Science as a recommendation. Algorithm version, one of the most fundamental AI-specific items, was concordant in only 20% of trials.

A 2025 editorial in BMJ Oncology reported a more concerning trajectory: CONSORT-AI adherence peaked at 96% in 2022 but had dropped to 79% by 2024 as the volume of published AI trials increased. The editorial explicitly called for protocol repositories to recommend SPIRIT-AI from the outset — a signal that the field has not yet normalized SPIRIT-AI as a standard step in AI trial registration.

  • Median CONSORT-AI concordance across 65 AI RCTs: 90% (IQR 77–94%), but only 10 trials explicitly reported using it.
  • Algorithm version — a core SPIRIT-AI and CONSORT-AI item — was concordant in only 20% of reviewed trials.
  • Only 3 of 52 journals in the 2024 review explicitly mandated or recommended CONSORT-AI.
  • CONSORT-AI adherence fell from 96% (2022) to 79% (2024) as AI trial volume grew.
  • Specific SPIRIT-AI protocol adoption data is not well documented in the published literature as of Q2 2026.
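
For readers unfamiliar with how a summary such as "median concordance 90% (IQR 77–94%)" is formed, the sketch below shows one plausible calculation. It assumes concordance for a trial is the percentage of applicable checklist items that the report addresses; the cited review may define it differently, and the trial data here are invented purely to exercise the code.

```python
from statistics import median, quantiles


def concordance(items_reported: int, items_applicable: int) -> float:
    """Percentage of applicable checklist items addressed in one report."""
    if items_applicable <= 0:
        raise ValueError("items_applicable must be positive")
    return 100.0 * items_reported / items_applicable


# Invented (reported, applicable) counts for five hypothetical trials.
trials = [(10, 10), (9, 10), (8, 10), (12, 12), (7, 10)]
scores = sorted(concordance(r, a) for r, a in trials)

median_score = median(scores)
q1, _, q3 = quantiles(scores, n=4)  # quartile cut points (default method)

print(f"median concordance: {median_score:.0f}%")
print(f"IQR: {q1:.0f}-{q3:.0f}%")
```

A high median can coexist with very low concordance on individual items, which is why the per-item figure for algorithm version is reported separately from the overall median.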

Where CONSORT-AI and SPIRIT-AI Fit in the Broader AI Reporting Guideline Ecosystem

SPIRIT-AI and CONSORT-AI are specific to interventional randomized controlled trials involving an AI component. They do not apply to other AI study designs, and other AI reporting guidelines do not apply to interventional RCTs. Conflating these standards is a common source of confusion for researchers selecting the appropriate guideline for their work.

[Figure: The AI reporting guideline ecosystem by study type. SPIRIT-AI and CONSORT-AI apply only to interventional RCTs; DECIDE-AI, TRIPOD+AI, STARD-AI, and CHEERS-AI cover distinct study designs and development phases.]

Table: AI reporting guidelines by study type and lifecycle stage. Selecting the wrong guideline for a study design is a common error; the interventional RCT scope of SPIRIT-AI and CONSORT-AI is a hard boundary.

Guideline | Applies To | Trial Stage | Scope
SPIRIT-AI | AI interventional RCTs | Protocol / planning | 15 items extending SPIRIT 2013 for protocol documentation
CONSORT-AI | AI interventional RCTs | Trial report / publication | 14 items extending CONSORT 2010 for manuscript reporting
DECIDE-AI | AI decision-support systems | Early-phase evaluation | Covers feasibility and preliminary evaluation before full RCT
TRIPOD+AI | AI prediction models | Model development & validation | Covers development, validation, and updating of AI prediction models
STARD-AI | AI diagnostic tests | Diagnostic accuracy studies | Extends STARD for studies evaluating AI diagnostic accuracy
CHEERS-AI | AI health economic evaluations | Economic modeling | Covers cost-effectiveness and health economic analyses of AI interventions

A practical rule of thumb: if the study is a randomized controlled trial in which the AI system is the intervention being tested, SPIRIT-AI applies to the protocol and CONSORT-AI applies to the report. If the study evaluates a diagnostic AI model without randomization, STARD-AI or TRIPOD+AI is more likely appropriate. If the study assesses an AI tool before a full RCT, DECIDE-AI may apply.
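
The rule of thumb can be written as a simple lookup. This is a rough sketch for orientation, not an official selection tool: the study-type keys are hypothetical labels chosen for this example, and borderline designs still need a judgment call against each guideline's own scope statement.

```python
# Hypothetical study-type labels mapped to the guidelines named in this entry.
GUIDELINES_BY_STUDY_TYPE = {
    "interventional_rct": ["SPIRIT-AI (protocol)", "CONSORT-AI (report)"],
    "early_phase_evaluation": ["DECIDE-AI"],
    "prediction_model": ["TRIPOD+AI"],
    "diagnostic_accuracy": ["STARD-AI"],
    "economic_evaluation": ["CHEERS-AI"],
}


def suggest_guidelines(study_type: str) -> list[str]:
    """Return the guideline(s) suggested for a study type, or raise ValueError."""
    try:
        return list(GUIDELINES_BY_STUDY_TYPE[study_type])
    except KeyError:
        known = ", ".join(sorted(GUIDELINES_BY_STUDY_TYPE))
        raise ValueError(
            f"unknown study type {study_type!r}; expected one of: {known}"
        ) from None


print(suggest_guidelines("interventional_rct"))
# ['SPIRIT-AI (protocol)', 'CONSORT-AI (report)']
```

Raising an error for unrecognized study types, rather than returning a default, reflects the point made above that no single guideline is a safe fallback for every design.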

Practical Guidance for Protocol Authors, Reviewers, and Editors

The following guidance is intended for those working with AI trial protocols at different points in the review and publication process.

For Protocol Authors

  • Use SPIRIT-AI alongside SPIRIT 2013 when writing any AI interventional trial protocol. SPIRIT-AI does not replace SPIRIT 2013 — it adds to it. Both checklists must be completed.
  • Include the completed SPIRIT-AI checklist when submitting the protocol to an ethics committee. Ethics reviewers cannot adequately assess the risks of an AI intervention without the information SPIRIT-AI requires.
  • Register the protocol in a trial registry and include SPIRIT-AI-compliant documentation at registration. Protocol registration creates the accountability link that CONSORT-AI reporting at the manuscript stage depends on.
  • Document the algorithm version at protocol registration. This is the single most under-reported item in AI trials (20% concordance in the 2024 Nature Communications review) and one of the most consequential for reproducibility.
  • Pre-specify the performance error analysis plan before the trial begins. Defining what constitutes an AI error and how it will be analyzed cannot credibly be done after results are known.

For Ethics Committees

  • Expect SPIRIT-AI-compliant protocols for any trial in which an AI system is the intervention. A protocol that does not address algorithm version, input data eligibility, integration requirements, and human-AI interaction is incomplete for an AI trial.
  • Pay particular attention to the performance error analysis plan (item 22) and the handling of poor-quality inputs (item 11a-iii). These items are most directly relevant to patient safety monitoring.

For Peer Reviewers and Journal Editors

  • Apply CONSORT-AI when reviewing AI trial manuscripts, and verify that the manuscript references a registered SPIRIT-AI-compliant protocol. The absence of a registered protocol is itself a reporting deficiency.
  • Check algorithm version reporting specifically — this is the most consistently under-reported CONSORT-AI item and the one that most directly undermines reproducibility.
  • Journals that do not yet mandate CONSORT-AI or SPIRIT-AI should consider doing so. As of the 2024 systematic review, only three of 52 journals with AI trial publications had explicit endorsement policies.

For those working primarily with trial reports rather than protocols, the CONSORT-AI reporting standard provides the corresponding framework for the manuscript stage. The two entries are designed to be read together as a complete reference for the AI interventional trial reporting lifecycle.