Ambient Clinical Intelligence: Beyond AI Scribes

Figure: Physician and patient in face-to-face consultation, with layered ambient AI interface panels visible on a background workstation showing note auto-population, coding suggestions, order staging, and care gap alerts. Caption: Ambient clinical intelligence as quiet infrastructure. The technology operates in the background while the clinician-patient encounter remains central.

ACI vs. Ambient Scribing: Why the Terminology Distinction Matters

Most health systems first encounter ambient clinical intelligence through a sales pitch about documentation burden — a physician speaking naturally with a patient while AI generates a structured note in the background. That use case is real, the evidence for it is reasonably solid, and the market has moved fast. But the term ambient clinical intelligence (ACI) describes something considerably broader than automated note-writing, and conflating the two creates a specific governance failure: health systems acquire platform capabilities they have not evaluated, under a regulatory and safety framework designed only for the documentation layer.

Ambient scribing is a feature. ACI is the platform. That distinction is not semantic — it determines which evidence base applies, which regulatory questions are open, and which clinical oversight obligations fall on the deploying institution.

Several leading vendors have begun using the phrase "ambient operating system" to describe ACI — positioning it as the conversational intelligence layer for the entire clinical encounter, not just documentation. DeepScribe and Suki both articulate this framing explicitly in their 2025–2026 market materials. It is a useful conceptual frame, but it is vendor-originated and carries no clinical or regulatory standing. The FDA has not defined ACI as a device category. The EU MDR has not harmonized its classification. The NHS is the first health authority globally to issue specific guidance on ambient scribing products, and even that guidance addresses only tools with summarization capability — not the full extended platform stack.

The practical consequence of the terminology gap is that procurement conversations often bundle capabilities with very different evidence profiles under a single contract. A health system evaluating an ACI platform in 2026 may be simultaneously acquiring: a documentation tool with RCT-level evidence, a clinical coding module with no peer-reviewed validation, an order-staging feature with vendor-reported accuracy claims, and an agentic prior-authorization workflow that operates largely outside current regulatory oversight. These are not equivalent decisions.

For a broader overview of how ambient documentation fits within the larger landscape of AI-driven workflow change, see AI in Medicine: How It's Actually Reshaping Clinical Workflows. The present article focuses on the capability taxonomy and governance obligations that overview does not address.

The Six-Layer ACI Capability Taxonomy

ACI platforms in 2026 span at least six distinct capability layers. Each layer differs in its clinical function, underlying technology, current deployment prevalence, and — critically — the quality of available evidence. The taxonomy below maps these layers from the most established to the most nascent.

Figure: Platform-stack diagram showing six ACI capability layers, from ambient documentation at the base through clinical coding, order staging, clinical decision support, patient communication, and agentic automation at the top, with lower layers solid and upper layers progressively more translucent. Caption: The six-layer ACI capability stack. Evidence strength and regulatory clarity decrease as capabilities extend upward from documentation.

ACI capability taxonomy: six layers mapped by clinical function, technology, deployment status, and evidence maturity as of Q2 2026. Evidence maturity decreases substantially above Layer 1. Deployment status data for Layer 1 is from Suki (vendor-cited); market figures from Menlo Ventures 2025 report (investor survey, methodology caveat applies).
Layer	Clinical Function	Technology Stack	Deployment Status (2026)	Evidence Maturity
1 — Ambient Documentation	Passive capture of clinician-patient conversation; generation of structured SOAP or specialty-specific notes in real time	Automatic speech recognition (ASR) + large language model (LLM) summarization; speaker diarization	Broad clinical deployment; approximately two-thirds of U.S. Epic hospitals using an ambient documentation tool by mid-2025 (Suki, vendor-cited)	Strongest: multiple prospective cohort studies; at least two RCTs; consistent burnout and documentation-burden findings
2 — Structured Clinical Coding	Automated suggestion of ICD-10, CPT, E&M, HCC, and SNOMED/LOINC codes derived from the clinical encounter transcript	NLP extraction + coding ontology mapping; some platforms use LLM-based inference for HCC risk adjustment	Active commercial deployment by major ACI vendors; coding accuracy cited as a key vendor differentiator	Sparse: coding accuracy claims are primarily vendor-reported; no peer-reviewed RCTs or prospective cohort studies identified as of Q2 2026
3 — Order Staging and Suggestion	Pre-population of lab, imaging, referral, and medication orders based on clinical conversation content	Intent extraction from ASR transcript; EHR order-entry API integration; LLM-based clinical reasoning	Early commercial deployment in select platforms; EHR incumbents (Epic) competing directly in this space	Absent from peer-reviewed literature; no independent validation studies identified; capability described in vendor documentation only
4 — Real-Time Clinical Decision Support and Care Gap Alerts	In-encounter alerts for missed screenings, guideline adherence gaps, social determinants of health, and differential diagnosis prompts	Real-time NLP analysis of conversation stream; integration with clinical knowledge bases (e.g., UpToDate) and CDS integration standards (e.g., CDS Hooks); rule-based and LLM-based alerting	Emerging; described in vendor roadmaps and the Mayo Clinic narrative review as a near-term capability; limited confirmed real-world deployments	Absent from peer-reviewed literature for ACI-specific CDS; general EHR-embedded CDS evidence does not transfer directly to ambient-conversation-triggered alerts
5 — Patient-Facing Communication	Automated generation of after-visit summaries, discharge instructions, and patient education materials in plain language	LLM summarization of encounter transcript; patient portal integration; multilingual output capability in some platforms	Active deployment in several platforms; after-visit summary generation is among the more commonly reported extended features	Very sparse: patient comprehension, satisfaction, or outcome data from ambient-generated summaries not yet in peer-reviewed literature; patient perspectives on ACI remain largely unstudied
6 — Agentic Workflow Automation	Autonomous or semi-autonomous execution of administrative tasks: prior authorization drafting, referral letters, inbox message triage, care coordination tasks	LLM-based document generation; EHR API and payer portal integration; multi-step task orchestration; human-in-the-loop confirmation gates (design-dependent)	Early commercial deployment; prior authorization AI growing rapidly (Menlo Ventures: 10x YoY growth, 2025); Abridge deploying real-time prior auth; significant variation in human oversight design	Absent from peer-reviewed literature; no independent safety, accuracy, or outcome data; regulatory classification as SaMD unresolved for tools with decision-support characteristics

Evidence Review by Capability Layer: What the Literature Actually Shows

The evidence base for ACI is sharply stratified. Layer 1 has accumulated a meaningful body of peer-reviewed research. Layers 2 through 6 have almost none. This is not a gap that vendor adoption figures can fill.

Layer 1: Documentation — Where the Evidence Is

The documentation layer has the most rigorous evidence of any ACI capability. A randomized stepped-wedge controlled study at Providence Health found that ACI use was associated with 30.3% less reported burnout, 49.5% less frustration with documentation, and meaningful reductions in after-hours documentation time (commonly called "pajama time") among early implementers. These are clinician-reported outcomes from a controlled design, which places them above the retrospective cohort evidence typical of health IT research.

The quality of AI-generated notes, however, is a separate question from documentation burden. The most recent prospective real-world data comes from a 2026 study at UC Davis Health covering 7,545 notes generated by 31 volunteer physicians over two months. The findings are discussed in the safety signals section below. For full depth on documentation-layer accuracy and error taxonomy, see the site's dedicated evidence review of LLM-powered ambient AI scribe clinical accuracy.

Layers 2–6: Where the Evidence Is Not

As of mid-2026, no peer-reviewed RCTs or prospective cohort studies have evaluated coding accuracy, order appropriateness, CDS alert quality, patient communication outcomes, or agentic task safety for ACI platforms. This is not a minor gap — it means health systems deploying these capabilities are doing so without the same clinical validation foundation that exists for documentation.

Evidence availability by ACI capability layer as of Q2 2026. The absence of peer-reviewed studies for Layers 2–6 is not a search artifact — it reflects the current state of the literature.
Capability Layer	Peer-Reviewed RCTs	Prospective Cohort Studies	Retrospective Studies	Independent Validation	Primary Data Source (2026)
1 — Documentation	Yes (≥2)	Yes (multiple)	Yes (multiple)	Partial	Peer-reviewed literature
2 — Clinical Coding	None identified	None identified	None identified	None identified	Vendor-reported claims
3 — Order Staging	None identified	None identified	None identified	None identified	Vendor documentation
4 — Real-Time CDS	None identified	None identified	None identified	None identified	Vendor roadmaps; Mayo Clinic narrative review (secondary)
5 — Patient Communication	None identified	None identified	None identified	None identified	Vendor documentation
6 — Agentic Automation	None identified	None identified	None identified	None identified	Vendor documentation; market reports
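
Because procurement contracts often bundle several layers, it can help to keep the evidence table in machine-readable form and check it against the layers actually being acquired. The following is a minimal Python sketch, not a validated tool: the class name `LayerEvidence` and the function `layers_without_peer_review` are hypothetical names introduced for illustration, and the values are simply transcribed from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerEvidence:
    layer: int
    name: str
    rcts: bool                   # peer-reviewed RCTs identified
    prospective_cohorts: bool    # prospective cohort studies identified
    retrospective: bool          # retrospective studies identified
    independent_validation: str  # "partial" or "none"
    primary_source: str


# Values transcribed from the evidence table above (Q2 2026).
EVIDENCE = [
    LayerEvidence(1, "Documentation", True, True, True, "partial",
                  "Peer-reviewed literature"),
    LayerEvidence(2, "Clinical Coding", False, False, False, "none",
                  "Vendor-reported claims"),
    LayerEvidence(3, "Order Staging", False, False, False, "none",
                  "Vendor documentation"),
    LayerEvidence(4, "Real-Time CDS", False, False, False, "none",
                  "Vendor roadmaps; Mayo Clinic narrative review (secondary)"),
    LayerEvidence(5, "Patient Communication", False, False, False, "none",
                  "Vendor documentation"),
    LayerEvidence(6, "Agentic Automation", False, False, False, "none",
                  "Vendor documentation; market reports"),
]


def layers_without_peer_review(contracted_layers):
    """Return contracted layers that have no peer-reviewed study of any design."""
    return [
        e for e in EVIDENCE
        if e.layer in contracted_layers
        and not (e.rcts or e.prospective_cohorts or e.retrospective)
    ]


# Hypothetical contract covering documentation, coding, and agentic automation.
flagged = layers_without_peer_review({1, 2, 6})
print([e.name for e in flagged])  # ['Clinical Coding', 'Agentic Automation']
```

A list like this is only as current as the literature search behind it, so the entries would need to be revisited as studies of the extended layers appear.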

A 2025 editorial in JMIR Medical Informatics (Leung et al.) noted that there is currently no systematic data collection for evaluating the extent to which clinical errors or negative patient outcomes can be attributed to ambient AI scribe use. That observation applies even more forcefully to the extended capability layers, where error attribution frameworks do not yet exist in the published literature.

Specialty-Specific Deployment Patterns and Evidence Gaps

The performance of ACI documentation tools is not uniform across clinical settings. A 2026 narrative review from the Mayo Clinic group, published in Cardiovascular Diagnosis and Therapy, provides the most systematic specialty-specific analysis currently in the peer-reviewed literature — and its findings are sobering for subspecialty deployment.

Primary care physicians reported the highest satisfaction with ACI documentation tools, with 85% noting an improved work experience. Medical subspecialties — including oncology, cardiology, and dermatology — reported only 36.4% satisfaction, a statistically significant difference. Subspecialists also spent a mean of 3.75 additional minutes per appointment compared to primary care physicians using the same tools. The review explicitly confirms that no RCTs have evaluated ambient AI scribes specifically in cardiology, representing a clear research gap for one of the highest-volume subspecialty settings.

Specialty and setting deployment patterns for ACI documentation (Layer 1) and extended capabilities (Layers 2–6). Evidence for extended capabilities is absent across all settings as of Q2 2026. Satisfaction figures from Mayo Clinic narrative review (PMC12973079, June 2025 literature cutoff, primarily ambulatory settings).
Clinical Setting	Evidence Level for ACI Documentation	Reported Satisfaction / Efficiency	Key Deployment Challenges	Extended Capability Evidence (Layers 2–6)
Primary Care (Ambulatory)	Prospective cohort; RCT (Providence)	85% satisfaction; consistent documentation burden reduction	Vocabulary breadth; EHR template variation	None in peer-reviewed literature
Medical Subspecialties (Cardiology, Oncology, Dermatology)	Narrative review synthesis; no subspecialty-specific RCTs	36.4% satisfaction; 3.75 min/appointment additional time vs. primary care	Complex, jargon-dense clinical language; structured reporting requirements; specialty-specific note formats	None in peer-reviewed literature
Surgical Subspecialties	Limited; narrative review reports ~50% satisfaction	Moderate; operative note generation presents specific structured-data challenges	Procedural terminology; consent documentation; intraoperative context capture	None in peer-reviewed literature
Inpatient / ICU	Very limited; high-acuity barrier documented (Ohde et al., 2026)	Deployment barriers: noise environment, multi-clinician conversations, rapid status changes	Acoustic complexity; multi-speaker attribution; critical documentation accuracy requirements	None in peer-reviewed literature
Emergency Department	Limited; deployment described but not systematically studied	Workflow integration challenges: interruption-heavy environment, high patient volume, shift handoffs	Real-time note generation under time pressure; triage documentation accuracy	None in peer-reviewed literature
Behavioral Health	Very limited; privacy and consent concerns documented	Consent complexity; therapeutic relationship concerns; sensitive content handling	Patient consent for ambient recording; confidentiality obligations; trauma-informed care context	None in peer-reviewed literature
Telehealth	Limited; some deployment in synchronous video visits	Acoustic quality dependent on patient hardware; variable note quality	Audio quality variability; multi-party call attribution	None in peer-reviewed literature

The high-acuity deployment barrier is particularly relevant for health systems planning inpatient or ICU rollouts. A 2026 perspective article in npj Digital Medicine (Ohde et al.) identified acoustic complexity, multi-clinician conversation attribution, and the clinical consequences of documentation errors in critical settings as specific barriers that ambulatory-validated tools have not been tested against.

Safety Signals Across the ACI Stack

The safety profile of ACI platforms involves several distinct risk mechanisms that operate differently across the capability layers. The documentation layer has the most empirical safety data; extended layers carry theoretical and emerging risks that have not yet been systematically characterized.

Documentation-Layer Safety: Current Real-World Evidence

The most recent prospective real-world safety data for ACI documentation comes from a 2026 study at UC Davis Health, covering 7,545 notes generated by 31 volunteer physicians during July and August 2024, primarily in family medicine and internal medicine. Accidental omissions were the most prevalent error type, occurring in 18% of evaluated notes. Hallucinations (content present in the AI-generated note but not in the clinical encounter) appeared in 11.5% of notes. Accidental inclusions (information from prior visits or other sources) occurred in 9.3% of notes. Bias was rare, at 1.1%.

The most clinically significant finding: 5.3% of notes contained errors rated as posing serious or imminent risk of patient harm if not corrected before use. These were not edge cases in a stress test — they were errors in routine clinical notes from volunteer physicians who had opted into using the tool.

The omission-versus-hallucination distinction matters clinically. A hallucination, such as a fabricated finding or medication, is visible in the note and more likely to be caught by a reviewing clinician. An omission, meaning a finding discussed in the encounter that does not appear in the note, is invisible: the clinician reviewing the note has no signal that something is missing. This asymmetry arguably makes omissions the more dangerous error class, and in the UC Davis data they were also the more frequent one (18% of evaluated notes, versus 11.5% for hallucinations).

Physician editing practices in the UC Davis study showed wide variation: the median percentage of AI-generated words changed was 9.0%, but individual physician rates ranged from 1.9% to 69.3%. Critically, 14.9% of notes were left entirely unedited before use. An unedited note is not necessarily an unreviewed one, but this pattern, in which a sizable minority of notes reached the record with no changes at all, is the clearest signal of possible automation bias in current deployment data.
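
An institution that wants to track comparable editing metrics internally could start from something like the Python sketch below. It is an illustration under stated assumptions, not the UC Davis method: the study's exact definition of "words changed" is not given here, so this version counts AI-draft words that were deleted or replaced in the final note (pure insertions are not counted), and the function names `pct_words_changed` and `summarize_editing` are hypothetical.

```python
import difflib
from statistics import median


def pct_words_changed(ai_draft: str, final_note: str) -> float:
    """Percent of AI-draft words deleted or replaced in the final note.

    Assumption: whitespace tokenization and a word-level diff; the published
    study may have defined its metric differently.
    """
    draft_words = ai_draft.split()
    final_words = final_note.split()
    if not draft_words:
        return 0.0
    matcher = difflib.SequenceMatcher(a=draft_words, b=final_words, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(draft_words) - unchanged) / len(draft_words)


def summarize_editing(pairs):
    """pairs: non-empty list of (ai_draft, final_note) tuples."""
    rates = [pct_words_changed(draft, final) for draft, final in pairs]
    # A note counts as unedited only if its word sequence is identical.
    unedited = sum(1 for draft, final in pairs if draft.split() == final.split())
    return {
        "median_pct_changed": median(rates),
        "pct_unedited": 100.0 * unedited / len(pairs),
    }


draft = "Patient reports chest pain for two days"
edited = "Patient reports chest pain for three days"
summary = summarize_editing([(draft, edited), (draft, draft)])
print(summary["pct_unedited"])  # 50.0
```

Tracking the unedited share per clinician, rather than only in aggregate, would make the wide individual variation reported in the study visible in local data.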

Cross-Layer Safety Signals

Several safety concerns apply across the full ACI capability stack, not only to documentation:

Automation bias: As ACI platforms generate more content — notes, codes, orders, care gap alerts — the cognitive pressure on clinicians to review each output carefully increases while the available time does not. The 14.9% unedited-note rate in the UC Davis study suggests this is already occurring in documentation; the risk compounds as platforms generate more outputs across more layers.
Cognitive debt: The 2025 JMIR editorial (Leung et al.) raised concern about long-term cognitive effects of delegating clinical synthesis to AI systems, citing preliminary evidence that LLM use for writing tasks may impair memory recall and reduce neural engagement. Whether this applies to clinical reasoning in ACI-heavy workflows is not yet studied.
Note bloat: ACI-generated notes may be longer and more comprehensive than clinician-authored notes, which could obscure signal in downstream clinical review. The downstream impact on clinical quality is unstudied.
The pajama-time paradox: ACI consistently reduces after-hours documentation time — a clear benefit. But if that time reduction comes partly from reduced editing of AI-generated content rather than from genuine workflow improvement, the safety trade-off is less favorable than burnout metrics alone suggest.
Agentic layer safety unknowns: Layer 6 agentic tasks — prior authorization drafting, referral letters, inbox triage — introduce autonomous or semi-autonomous actions with downstream clinical and administrative consequences. No peer-reviewed safety data exists for these capabilities. Human-in-the-loop confirmation design varies significantly across platforms and is not standardized.

For the full error taxonomy and evidence review for the documentation layer, see the site's detailed analysis of LLM-powered ambient AI scribe accuracy and safety evidence.

Regulatory Classification and Governance Landscape

The regulatory environment for ACI tools is characterized more by open questions than settled classifications. This ambiguity has direct practical consequences for health systems, which bear most governance responsibility in the current framework.

The core classification question is whether ACI tools — particularly those with summarization and decision-support capabilities — qualify as Software as a Medical Device (SaMD) under FDA frameworks or equivalent EU MDR classifications. The 2026 npj Digital Medicine perspective by Ohde et al. summarizes the current state: pure transcription tools are less likely to be considered medical devices, but tools capable of summarization and clinical decision support have the potential to alter how information is communicated and influence clinical decision-making — which raises questions about oversight, safety standards, and accountability that current regulatory frameworks have not fully answered.

The NHS is the first health authority to have released specific guidance on AI-enabled ambient scribing products in health and care settings. That guidance focuses on tools with summarization capability and calls for regulatory scrutiny of those tools. No equivalent harmonized guidance exists under the FDA's SaMD framework or the EU MDR as of mid-2026.

Beyond device classification, ACI deployments generate data obligations that health systems must address independently of regulatory status. Ambient recording of clinical encounters implicates patient consent requirements that vary by state. Data retention policies for encounter audio and transcripts require explicit institutional decisions. Agentic workflow layers that access patient data across systems — to draft prior authorization letters or pull referral history — implicate interoperability and information-blocking obligations. For the data-access and interoperability implications of agentic ACI workflows specifically, see the site's analysis of the ONC information blocking rule and its implications for AI systems.

Evaluating ACI Platforms Beyond Documentation: A Governance Framework for Health Systems

When a health system signs an ACI contract in 2026, it is rarely signing a contract for documentation only. The governance framework for evaluating these platforms must be structured to match the capability layers being acquired, not the capability that received the most marketing attention.

One market dynamic worth understanding: Menlo Ventures' 2025 healthcare AI survey (700+ healthcare executives, August–September 2025, VC-perspective caveat applies) found that health system customers increasingly prefer to acquire coding, billing, prior authorization, scheduling, clinical decision support, and patient navigation capabilities from their incumbent EHR vendor rather than standalone AI vendors. This preference reflects integration risk concerns and consolidation fatigue — but it does not resolve the evidence gap for any of those capabilities regardless of which vendor provides them.

For Epic-specific governance considerations in ambient AI documentation deployment, the site's dedicated analysis of ambient AI scribes in Epic EHR addresses integration architecture and Epic-specific oversight obligations.

Governance framework for ACI capability layers: obligations, evidence requirements, oversight design, and liability considerations by layer. This framework reflects current regulatory ambiguity and the absence of peer-reviewed validation for Layers 2–6.
ACI Capability Layer	Governance Obligation	Evidence Requirement Before Deployment	Clinical Oversight Design	Liability Consideration
1 — Documentation	Physician attestation policy; editing audit capability; note-quality monitoring program	Peer-reviewed evidence available; institution should conduct internal pilot with error-rate monitoring	Mandatory physician review before note finalization; editing rate monitoring; unedited-note flagging	Physician remains legally responsible for attestation; institution responsible for oversight policy
2 — Clinical Coding	Independent coding accuracy audit; revenue integrity oversight; appeal-rate monitoring	No peer-reviewed validation available; require vendor to provide independent accuracy data with methodology disclosure; conduct internal pre-deployment audit	Certified coder review of AI-suggested codes; denial-rate and audit-flag monitoring	Coding errors affect reimbursement, compliance, and audit exposure; institution bears compliance risk
3 — Order Staging	Clinical appropriateness review; order acceptance/rejection rate monitoring; clinician override tracking	No peer-reviewed validation; require vendor accuracy claims with patient-population specifics; internal validation against clinical guidelines	Physician must confirm every staged order; no autonomous order entry; alert fatigue monitoring	Order errors have direct patient safety implications; liability follows the authorizing clinician and institution
4 — Real-Time CDS	Alert content clinical review; false-positive rate monitoring; alert fatigue assessment	No ACI-specific CDS evidence; general CDS alert literature is relevant background but does not transfer directly to ambient-conversation-triggered alerts; institutional clinical review of alert logic required	Clinical governance committee review of alert rules; regular content audits; clinician override documentation	CDS liability is unsettled; institutions should document alert-review governance to demonstrate due diligence
5 — Patient Communication	Patient consent for AI-generated communication; readability and accuracy review; multilingual accuracy validation	No peer-reviewed patient outcome data; pilot with patient comprehension assessment before broad deployment	Clinician review of AI-generated summaries before patient delivery in high-risk contexts; patient feedback mechanism	Patient communication errors may affect informed consent and care adherence; institution responsible for content accuracy
6 — Agentic Automation	Human-in-the-loop confirmation design; data access scope limitation; audit log requirements; consent for automated actions	No peer-reviewed safety data; require vendor to disclose human-override architecture; internal safety testing required	No fully autonomous execution for clinically consequential tasks; mandatory human confirmation gate; error escalation pathway	Agentic errors may affect authorization decisions, referral accuracy, and patient access to care; regulatory classification unresolved; maximum institutional caution warranted
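
The Layer 6 oversight design in the table above (no fully autonomous execution for clinically consequential tasks, a mandatory human confirmation gate, and audit logging) can be made concrete with a small sketch. This is a hypothetical Python illustration of the pattern only: `AgenticTask`, `ConfirmationGate`, and the event names are invented for this example and do not describe any vendor's architecture.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AgenticTask:
    task_id: str
    kind: str                              # e.g. "prior_auth_draft"
    payload: str
    clinically_consequential: bool = True  # default to the cautious setting


@dataclass
class ConfirmationGate:
    audit_log: list = field(default_factory=list)

    def _log(self, task: AgenticTask, event: str, actor: str) -> None:
        self.audit_log.append({
            "task_id": task.task_id,
            "event": event,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def submit(self, task: AgenticTask,
               execute: Callable[[AgenticTask], None],
               approver: Optional[str] = None) -> bool:
        """Run `execute` only if policy allows; log every step."""
        self._log(task, "proposed", "agent")
        if task.clinically_consequential and approver is None:
            # No named human approver: hold the task instead of executing it.
            self._log(task, "held_for_review", "gate")
            return False
        self._log(task, "approved", approver or "policy:non_consequential")
        execute(task)
        self._log(task, "executed", "agent")
        return True


gate = ConfirmationGate()
sent = []
task = AgenticTask("t1", "prior_auth_draft", "draft letter text")
gate.submit(task, sent.append)                         # held: no approver
gate.submit(task, sent.append, approver="dr_example")  # executes after sign-off
print([entry["event"] for entry in gate.audit_log])
```

The design choice worth noting is that the gate, not the agent, decides whether execution happens, and that held tasks are logged as well as executed ones, which is what makes an override architecture auditable.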

On Adoption Thresholds and Vendor-Reported Metrics

Vendor materials frequently cite adoption thresholds as proxies for platform effectiveness. DeepScribe, for example, reports that organizations achieving 70% or greater clinician adoption (defined as using the platform for at least 50% of weekly encounters) consistently report stronger returns, and it claims an average adoption rate of 80% for its own platform. These figures are vendor-originated and have not been independently validated. Several cautions follow:

Adoption rate is a usage metric, not a clinical outcome metric. High adoption does not establish that notes are accurate, codes are correct, or orders are appropriate.
The definition of adoption (50% of weekly encounters) is vendor-defined and may not align with institutional governance thresholds for clinical oversight.
The UC Davis Health data shows that even among volunteer adopters — a high-engagement population — 14.9% of notes were left unedited. Adoption rate and oversight quality are not the same variable.
Health systems should define their own adoption and oversight metrics independently of vendor benchmarks, anchored to clinical quality indicators rather than usage frequency.
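
To see why adoption is a usage metric, it helps to write the vendor-style definition out. The Python sketch below assumes the definition quoted above (a clinician counts as adopted when at least 50% of weekly encounters use the platform, with 70% clinician adoption as the organizational threshold). The function names and the sample counts are hypothetical.

```python
def clinician_adopted(ambient_encounters: int, total_encounters: int,
                      min_share: float = 0.50) -> bool:
    """True if the clinician used the platform for >= min_share of encounters."""
    if total_encounters == 0:
        return False
    return ambient_encounters / total_encounters >= min_share


def adoption_rate(weekly_counts: dict) -> float:
    """weekly_counts maps clinician id -> (ambient_encounters, total_encounters)."""
    if not weekly_counts:
        return 0.0
    adopted = sum(clinician_adopted(a, t) for a, t in weekly_counts.values())
    return adopted / len(weekly_counts)


# Hypothetical week for four clinicians.
counts = {"a": (30, 40), "b": (10, 40), "c": (20, 40), "d": (0, 0)}
rate = adoption_rate(counts)
print(rate)          # 0.5
print(rate >= 0.70)  # False: below the vendor-cited 70% threshold
```

Nothing in this calculation touches note accuracy, coding correctness, or review behavior, which is the point: an institution would need separate quality indicators alongside it.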

Evidence Gaps and What Health Systems Should Monitor

The evidence gaps in ACI are not uniform — they are stratified by capability layer in a way that maps directly onto deployment risk. Health systems that understand this stratification can build monitoring programs calibrated to where the uncertainty is highest.

No peer-reviewed data on coding accuracy for ACI-generated ICD-10, CPT, E&M, or HCC suggestions. Health systems deploying Layer 2 should establish internal coding audit programs with pre/post accuracy baselines before expanding deployment.
No peer-reviewed data on order appropriateness for ACI-staged orders. Order acceptance rates, override rates, and downstream order outcomes (imaging utilization, lab redundancy) should be tracked from day one of deployment.
No peer-reviewed data on CDS alert quality for ambient-conversation-triggered alerts. Alert fatigue from poorly calibrated ACI CDS may erode clinician attention to high-value alerts across the entire CDS ecosystem.
No patient outcome data linking ACI use to clinical quality metrics. Documentation burden reduction and burnout improvement are important outcomes, but they do not establish that patients receive better care. This linkage is entirely unstudied.
The subspecialty evidence gap is confirmed and large. The Mayo Clinic review confirms that no RCTs have evaluated ACI specifically in cardiology. Health systems deploying in subspecialty settings should treat their deployment as a prospective evaluation, not an implementation of validated technology.
Vocal biomarker integration is emerging but unvalidated. DeepScribe's vendor materials describe vocal biomarker-based diagnostic support as a near-term ACI capability. No peer-reviewed evidence supports this application in clinical practice as of mid-2026; health systems should treat vendor roadmap items in this category as speculative.
Agentic safety architecture is vendor-dependent and unstandardized. The design of human-in-the-loop confirmation gates for Layer 6 agentic tasks varies significantly across platforms. Health systems should require explicit documentation of override architecture and audit logging before deployment.
Patient and caregiver perspectives on ACI remain largely unstudied. Consent practices, patient comfort with ambient recording, and patient comprehension of AI-generated communication materials have not been systematically evaluated in the peer-reviewed literature.
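
As one example of the monitoring described in this list, order acceptance and override rates for staged orders can be computed from a simple event log. The Python sketch below is illustrative only: the log format, the three outcome labels, and the choice to count both modified and rejected orders as overrides are assumptions made for this example, not an established standard.

```python
from collections import Counter

# Assumed outcome labels for a staged order after clinician review.
OUTCOMES = ("accepted", "modified", "rejected")


def order_staging_rates(events):
    """events: iterable of dicts, each with an 'outcome' key from OUTCOMES."""
    counts = Counter(e["outcome"] for e in events)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcomes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {}
    rates = {outcome: counts[outcome] / total for outcome in OUTCOMES}
    # Assumption: any clinician change to a staged order counts as an override.
    rates["override"] = rates["modified"] + rates["rejected"]
    return rates


log = [
    {"order": "CBC", "outcome": "accepted"},
    {"order": "chest x-ray", "outcome": "accepted"},
    {"order": "lipid panel", "outcome": "modified"},
    {"order": "cardiology referral", "outcome": "rejected"},
]
rates = order_staging_rates(log)
print(rates["accepted"], rates["override"])  # 0.5 0.5
```

A very high acceptance rate is ambiguous on its own, since it could reflect either accurate staging or uncritical confirmation, so it is best read alongside downstream measures such as imaging utilization and lab redundancy.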

The core governance obligation for health systems in 2026 is this: the evidence base that justified deploying Layer 1 documentation tools does not extend to the platform layers being added on top of it. Each capability layer requires its own evidence evaluation, its own clinical oversight design, and its own monitoring program. Deploying beyond documentation means accepting that the institution becomes, in part, the validation study.
