
ACI vs. Ambient Scribing: Why the Terminology Distinction Matters
Most health systems first encounter ambient clinical intelligence through a sales pitch about documentation burden — a physician speaking naturally with a patient while AI generates a structured note in the background. That use case is real, the evidence for it is reasonably solid, and the market has moved fast. But the term ambient clinical intelligence (ACI) describes something considerably broader than automated note-writing, and conflating the two creates a specific governance failure: health systems acquire platform capabilities they have not evaluated, under a regulatory and safety framework designed only for the documentation layer.
Ambient scribing is a feature. ACI is the platform. That distinction is not semantic — it determines which evidence base applies, which regulatory questions are open, and which clinical oversight obligations fall on the deploying institution.
Several leading vendors have begun using the phrase "ambient operating system" to describe ACI — positioning it as the conversational intelligence layer for the entire clinical encounter, not just documentation. DeepScribe and Suki both articulate this framing explicitly in their 2025–2026 market materials. It is a useful conceptual frame, but it is vendor-originated and carries no clinical or regulatory standing. The FDA has not defined ACI as a device category. The EU MDR has not harmonized its classification. The NHS is the first health authority globally to issue specific guidance on ambient scribing products, and even that guidance addresses only tools with summarization capability — not the full extended platform stack.
The practical consequence of the terminology gap is that procurement conversations often bundle capabilities with very different evidence profiles under a single contract. A health system evaluating an ACI platform in 2026 may be simultaneously acquiring: a documentation tool with RCT-level evidence, a clinical coding module with no peer-reviewed validation, an order-staging feature with vendor-reported accuracy claims, and an agentic prior-authorization workflow that operates largely outside current regulatory oversight. These are not equivalent decisions.
For a broader overview of how ambient documentation fits within the larger landscape of AI-driven workflow change, see AI in Medicine: How It's Actually Reshaping Clinical Workflows. The present article focuses on the capability taxonomy and governance obligations that overview does not address.
The Six-Layer ACI Capability Taxonomy
ACI platforms in 2026 span at least six distinct capability layers. Each layer differs in its clinical function, underlying technology, current deployment prevalence, and — critically — the quality of available evidence. The taxonomy below maps these layers from the most established to the most nascent.

| Layer | Clinical Function | Technology Stack | Deployment Status (2026) | Evidence Maturity |
|---|---|---|---|---|
| 1 — Ambient Documentation | Passive capture of clinician-patient conversation; generation of structured SOAP or specialty-specific notes in real time | Automatic speech recognition (ASR) + large language model (LLM) summarization; speaker diarization | Broad clinical deployment; approximately two-thirds of U.S. Epic hospitals using an ambient documentation tool by mid-2025 (Suki, vendor-cited) | Strongest: multiple prospective cohort studies; at least two RCTs; consistent burnout and documentation-burden findings |
| 2 — Structured Clinical Coding | Automated suggestion of ICD-10, CPT, E&M, HCC, and SNOMED/LOINC codes derived from the clinical encounter transcript | NLP extraction + coding ontology mapping; some platforms use LLM-based inference for HCC risk adjustment | Active commercial deployment by major ACI vendors; coding accuracy cited as a key vendor differentiator | Sparse: coding accuracy claims are primarily vendor-reported; no peer-reviewed RCTs or prospective cohort studies identified as of Q2 2026 |
| 3 — Order Staging and Suggestion | Pre-population of lab, imaging, referral, and medication orders based on clinical conversation content | Intent extraction from ASR transcript; EHR order-entry API integration; LLM-based clinical reasoning | Early commercial deployment in select platforms; EHR incumbents (Epic) competing directly in this space | Absent from peer-reviewed literature; no independent validation studies identified; capability described in vendor documentation only |
| 4 — Real-Time Clinical Decision Support and Care Gap Alerts | In-encounter alerts for missed screenings, guideline adherence gaps, social determinants of health, and differential diagnosis prompts | Real-time NLP analysis of conversation stream; integration with clinical knowledge bases (e.g., UpToDate, CDS Hooks); rule-based and LLM-based alerting | Emerging; described in vendor roadmaps and the Mayo Clinic narrative review as a near-term capability; limited confirmed real-world deployments | Absent from peer-reviewed literature for ACI-specific CDS; general EHR-embedded CDS evidence does not transfer directly to ambient-conversation-triggered alerts |
| 5 — Patient-Facing Communication | Automated generation of after-visit summaries, discharge instructions, and patient education materials in plain language | LLM summarization of encounter transcript; patient portal integration; multilingual output capability in some platforms | Active deployment in several platforms; after-visit summary generation is among the more commonly reported extended features | Very sparse: patient comprehension, satisfaction, or outcome data from ambient-generated summaries not yet in peer-reviewed literature; patient perspectives on ACI remain largely unstudied |
| 6 — Agentic Workflow Automation | Autonomous or semi-autonomous execution of administrative tasks: prior authorization drafting, referral letters, inbox message triage, care coordination tasks | LLM-based document generation; EHR API and payer portal integration; multi-step task orchestration; human-in-the-loop confirmation gates (design-dependent) | Early commercial deployment; prior authorization AI growing rapidly (Menlo Ventures: 10x YoY growth, 2025); Abridge deploying real-time prior auth; significant variation in human oversight design | Absent from peer-reviewed literature; no independent safety, accuracy, or outcome data; regulatory classification as SaMD unresolved for tools with decision-support characteristics |
Evidence Review by Capability Layer: What the Literature Actually Shows
The evidence base for ACI is sharply stratified. Layer 1 has accumulated a meaningful body of peer-reviewed research. Layers 2 through 6 have almost none. This is not a gap that vendor adoption figures can fill.
Layer 1: Documentation — Where the Evidence Is
The documentation layer has the most rigorous evidence of any ACI capability. A randomized stepped-wedge controlled study at Providence Health found that ACI use was associated with 30.3% less reported burnout, 49.5% less frustration with documentation, and meaningful reductions in after-hours documentation time (commonly called "pajama time") among early implementers. These are clinician-reported outcomes from a controlled design, which places them above the retrospective cohort evidence typical of health IT research.
The quality of AI-generated notes, however, is a separate question from documentation burden. The most current prospective real-world data comes from a 2026 study at UC Davis Health covering 7,545 notes generated by 31 volunteer physicians across two months. The findings are discussed in the safety signals section below. For full depth on documentation-layer accuracy and error taxonomy, the site's dedicated evidence review of LLM-powered ambient AI scribe clinical accuracy covers the literature in detail.
Layers 2–6: Where the Evidence Is Not
As of mid-2026, no peer-reviewed RCTs or prospective cohort studies have evaluated coding accuracy, order appropriateness, CDS alert quality, patient communication outcomes, or agentic task safety for ACI platforms. This is not a minor gap — it means health systems deploying these capabilities are doing so without the same clinical validation foundation that exists for documentation.
| Capability Layer | Peer-Reviewed RCTs | Prospective Cohort Studies | Retrospective Studies | Independent Validation | Primary Data Source (2026) |
|---|---|---|---|---|---|
| 1 — Documentation | Yes (≥2) | Yes (multiple) | Yes (multiple) | Partial | Peer-reviewed literature |
| 2 — Clinical Coding | None identified | None identified | None identified | None identified | Vendor-reported claims |
| 3 — Order Staging | None identified | None identified | None identified | None identified | Vendor documentation |
| 4 — Real-Time CDS | None identified | None identified | None identified | None identified | Vendor roadmaps; Mayo Clinic narrative review (secondary) |
| 5 — Patient Communication | None identified | None identified | None identified | None identified | Vendor documentation |
| 6 — Agentic Automation | None identified | None identified | None identified | None identified | Vendor documentation; market reports |
A 2025 editorial in JMIR Medical Informatics (Leung et al.) noted that there is currently no systematic data collection for evaluating the extent to which clinical errors or negative patient outcomes can be attributed to ambient AI scribe use. That observation applies even more forcefully to the extended capability layers, where error attribution frameworks do not yet exist in the published literature.
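One way to make this stratification usable at procurement time is to encode the evidence matrix as data and check a proposed contract against it. The following is a minimal Python sketch, not a validated tool. The names `LayerEvidence`, `EVIDENCE_MATRIX`, and `unvalidated_layers` are hypothetical, and the boolean values simply restate the evidence table in this section as of the date it describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerEvidence:
    layer: int
    name: str
    peer_reviewed_rcts: bool  # restates the "Peer-Reviewed RCTs" column
    primary_source: str       # restates the "Primary Data Source" column


# Hypothetical encoding of the evidence table; update as literature changes.
EVIDENCE_MATRIX = {
    1: LayerEvidence(1, "Documentation", True, "Peer-reviewed literature"),
    2: LayerEvidence(2, "Clinical Coding", False, "Vendor-reported claims"),
    3: LayerEvidence(3, "Order Staging", False, "Vendor documentation"),
    4: LayerEvidence(4, "Real-Time CDS", False, "Vendor roadmaps; narrative review"),
    5: LayerEvidence(5, "Patient Communication", False, "Vendor documentation"),
    6: LayerEvidence(6, "Agentic Automation", False, "Vendor documentation; market reports"),
}


def unvalidated_layers(contract_layers):
    """Return the layers in a contract that lack peer-reviewed RCT evidence."""
    return [
        EVIDENCE_MATRIX[n]
        for n in sorted(set(contract_layers))
        if not EVIDENCE_MATRIX[n].peer_reviewed_rcts
    ]


if __name__ == "__main__":
    # Example: a contract bundling documentation, coding, and agentic features.
    for item in unvalidated_layers([1, 2, 6]):
        print(f"Layer {item.layer} ({item.name}): {item.primary_source}")
```

For a contract bundling Layers 1, 2, and 6, the function returns the coding and agentic entries, which are the components that would need their own evaluation plan.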
Specialty-Specific Deployment Patterns and Evidence Gaps
The performance of ACI documentation tools is not uniform across clinical settings. A 2026 narrative review from the Mayo Clinic group, published in Cardiovascular Diagnosis and Therapy, provides the most systematic specialty-specific analysis currently in the peer-reviewed literature — and its findings are sobering for subspecialty deployment.
Primary care physicians reported the highest satisfaction with ACI documentation tools, with 85% noting an improved work experience. Medical subspecialties — including oncology, cardiology, and dermatology — reported only 36.4% satisfaction, a statistically significant difference. Subspecialists also spent a mean of 3.75 additional minutes per appointment compared to primary care physicians using the same tools. The review explicitly confirms that no RCTs have evaluated ambient AI scribes specifically in cardiology, representing a clear research gap for one of the highest-volume subspecialty settings.
| Clinical Setting | Evidence Level for ACI Documentation | Reported Satisfaction / Efficiency | Key Deployment Challenges | Extended Capability Evidence (Layers 2–6) |
|---|---|---|---|---|
| Primary Care (Ambulatory) | Prospective cohort; RCT (Providence) | 85% satisfaction; consistent documentation burden reduction | Vocabulary breadth; EHR template variation | None in peer-reviewed literature |
| Medical Subspecialties (Cardiology, Oncology, Dermatology) | Narrative review synthesis; no subspecialty-specific RCTs | 36.4% satisfaction; 3.75 min/appointment additional time vs. primary care | Complex, jargon-dense clinical language; structured reporting requirements; specialty-specific note formats | None in peer-reviewed literature |
| Surgical Subspecialties | Limited; narrative review reports ~50% satisfaction | Moderate; operative note generation presents specific structured-data challenges | Procedural terminology; consent documentation; intraoperative context capture | None in peer-reviewed literature |
| Inpatient / ICU | Very limited; high-acuity barrier documented (Ohde et al., 2026) | Deployment barriers: noise environment, multi-clinician conversations, rapid status changes | Acoustic complexity; multi-speaker attribution; critical documentation accuracy requirements | None in peer-reviewed literature |
| Emergency Department | Limited; deployment described but not systematically studied | Workflow integration challenges: interruption-heavy environment, high patient volume, shift handoffs | Real-time note generation under time pressure; triage documentation accuracy | None in peer-reviewed literature |
| Behavioral Health | Very limited; privacy and consent concerns documented | Consent complexity; therapeutic relationship concerns; sensitive content handling | Patient consent for ambient recording; confidentiality obligations; trauma-informed care context | None in peer-reviewed literature |
| Telehealth | Limited; some deployment in synchronous video visits | Acoustic quality dependent on patient hardware; variable note quality | Audio quality variability; multi-party call attribution | None in peer-reviewed literature |
The high-acuity deployment barrier is particularly relevant for health systems planning inpatient or ICU rollouts. A 2026 perspective article in npj Digital Medicine (Ohde et al.) identified acoustic complexity, multi-clinician conversation attribution, and the clinical consequences of documentation errors in critical settings as specific barriers that ambulatory-validated tools have not been tested against.
Safety Signals Across the ACI Stack
The safety profile of ACI platforms involves several distinct risk mechanisms that operate differently across the capability layers. The documentation layer has the most empirical safety data; extended layers carry theoretical and emerging risks that have not yet been systematically characterized.
Documentation-Layer Safety: Current Real-World Evidence
The most current prospective real-world safety data for ACI documentation comes from a 2026 study at UC Davis Health, covering 7,545 notes generated by 31 volunteer physicians across July–August 2024, primarily in family medicine and internal medicine. The study found that accidental omissions were the most prevalent error type, occurring in 18% of evaluated notes. Hallucinations — content present in the AI-generated note but not in the clinical encounter — appeared in 11.5% of notes. Accidental inclusions (information from prior visits or other sources) occurred in 9.3% of notes. Bias was rare at 1.1%.
The most clinically significant finding: 5.3% of notes contained errors rated as posing serious or imminent risk of patient harm if not corrected before use. These were not edge cases in a stress test — they were errors in routine clinical notes from volunteer physicians who had opted into using the tool.
The omission-versus-hallucination distinction matters clinically. A hallucination, such as a fabricated finding or medication, is visible in the note and more likely to be caught by a reviewing clinician. An omission, meaning a finding discussed in the encounter that does not appear in the note, is invisible: the clinician reviewing the note has no signal that something is missing. This asymmetry makes omissions the more dangerous error class, and in the UC Davis data they were also the more frequent one (18% of notes versus 11.5% for hallucinations).
Physician editing practices in the UC Davis study showed wide variation: the median percentage of AI-generated words changed was 9.0%, but individual physician rates ranged from 1.9% to 69.3%. Critically, 14.9% of notes were left entirely unedited before use. This pattern, in which a minority of notes were used with no edits at all, is the clearest signal of possible automation bias in current deployment data, although an unedited note is not necessarily an unreviewed one.
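An institution that wants to track comparable editing metrics internally could compute them from pairs of AI drafts and signed notes. The sketch below is an assumption about how such a metric might be calculated, using a word-level diff from Python's standard `difflib`; it is not the method used in the UC Davis study, and the function names are hypothetical.

```python
import difflib
from statistics import median


def pct_ai_words_changed(ai_draft: str, signed_note: str) -> float:
    """Percentage of AI-draft words that do not survive into the signed note.

    Assumption: words are whitespace-separated tokens, compared in order.
    Pure additions by the clinician do not count as changed AI words.
    """
    ai_words = ai_draft.split()
    final_words = signed_note.split()
    if not ai_words:
        return 0.0
    matcher = difflib.SequenceMatcher(a=ai_words, b=final_words, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(ai_words) - kept) / len(ai_words)


def summarize_editing(pairs):
    """pairs: non-empty list of (ai_draft, signed_note) strings."""
    rates = [pct_ai_words_changed(draft, final) for draft, final in pairs]
    # "Unedited" here means the signed note is word-for-word the AI draft.
    unedited = sum(1 for draft, final in pairs if draft.split() == final.split())
    return {
        "median_pct_changed": median(rates),
        "pct_unedited": 100.0 * unedited / len(pairs),
    }
```

A report built on `summarize_editing` can flag unedited notes for sampling, but it cannot by itself show whether the clinician read the note before signing.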
Cross-Layer Safety Signals
Several safety concerns apply across the full ACI capability stack, not only to documentation:
- Automation bias: As ACI platforms generate more content — notes, codes, orders, care gap alerts — the cognitive pressure on clinicians to review each output carefully increases while the available time does not. The 14.9% unedited-note rate in the UC Davis study suggests this is already occurring in documentation; the risk compounds as platforms generate more outputs across more layers.
- Cognitive debt: The 2025 JMIR editorial (Leung et al.) raised concern about long-term cognitive effects of delegating clinical synthesis to AI systems, citing preliminary evidence that LLM use for writing tasks may impair memory recall and reduce neural engagement. Whether this applies to clinical reasoning in ACI-heavy workflows is not yet studied.
- Note bloat: ACI-generated notes may be longer and more comprehensive than clinician-authored notes, which could obscure signal in downstream clinical review. The downstream impact on clinical quality is unstudied.
- The pajama-time paradox: ACI consistently reduces after-hours documentation time — a clear benefit. But if that time reduction comes partly from reduced editing of AI-generated content rather than from genuine workflow improvement, the safety trade-off is less favorable than burnout metrics alone suggest.
- Agentic layer safety unknowns: Layer 6 agentic tasks — prior authorization drafting, referral letters, inbox triage — introduce autonomous or semi-autonomous actions with downstream clinical and administrative consequences. No peer-reviewed safety data exists for these capabilities. Human-in-the-loop confirmation design varies significantly across platforms and is not standardized.
For the full error taxonomy and evidence review for the documentation layer, see the site's detailed analysis of LLM-powered ambient AI scribe accuracy and safety evidence.
Regulatory Classification and Governance Landscape
The regulatory environment for ACI tools is characterized more by open questions than settled classifications. This ambiguity has direct practical consequences for health systems, which bear most governance responsibility in the current framework.
The core classification question is whether ACI tools — particularly those with summarization and decision-support capabilities — qualify as Software as a Medical Device (SaMD) under FDA frameworks or equivalent EU MDR classifications. The 2026 npj Digital Medicine perspective by Ohde et al. summarizes the current state: pure transcription tools are less likely to be considered medical devices, but tools capable of summarization and clinical decision support have the potential to alter how information is communicated and influence clinical decision-making — which raises questions about oversight, safety standards, and accountability that current regulatory frameworks have not fully answered.
The NHS is the first health authority to have released specific guidance on AI-enabled ambient scribing products in health and care settings. That guidance focuses on tools with summarization capability and requires regulatory scrutiny for those tools. No equivalent harmonized guidance exists under the FDA's SaMD framework or the EU MDR as of mid-2026.
Beyond device classification, ACI deployments generate data obligations that health systems must address independently of regulatory status. Ambient recording of clinical encounters implicates patient consent requirements that vary by state. Data retention policies for encounter audio and transcripts require explicit institutional decisions. Agentic workflow layers that access patient data across systems — to draft prior authorization letters or pull referral history — implicate interoperability and information-blocking obligations. For the data-access and interoperability implications of agentic ACI workflows specifically, see the site's analysis of the ONC information blocking rule and its implications for AI systems.
Evaluating ACI Platforms Beyond Documentation: A Governance Framework for Health Systems
When a health system signs an ACI contract in 2026, it is rarely signing a contract for documentation only. The governance framework for evaluating these platforms must be structured to match the capability layers being acquired, not the capability that received the most marketing attention.
One market dynamic worth understanding: Menlo Ventures' 2025 healthcare AI survey of 700+ healthcare executives (fielded August–September 2025, and reflecting a venture-capital perspective) found that health system customers increasingly prefer to acquire coding, billing, prior authorization, scheduling, clinical decision support, and patient navigation capabilities from their incumbent EHR vendor rather than from standalone AI vendors. This preference reflects integration-risk concerns and consolidation fatigue, but it does not resolve the evidence gap for any of those capabilities, regardless of which vendor provides them.
For Epic-specific governance considerations in ambient AI documentation deployment, the site's dedicated analysis of ambient AI scribes in Epic EHR addresses integration architecture and Epic-specific oversight obligations.
| ACI Capability Layer | Governance Obligation | Evidence Requirement Before Deployment | Clinical Oversight Design | Liability Consideration |
|---|---|---|---|---|
| 1 — Documentation | Physician attestation policy; editing audit capability; note-quality monitoring program | Peer-reviewed evidence available; institution should conduct internal pilot with error-rate monitoring | Mandatory physician review before note finalization; editing rate monitoring; unedited-note flagging | Physician remains legally responsible for attestation; institution responsible for oversight policy |
| 2 — Clinical Coding | Independent coding accuracy audit; revenue integrity oversight; appeal-rate monitoring | No peer-reviewed validation available; require vendor to provide independent accuracy data with methodology disclosure; conduct internal pre-deployment audit | Certified coder review of AI-suggested codes; denial-rate and audit-flag monitoring | Coding errors affect reimbursement, compliance, and audit exposure; institution bears compliance risk |
| 3 — Order Staging | Clinical appropriateness review; order acceptance/rejection rate monitoring; clinician override tracking | No peer-reviewed validation; require vendor accuracy claims with patient-population specifics; internal validation against clinical guidelines | Physician must confirm every staged order; no autonomous order entry; alert fatigue monitoring | Order errors have direct patient safety implications; liability follows the authorizing clinician and institution |
| 4 — Real-Time CDS | Alert content clinical review; false-positive rate monitoring; alert fatigue assessment | No ACI-specific CDS evidence; general CDS alert literature applies; institutional clinical review of alert logic required | Clinical governance committee review of alert rules; regular content audits; clinician override documentation | CDS liability is unsettled; institutions should document alert-review governance to demonstrate due diligence |
| 5 — Patient Communication | Patient consent for AI-generated communication; readability and accuracy review; multilingual accuracy validation | No peer-reviewed patient outcome data; pilot with patient comprehension assessment before broad deployment | Clinician review of AI-generated summaries before patient delivery in high-risk contexts; patient feedback mechanism | Patient communication errors may affect informed consent and care adherence; institution responsible for content accuracy |
| 6 — Agentic Automation | Human-in-the-loop confirmation design; data access scope limitation; audit log requirements; consent for automated actions | No peer-reviewed safety data; require vendor to disclose human-override architecture; internal safety testing required | No fully autonomous execution for clinically consequential tasks; mandatory human confirmation gate; error escalation pathway | Agentic errors may affect authorization decisions, referral accuracy, and patient access to care; regulatory classification unresolved; maximum institutional caution warranted |
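
The oversight designs for Layers 3 and 6 in the table above share one pattern: nothing executes until a named clinician confirms it, and every step is logged. The following Python sketch illustrates that pattern under stated assumptions. `StagedAction`, `ConfirmationGate`, and the log fields are hypothetical names for illustration and do not describe any vendor's architecture or any EHR API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    STAGED = "staged"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class StagedAction:
    action_id: str
    layer: int          # capability layer from the taxonomy (e.g., 3 or 6)
    description: str
    status: Status = Status.STAGED


@dataclass
class ConfirmationGate:
    audit_log: list = field(default_factory=list)

    def _log(self, action, event, clinician_id=None):
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action_id": action.action_id,
            "event": event,
            "clinician_id": clinician_id,
        })

    def review(self, action, clinician_id, approve):
        """Record a clinician's decision on a staged action."""
        if action.status is not Status.STAGED:
            raise ValueError("action has already been reviewed")
        action.status = Status.CONFIRMED if approve else Status.REJECTED
        self._log(action, action.status.value, clinician_id)

    def execute(self, action):
        """Refuse to run anything that a clinician has not confirmed."""
        if action.status is not Status.CONFIRMED:
            self._log(action, "blocked_execution")
            raise PermissionError("human confirmation required before execution")
        self._log(action, "executed")
        return f"executed {action.action_id}"
```

The design choice worth noting is that blocked attempts are logged as well as successful ones, so an audit can show how often the system tried to act without confirmation.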
On Adoption Thresholds and Vendor-Reported Metrics
Vendor materials frequently cite adoption thresholds as proxies for platform effectiveness. DeepScribe, for example, reports that organizations achieving 70% or greater clinician adoption (defined as using the platform for at least 50% of weekly encounters) consistently report stronger returns, and claims an average adoption rate of 80% for its own platform. These figures are vendor-originated and have not been independently validated.
- Adoption rate is a usage metric, not a clinical outcome metric. High adoption does not establish that notes are accurate, codes are correct, or orders are appropriate.
- The definition of adoption (50% of weekly encounters) is vendor-defined and may not align with institutional governance thresholds for clinical oversight.
- The UC Davis Health data shows that even among volunteer adopters — a high-engagement population — 14.9% of notes were left unedited. Adoption rate and oversight quality are not the same variable.
- Health systems should define their own adoption and oversight metrics independently of vendor benchmarks, anchored to clinical quality indicators rather than usage frequency.
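
To illustrate why adoption and oversight are separate variables, the sketch below computes adoption using the vendor-style definition quoted above (at least 50% of weekly encounters, with a 70% organizational target) and reports it next to an unedited-note share. It is a minimal Python example under assumptions: the function names are hypothetical, and the `max_unedited` threshold of 5% is an arbitrary placeholder that an institution would replace with its own governance threshold.

```python
def adoption_rate(weekly_usage, min_share=0.50):
    """Share of active clinicians using the tool for >= min_share of encounters.

    weekly_usage: {clinician_id: (aci_encounters, total_encounters)}
    Clinicians with no encounters that week are excluded from the denominator.
    """
    active = [(aci, total) for aci, total in weekly_usage.values() if total > 0]
    if not active:
        return 0.0
    adopters = sum(1 for aci, total in active if aci / total >= min_share)
    return adopters / len(active)


def oversight_summary(weekly_usage, unedited_share,
                      adoption_target=0.70, max_unedited=0.05):
    """Report usage and oversight side by side; thresholds are placeholders."""
    rate = adoption_rate(weekly_usage)
    return {
        "adoption_rate": rate,
        "meets_vendor_adoption_target": rate >= adoption_target,
        "unedited_share": unedited_share,
        "needs_oversight_review": unedited_share > max_unedited,
    }
```

A deployment can meet the adoption target and still be flagged for oversight review, which is the situation the UC Davis unedited-note figure suggests is plausible.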
Evidence Gaps and What Health Systems Should Monitor
The evidence gaps in ACI are not uniform — they are stratified by capability layer in a way that maps directly onto deployment risk. Health systems that understand this stratification can build monitoring programs calibrated to where the uncertainty is highest.
- No peer-reviewed data on coding accuracy for ACI-generated ICD-10, CPT, E&M, or HCC suggestions. Health systems deploying Layer 2 should establish internal coding audit programs with pre/post accuracy baselines before expanding deployment.
- No peer-reviewed data on order appropriateness for ACI-staged orders. Order acceptance rates, override rates, and downstream order outcomes (imaging utilization, lab redundancy) should be tracked from day one of deployment.
- No peer-reviewed data on CDS alert quality for ambient-conversation-triggered alerts. Alert fatigue from poorly calibrated ACI CDS may erode clinician attention to high-value alerts across the entire CDS ecosystem.
- No patient outcome data linking ACI use to clinical quality metrics. Documentation burden reduction and burnout improvement are important outcomes, but they do not establish that patients receive better care. This linkage is entirely unstudied.
- Subspecialty evidence gap confirmed and large. The Mayo Clinic review confirms no RCTs for cardiology-specific ACI. Health systems deploying in subspecialty settings should treat their deployment as a prospective evaluation, not an implementation of validated technology.
- Vocal biomarker integration is emerging but unvalidated. DeepScribe's vendor materials describe vocal biomarker-based diagnostic support as a near-term ACI capability. No peer-reviewed evidence supports this application in clinical practice as of mid-2026; health systems should treat vendor roadmap items in this category as speculative.
- Agentic safety architecture is vendor-dependent and unstandardized. The design of human-in-the-loop confirmation gates for Layer 6 agentic tasks varies significantly across platforms. Health systems should require explicit documentation of override architecture and audit logging before deployment.
- Patient and caregiver perspectives on ACI remain largely unstudied. Consent practices, patient comfort with ambient recording, and patient comprehension of AI-generated communication materials have not been systematically evaluated in the peer-reviewed literature.
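
As one concrete form the Layer 2 coding audit could take, the sketch below compares AI-suggested code sets against a certified coder's final codes for the same encounters and reports precision and recall. It is a minimal Python illustration under assumptions: the function name and input shape are hypothetical, the coder's codes are treated as the reference standard, and the ICD-10 codes in the usage comment are examples only.

```python
def coding_agreement(audits):
    """Aggregate agreement between AI-suggested and coder-assigned codes.

    audits: iterable of (ai_codes, coder_codes) pairs, one pair per encounter.
    The coder's codes are treated as the reference standard (an assumption).
    """
    true_pos = false_pos = false_neg = 0
    for ai_codes, coder_codes in audits:
        ai, coder = set(ai_codes), set(coder_codes)
        true_pos += len(ai & coder)    # suggested and kept by the coder
        false_pos += len(ai - coder)   # suggested but removed by the coder
        false_neg += len(coder - ai)   # added by the coder, missed by the AI
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return {"precision": precision, "recall": recall}


# Example input shape (illustrative codes, not real audit data):
# coding_agreement([({"E11.9", "I10"}, {"E11.9", "I10"}),
#                   ({"E11.9", "Z00.00"}, {"E11.9"})])
```

Running the same calculation on a sample before deployment and again afterwards gives the pre/post baseline described in the first bullet above.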
The core governance obligation for health systems in 2026 is this: the evidence base that justified deploying Layer 1 documentation tools does not extend to the platform layers being added on top of it. Each capability layer requires its own evidence evaluation, its own clinical oversight design, and its own monitoring program. Deploying beyond documentation means accepting that the institution becomes, in part, the validation study.
