For readers seeking foundational context on clinical AI before engaging with this ecosystem-level analysis, the overview of artificial intelligence in healthcare clinical applications provides relevant background.
Why Epic's AI Stack Demands Ecosystem-Level Governance
The concentration of AI decision-support in a single EHR platform is not an abstract risk. According to the ONC Data Brief on hospital predictive AI trends for 2023–2024, 71% of U.S. hospitals used predictive AI integrated into their EHR in 2024, up from 66% in 2023. Among hospitals using the market-leading EHR vendor — understood in industry reporting to refer to Epic by market share, though not named explicitly in the ONC document — that figure reached 90%, compared to 50% for hospitals on other platforms. Eighty percent of all hospitals sourced their predictive AI directly from their EHR developer.
What this means practically: when a governance gap exists at the platform level, it does not affect one hospital. It replicates across the majority of the U.S. hospital system simultaneously. A model deployed without adequate local validation, a self-service analytics tool accessed without appropriate data access controls, or a foundation model integrated before prospective outcome evidence exists — each of these conditions, if present in the platform's default configuration, scales instantly to hundreds of health systems.
This article maps the four-layer architecture of Epic's AI analytics ecosystem and defines the structured oversight obligations each layer creates. The layers are: (1) SlicerDicer and SideKick, the self-service cohort analytics and natural-language querying environment; (2) the embedded predictive model suite covering deterioration, sepsis, readmission, and other clinical risk domains; (3) the Cosmos real-world data platform and its associated CoMET and Curiosity foundation models; and (4) the Seismometer open-source local validation framework. Each layer carries distinct governance obligations. None can be governed adequately by treating it in isolation.

SlicerDicer and SideKick: Self-Service Analytics and the Governance Gaps They Open
SlicerDicer is Epic's self-service cohort analytics environment, enabling clinicians and analysts to define patient populations, apply clinical filters, and generate population-level analyses without routing requests through an IT or informatics intermediary. SideKick extends this capability by allowing users to construct queries using natural language, with an AI assistant interpreting the request and translating it into a structured cohort definition.
The governance challenge here is not the tool's capability — it is the shift in accountability that self-service creates. When an IT analyst constructs a cohort query, institutional review processes typically apply. When a physician or quality officer runs the same query directly through SlicerDicer, those checkpoints may not. The result is faster access to population data, but with accountability for query design, data interpretation, and downstream action resting with the requesting clinician or analyst rather than a centralized team.
SideKick adds a second layer of governance complexity. When a natural-language query is interpreted by an AI assistant, the translation from clinical intent to structured query may introduce errors that are not immediately visible to the requesting user. A clinician who asks for "patients with heart failure readmitted within 30 days" may receive a cohort that reflects a technically accurate but clinically incomplete interpretation of that phrase. Without a structured review step, the output may be acted upon as if it were a validated analysis.
- HIPAA minimum-necessary data access: SlicerDicer's self-service model requires role-based access controls that enforce minimum-necessary standards; without these, users may access patient-level data beyond the scope of their clinical or operational role.
- AI-generated query accountability: when SideKick translates a natural-language request into a cohort definition, the requesting user must be trained to verify the query logic before acting on outputs — the AI assistant does not guarantee clinical accuracy of the translation.
- Interpretation quality assurance: population-level analyses generated through SlicerDicer carry the same methodological risks as any observational analysis — confounding, selection bias, and data completeness gaps — and require appropriate statistical and clinical review before informing policy or care decisions.
- Audit trail requirements: health systems must ensure that SlicerDicer queries are logged, attributable, and reviewable, particularly when outputs are used to support quality improvement, population health programs, or regulatory reporting.
- Training obligations: removing the IT intermediary accelerates access but does not reduce the need for user competency; health systems should define minimum training requirements for SlicerDicer and SideKick access, analogous to training requirements for clinical decision support tools.
Epic's Embedded Predictive Model Suite: Clinical Use Cases and the Evidence Landscape
Epic's clinical environment includes dozens of embedded predictive models spanning deterioration risk, sepsis detection, readmission probability, length-of-stay forecasting, and other clinical risk domains. These models are integrated directly into the EHR workflow — surfacing as alert scores, risk flags, or care team notifications — rather than existing as separate applications requiring separate logins or data exports.
The clinical evidence supporting individual models varies substantially. For readers seeking detailed performance data on the Epic Sepsis Model — including AUROC figures, sensitivity and specificity at external validation, and comparisons with FDA-cleared alternatives — the clinical evidence analysis of AI sepsis prediction in hospitals covers this comprehensively. This section focuses on the governance implications of the model suite as a whole.
The Novant Health deployment of Epic's Deterioration Index provides the clearest documented evidence of what structured governance enables. The Deterioration Index is an ordinal logistic regression model using 17 clinical features, trained on approximately 325,000 observations across 130,000 patient encounters at three organizations, and available in Epic since 2017. Novant Health's New Hanover Regional Medical Center deployed it with deliberate governance infrastructure: threshold calibration by clinical area, creation of a dedicated Predictive Model Oversight Committee, and a phased clinician training program.
The outcome documented by Novant Health was a 22% reduction in mortality for at-risk patients and an estimated 153 lives saved over 11 months. The implementation team used SlicerDicer for continuous validation and optimization monitoring throughout the deployment. Critically, the case record documents that the first phase of the project — after the Deterioration Index score was made available to nurses but before formal training — resulted in no meaningful change in mortality. Outcomes improved only after the governance infrastructure was in place.
"The first phase of the project, after the DI score was made available to nurses but before formal training, resulted in no meaningful change in mortality."
This finding has direct implications for how health systems approach any embedded predictive model deployment. Model availability does not produce clinical outcomes. Governance infrastructure — threshold calibration, accountability structures, and clinician training — is the mechanism through which model availability translates into patient benefit.
| Model Category | Clinical Use Case | Governance Obligation |
|---|---|---|
| Deterioration Index | Early identification of clinical deterioration across inpatient units | Threshold calibration by unit type (ED, ICU, general floor); clinician training on score interpretation; oversight committee for ongoing monitoring |
| Sepsis prediction | Early sepsis identification using vital signs, lab values, and clinical documentation | Local population validation; alert threshold review to manage false positive burden; continuous drift monitoring |
| Readmission prediction | Identification of patients at elevated risk for 30-day readmission | Bias audit across demographic subgroups; integration with care coordination workflows; outcome tracking |
| Length-of-stay forecasting | Operational prediction of patient discharge timing | Validation against local case mix; workflow integration review to prevent over-reliance on point estimates |
Governance Failure Anatomy: What the ESM Case Reveals About Checkpoint Failures
The Epic Sepsis Model's deployment arc is the most thoroughly documented example of simultaneous governance checkpoint failures in a widely deployed clinical AI tool. This section uses it as a compact anatomy case — not a full performance review. For the complete deployment chronology, regulatory accountability analysis, and performance evidence, see the Epic Sepsis Model governance case study.
What makes the ESM case instructive at the ecosystem level is not any single failure — it is the simultaneous presence of six distinct governance checkpoint failures, each of which should have been an independent stop signal. A synthesis of the ESM governance record identifies the following:
- Single-site validation deployed to multi-site: the model was trained and validated at one institution but deployed across health systems with substantially different patient populations without site-specific revalidation.
- No prospective trial before scale: the model was scaled to broad clinical deployment without a prospective controlled trial demonstrating patient outcome benefit.
- High false positive rate triggering alert fatigue: external validation documented a false positive rate that generated alert burden sufficient to undermine clinical response — a predictable consequence of deploying without local threshold calibration.
- No continuous drift monitoring: once deployed, the model operated without systematic monitoring for performance degradation over time. Research indicates 91% of ML models degrade within one to two years without active monitoring.
- No independent external validation requirement: the deployment pathway did not require independent external validation before broad clinical use.
- Vendor performance claims not independently verified: performance metrics cited in vendor documentation were not verified by independent researchers before deployment decisions were made.
Cosmos, CoMET, and Curiosity: From Discrete Alerts to Foundation Model Trajectory Simulation
Epic Cosmos is the underlying real-world data platform that powers the company's most advanced AI capabilities. As of August 2025, Cosmos contains records from more than 300 million unique patients across 310 health systems, representing 16.3 billion clinical encounters. The scale of this dataset distinguishes Cosmos from any single-institution training corpus and enables model development that would be infeasible for individual health systems.
CoMET (Cosmos Medical Event Transformer) is a family of foundation models built on this dataset. According to the CoMET arXiv preprint, CoMET models are decoder-only transformers pretrained on 118 million patients representing 115 billion discrete medical events (151 billion tokens). The largest variant, CoMET-L, has up to one billion parameters and uses the Qwen2 architecture with random initialization. In zero-shot evaluations across 78 clinical tasks — including disease-specific outcome prediction, acute-on-chronic risk, incident disease detection, differential diagnosis, and operational forecasting — CoMET-L generally matched or outperformed task-specific supervised models.
Curiosity, announced September 3, 2025, represents the clinical-facing application layer built on Cosmos and CoMET. Rather than scoring a single risk dimension — as a sepsis alert or deterioration index does — Curiosity generates multiple plausible future timelines for a patient's health journey, accounting for diagnoses resolving or emerging, complications arising, and shifting care needs. It was trained on sequences of time-ordered medical events spanning more than 100 billion patient medical events from de-identified records. Beginning in February 2026, researchers from Cosmos-participating organizations gained access through a virtual laboratory to explore new use cases.
| Component | Architecture / Scale | Current Status (mid-2026) | Key Evidence Limitations |
|---|---|---|---|
| Cosmos | 300M+ patients, 16.3B encounters, 310 health systems (August 2025) | Active production data platform | Data gaps from patients receiving care outside Cosmos-contributing organizations |
| CoMET | Decoder-only transformer, up to 1B parameters, trained on 115B medical events from 118M patients; Qwen2 architecture | Research evaluation; zero-shot benchmarks published | No prospective validation studies; no subpopulation fairness analysis; no multimodal data (notes, images) in training |
| Curiosity | Built on Cosmos; trajectory simulation across 78+ evaluated clinical cases | Researcher virtual lab access from February 2026; not in broad clinical deployment | No prospective RCT evidence; case-by-case evaluation required before workflow integration |
The governance challenge posed by Cosmos-era foundation models is qualitatively different from the challenge posed by discrete prediction alerts. A sepsis risk score can be evaluated against a defined clinical outcome (sepsis diagnosis, ICU transfer) using standard performance metrics. A trajectory simulation that generates "plausible future health journeys" requires a different validation framework — one that assesses not just whether individual predictions are accurate but whether the full trajectory output is clinically interpretable, actionable, and equitable across patient subpopulations. That framework does not yet exist in standardized form.
Seismometer: Open-Source Local Validation and Its Practical Limits
Seismometer is Epic's open-source model validation tool, designed to address the local validation gap that platform-level deployment creates. Rather than relying on Epic's training-data performance figures, Seismometer enables health systems to ingest their own patient data and evaluate how a deployed model performs on their specific population. According to Healthcare IT News reporting from July 2025, the tool can compare AI models side by side, automate subgroup analysis across age, sex, race/ethnicity, and other demographics, detect performance drift over time, and evaluate model impact on patient outcomes, treatment speed, and user acceptance.
Michigan Medicine was an early adopter. The institution used Seismometer to validate the Epic Sepsis Model version 2 and to identify that appropriate alert thresholds differed by clinical department — the threshold calibrated for the emergency department was not appropriate for the ICU or general medical floors. This department-level calibration work, which Seismometer automated and visualized, had previously required time-intensive manual analysis.
UW Health subsequently adopted Seismometer as standard practice for AI model evaluation. Brian Patterson at UW Health noted its ability to calculate subgroup analysis across demographic groups as a key capability for equity-aware governance.
- What Seismometer does well: automates local performance evaluation; surfaces demographic subgroup disparities; enables side-by-side model comparison; detects drift; generates intuitive dashboards for non-technical stakeholders.
- What Seismometer does not provide: return-on-investment metrics or financial performance analysis; guidance on what to do when drift or bias is detected; clinical decision-making support for threshold calibration choices.
- Who faces the steepest implementation barrier: resource-constrained organizations — Federally Qualified Health Centers, critical access hospitals, small independent facilities — face a steeper implementation journey due to limited informatics staff, data infrastructure, and capacity to act on Seismometer outputs even when the tool is available.
A Governance Framework for Health Systems: Pre-Deployment Through Decommission
Effective governance of Epic's AI ecosystem requires a structured lifecycle framework covering every phase from pre-deployment evaluation through eventual model retirement. The Novant Health Deterioration Index implementation and UW Health's governance practices provide documented evidence of what this looks like in practice. The following four-phase framework synthesizes these cases and applies them across Epic's AI stack.
| Phase | Core Activities | Key Governance Outputs |
|---|---|---|
| 1. Pre-Deployment | Local population validation on institutional data; demographic bias audit across age, sex, race/ethnicity; threshold calibration by clinical area (ED, ICU, general floors); go/no-go criteria defined before activation | Validation report; bias audit results; calibrated alert thresholds; documented deployment decision with rationale |
| 2. Deployment Oversight | Clinician training on model interpretation and limitations (UW Health's five principles: ownership, privacy, hallucinations, recency, utility); workflow integration checkpoints; escalation paths for unexpected model behavior | Training completion records; workflow integration sign-off; escalation pathway documentation |
| 3. Continuous Monitoring | Drift detection cadence using Seismometer; performance disaggregation by subgroup; alert burden tracking; periodic revalidation against updated local data; review by AI oversight committee | Drift monitoring reports; subgroup performance dashboards; alert fatigue metrics; revalidation records |
| 4. Decommission | Criteria for model retirement defined prospectively: performance below threshold, clinical context shift, superseded by better-validated alternative, or patient safety concern identified; transition plan for clinical workflows dependent on the model | Decommission decision memo; workflow transition plan; post-retirement outcome monitoring |
The UW Health clinician training framework deserves particular attention because it addresses a governance gap that technical validation alone cannot close. UW Health requires staff who use AI tools to understand five principles: Ownership (the clinician is responsible for AI output as if they wrote it themselves); Privacy (third-party tools not embedded in Epic should be assumed unsafe for protected health information); Hallucinations and Confabulations (every AI output should be verified, not assumed accurate); Recency (AI models may not reflect current clinical guidelines or recent data); and Utility (how a prompt or query is constructed affects the quality of the output).
These principles apply with equal force to SlicerDicer and SideKick queries, embedded predictive model alerts, and — when Curiosity becomes available for clinical use — foundation model trajectory outputs. They are not Epic-specific training content; they are the minimum cognitive framework any clinician needs to use AI-assisted tools safely.
Organizational Architecture: AI Committees, CMIO Roles, and the Joint Commission–CHAI Framework
Governance frameworks require organizational structures to implement them. The ONC data brief documents that 66% of hospitals have a specific AI committee or task force, with division and department leaders cited as accountable parties at 60% of hospitals. However, the brief also notes that governance rigor varies substantially by hospital size and system affiliation — small, rural, independent, and critical access hospitals consistently lag in both adoption and evaluation rigor.
UW Health's Clinical AI and Predictive Analytics (CAIPA) Committee provides the most detailed publicly documented model for AI governance organizational design. Described in detail on EpicShare, CAIPA was established in 2021 as an evolution of algorithm workgroups that began in 2018. The committee has approximately 40 members with expertise spanning technology, clinical care, operations, compliance, and ethics. It meets bimonthly and spins off workgroups for specific clinical domains or tools. Its function is explicitly analogous to a Pharmacy and Therapeutics committee: it evaluates which AI tools to deploy, sets standards for ongoing monitoring, and maintains an organizational AI inventory accessible to staff.
In September 2025, the Joint Commission partnered with CHAI (Coalition for Health AI) to release a seven-domain governance framework for responsible AI adoption across U.S. health systems. Because the Joint Commission accredits the majority of U.S. hospitals, this framework functions as a de facto national standard for health system AI governance.
| Domain | Governance Requirement | Organizational Accountable Party |
|---|---|---|
| 1. Clear AI governance policies | Documented policies covering AI procurement, deployment, monitoring, and retirement | CMIO, Chief Compliance Officer |
| 2. Executive leadership involvement | C-suite accountability for AI governance outcomes, not just IT or informatics delegation | CEO, CMO, CMIO |
| 3. Regulatory and ethical compliance | Processes for assessing regulatory classification, bias risk, and ethical implications of AI tools | CMIO, Legal/Compliance, AI Committee |
| 4. IT and cybersecurity integration | AI tools evaluated within the organization's broader IT security and data governance framework | CIO, CISO |
| 5. Safety personnel engagement | Patient safety officers and quality teams involved in AI deployment decisions and incident review | Chief Quality Officer, Patient Safety Officer |
| 6. Clinical department involvement | Specialty-level clinical review of AI tools affecting specific departments or patient populations | Department Chiefs, AI Committee workgroups |
| 7. Continuous monitoring and post-deployment oversight | Systematic post-deployment performance review with defined escalation triggers | CMIO, AI Committee, Informatics team |
The CMIO is the primary accountable executive for AI governance integration across most of these domains. The CMIO role sits at the intersection of clinical authority, informatics expertise, and operational accountability — the combination required to translate governance frameworks into clinical practice. Health systems without a CMIO, or where the CMIO role does not include explicit AI governance accountability, should identify a named executive accountable for each of the seven domains before deploying AI tools at scale.
The Shared Responsibility Boundary: What Epic Provides vs. What Health Systems Must Own
One of the most consequential governance questions for health systems using Epic's AI stack is where vendor responsibility ends and institutional responsibility begins. This boundary is not always clearly communicated at the point of tool deployment, and health systems that assume vendor deployment implies institutional adequacy accept risks they may not have formally evaluated.

Based on Epic's AI trust and assurance suite documentation, Epic's side of the boundary includes: the Seismometer open-source validation tooling; an AI trust and assurance suite that automates data collection and mapping, provides near real-time performance metrics, generates demographic breakdown dashboards (age, sex, race/ethnicity), and uses a common schema extensible to custom and third-party models; and transparency documentation covering model training data, intended use, and known limitations.
The health system's side of the boundary includes everything that platform tools cannot substitute for: local validation on the institution's specific patient population; bias and subgroup auditing against local demographic data; post-deployment drift monitoring with defined response protocols; multidisciplinary accountability structures (AI committee, CMIO oversight, clinical department involvement); escalation paths for unexpected model behavior or safety concerns; and clinician training on AI tool use, limitations, and accountability.
"The only way to achieve outcomes is going to be with local workflows and local optimization... The vendors need to be responsible for ensuring that any standard algorithms are developed with minimal bias on datasets that are representative, but even a representative dataset may or may not look like the population of patients that I serve."
This framing from UCSD Health's CMO captures the core logic of shared responsibility: platform-level representativeness does not guarantee local validity. A rural Nebraska hospital's patient population differs fundamentally from a New York academic medical center's — in age distribution, comorbidity burden, social determinants of health, and care-seeking patterns. No training dataset, however large, eliminates the need for local validation. For broader context on how Epic's position in the vendor landscape shapes these governance obligations, the structured overview of AI companies in healthcare provides relevant landscape context.
Known Limitations and Unresolved Governance Challenges
Several structural governance challenges remain unresolved as of mid-2026 and are unlikely to be addressed by any single institution or tool deployment.
- Resource disparities between large and small hospitals: the ONC data brief confirms that small, rural, independent, government-owned, and critical access hospitals lag in both AI adoption and evaluation rigor. Seismometer's implementation journey is steeper for organizations without dedicated informatics staff. The governance frameworks described in this article are most readily implemented by large system-affiliated hospitals — the institutions that already have the most governance capacity. The hospitals with the least capacity face the greatest implementation barriers.
- Regulatory classification ambiguity for foundation model outputs: discrete prediction alerts like the Deterioration Index fit reasonably within existing Software as a Medical Device (SaMD) frameworks. Curiosity's trajectory simulations — probabilistic, multi-pathway, generated from 100+ billion patient events — do not fit neatly into those frameworks. In February 2026, Epic submitted a letter to HHS arguing that generative AI requires new policy support because it is probabilistic, resource-intensive, and requires continuous validation in ways that existing SaMD guidance does not address.
- Absence of standardized outcome-measurement frameworks for Cosmos-era models: there is no consensus methodology for evaluating whether a trajectory simulation like Curiosity produces clinical benefit at the population level. The 78-case benchmark evaluation demonstrates that CoMET matches or outperforms task-specific models in zero-shot settings, but zero-shot benchmark performance is not equivalent to prospective clinical outcome evidence.
- Multi-state legal complexity: health systems operating across multiple states face varying and evolving state-level AI governance requirements. The governance structures described in this article address clinical and operational obligations; legal compliance requirements vary by jurisdiction and should be evaluated with legal counsel rather than treated as uniform.
- Pace of capability expansion outrunning governance infrastructure: Epic's AI feature set is expanding rapidly — approximately 125 generative AI features were in development as of HIMSS 2025, and about two-thirds of Epic-using providers were already using generative AI features. Governance committees that meet bimonthly, as UW Health's CAIPA does, face a structural challenge in keeping pace with deployment velocity.
The governance obligation for health systems using Epic's AI ecosystem is not to achieve perfection before deploying any AI tool. It is to build the organizational infrastructure — committees, training programs, validation protocols, monitoring cadences, and escalation paths — that allows the institution to know when a tool is working, when it is not, and what to do in either case. The Novant Health case demonstrates that this infrastructure produces measurable clinical benefit. The ESM case demonstrates what happens when it is absent. The choice between those trajectories is made at the institutional level, not the platform level.

Comments
Join the discussion with an anonymous comment.