
What TEFCA's Scale-Up Actually Means for Clinical AI
The Trusted Exchange Framework and Common Agreement (TEFCA) grew from roughly 10 million exchanged records to approximately 464 million by the end of 2025. That figure is frequently cited as an interoperability success story. For clinical AI developers, it is something more specific: a signal that the data infrastructure precondition for population-scale AI has arrived.
Before TEFCA reached meaningful scale, clinical AI teams faced a structural problem that had nothing to do with model architecture. Real-world clinical data existed in fragmented silos — point-to-point exchange agreements between individual institutions, proprietary data feeds negotiated case by case, and EHR exports that varied in structure, completeness, and terminology. Training a model on this data meant training on a patchwork, not a population.
TEFCA's architecture changes this at the network level. Qualified Health Information Networks (QHINs) serve as exchange hubs. Each QHIN signs the Common Agreement, and organizations join as Participants or Subparticipants through a single agreement with one QHIN that carries the Common Agreement's required terms. Once connected, a participating entity can query or receive data from any other participating entity without negotiating bilateral contracts. The exchange is governed by standardized terms, standardized query types, and increasingly, standardized data formats.
For AI systems, the architectural implication is direct. A model trained or validated on TEFCA-accessible data can draw from clinical encounters across many organizations nationwide, rather than from a single health system's patient panel or a convenience sample from one EHR vendor's customer base. That shift in data provenance is not cosmetic. It affects generalizability, bias auditing, and the defensibility of a model's performance claims across diverse populations.
Why FHIR Is the Foundational Protocol for Clinical AI
HL7 FHIR (Fast Healthcare Interoperability Resources) is the data format that makes TEFCA's exchange volume useful for AI. Without it, a large volume of exchanged records is still a large volume of heterogeneous, semi-structured documents that require extensive preprocessing before any model can learn from them.
FHIR's design is resource-based. Each clinical concept — a patient, an observation, a medication, a diagnostic report — is represented as a discrete, addressable resource with a defined schema. Resources reference each other through structured identifiers rather than being embedded in monolithic documents. This architecture makes it possible to query for specific data elements, receive them in a predictable format, and compose them into training datasets without parsing narrative text for structured facts.
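A minimal sketch of what "resources reference each other through structured identifiers" means in practice, assuming a small hand-written R4 searchset Bundle (the resource ids and values are invented for illustration). The pipeline follows the Observation's `subject` reference to its Patient instead of parsing any narrative. Real servers may also return absolute URLs or `urn:uuid:` references, which this sketch does not handle.

```python
import json
from typing import Optional

# Hand-written, illustrative FHIR R4 searchset Bundle (ids are invented).
# The Observation points at its Patient with a relative reference instead of
# embedding the patient's details.
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-1",
                  "gender": "female", "birthDate": "1970-04-12"}},
    {"resource": {"resourceType": "Observation", "id": "obs-1",
                  "status": "final",
                  "code": {"coding": [{"system": "http://loinc.org",
                                       "code": "2160-0"}]},
                  "subject": {"reference": "Patient/example-1"},
                  "valueQuantity": {"value": 1.1, "unit": "mg/dL"}}}
  ]
}
"""


def index_bundle(bundle: dict) -> dict:
    """Map 'ResourceType/id' keys to the resources contained in a Bundle."""
    index = {}
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        index[f"{resource['resourceType']}/{resource['id']}"] = resource
    return index


def resolve(reference: dict, index: dict) -> Optional[dict]:
    """Follow a relative reference such as {'reference': 'Patient/example-1'}."""
    return index.get(reference.get("reference", ""))


bundle = json.loads(BUNDLE_JSON)
index = index_bundle(bundle)
observation = index["Observation/obs-1"]
patient = resolve(observation["subject"], index)
print(patient["gender"], observation["valueQuantity"]["value"])  # female 1.1
```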
The terminology bindings are where FHIR becomes particularly significant for AI reliability. The base FHIR R4 and R4B specifications leave many terminology choices open, but US implementation guides built on them, most notably US Core, bind key data elements to controlled vocabularies: SNOMED CT for clinical findings and procedures, LOINC for laboratory observations and clinical measurements, and RxNorm for medications. When these bindings are enforced, a laboratory result for serum creatinine from a hospital in Minnesota and one from a clinic in Georgia will carry the same LOINC code (2160-0). A model trained on LOINC-coded observations does not need to learn that "Creat SerPl-mCnc" and "serum creatinine" are the same concept; they already share a code.
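The payoff of a shared code can be sketched as a grouping step, assuming hypothetical observations from two sites (the local code system URL and local codes are invented). Values are bucketed by their LOINC coding regardless of local label, and anything without a LOINC coding is set aside for a mapping step. This sketch assumes all values already share a unit; a real pipeline would also normalize units.

```python
from collections import defaultdict
from typing import Optional

LOINC = "http://loinc.org"
LOCAL = "http://example.org/local-lab"  # hypothetical local code system

# Hypothetical observations from two sites. Local labels differ, but the first
# two both carry LOINC 2160-0 (creatinine, serum or plasma).
observations = [
    {"id": "mn-1", "code": {"text": "CREAT",
                            "coding": [{"system": LOINC, "code": "2160-0"}]},
     "valueQuantity": {"value": 0.9, "unit": "mg/dL"}},
    {"id": "ga-7", "code": {"text": "Creatinine, serum",
                            "coding": [{"system": LOCAL, "code": "CR-S"},
                                       {"system": LOINC, "code": "2160-0"}]},
     "valueQuantity": {"value": 1.3, "unit": "mg/dL"}},
    {"id": "ga-8", "code": {"text": "Misc local test",
                            "coding": [{"system": LOCAL, "code": "XYZ"}]},
     "valueQuantity": {"value": 5.0, "unit": "U/L"}},
]


def loinc_code(obs: dict) -> Optional[str]:
    """Return the first LOINC code in Observation.code.coding, if any."""
    for coding in obs.get("code", {}).get("coding", []):
        if coding.get("system") == LOINC:
            return coding.get("code")
    return None


def group_by_loinc(obs_list: list) -> tuple:
    """Bucket values by LOINC code; observations without one are returned separately."""
    grouped, unmapped = defaultdict(list), []
    for obs in obs_list:
        code = loinc_code(obs)
        if code is None:
            unmapped.append(obs["id"])  # needs a local-to-LOINC mapping step first
        else:
            grouped[code].append(obs["valueQuantity"]["value"])
    return dict(grouped), unmapped


grouped, unmapped = group_by_loinc(observations)
print(grouped)   # {'2160-0': [0.9, 1.3]}
print(unmapped)  # ['ga-8']
```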
Prior document-based exchange standards, the Clinical Document Architecture (CDA) and its implementation guide Consolidated CDA (C-CDA), carried structured data inside narrative XML documents. Extracting discrete data elements required parsing logic that was brittle, vendor-specific, and error-prone. A C-CDA document could technically comply with the standard while burying the data an AI model needed inside a free-text section. FHIR's resource model largely removes that ambiguity by design, because each data element lives in a typed field of an addressable resource rather than inside a document section.
- FHIR resources are individually addressable via RESTful APIs — enabling targeted queries for specific data elements rather than bulk document retrieval.
- Terminology bindings required by US Core profiles (SNOMED CT, LOINC, RxNorm) reduce semantic ambiguity across sources, a prerequisite for reliable cross-site AI inference.
- FHIR's versioned schema (R4, R4B, R5) provides a stable contract for data pipelines — model training code written against FHIR R4 resources does not break when source EHRs are upgraded, as long as the FHIR version is maintained.
- FHIR Bulk Data Access (the $export operation) enables population-level data extraction at scale, making it a primary mechanism for assembling AI training datasets from FHIR-compliant EHRs.
- FHIR's structured representation of provenance, encounter context, and patient demographics provides the metadata that AI models need for stratified evaluation and bias auditing.
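The Bulk Data and demographic-metadata points above can be sketched together, assuming the asynchronous export job has already completed. Per the Bulk Data Access specification, the kick-off request carries `Accept: application/fhir+json` and `Prefer: respond-async`, and the finished job returns a manifest whose `output` entries point to NDJSON files (one resource per line). The URLs, ids, and file contents below are placeholders, and downloads are simulated with an in-memory dict so the sketch runs offline.

```python
import json
from collections import Counter

# Headers the Bulk Data spec requires on an $export kick-off request, e.g.
# GET [base]/Group/[group-id]/$export. The server answers 202 Accepted with a
# Content-Location URL to poll; when the job finishes, polling returns a manifest.
KICKOFF_HEADERS = {"Accept": "application/fhir+json", "Prefer": "respond-async"}

# Illustrative completed-job manifest (all URLs are placeholders).
MANIFEST = {
    "transactionTime": "2026-04-01T00:00:00Z",
    "request": "https://fhir.example.org/Group/cohort-1/$export",
    "requiresAccessToken": True,
    "output": [
        {"type": "Patient", "url": "https://files.example.org/patient.ndjson"},
        {"type": "Observation", "url": "https://files.example.org/obs.ndjson"},
    ],
    "error": [],
}

# Stand-in for authenticated downloads: URL -> NDJSON body.
DOWNLOADED = {
    "https://files.example.org/patient.ndjson":
        '{"resourceType":"Patient","id":"p1","gender":"female"}\n'
        '{"resourceType":"Patient","id":"p2","gender":"male"}\n',
    "https://files.example.org/obs.ndjson":
        '{"resourceType":"Observation","id":"o1","subject":{"reference":"Patient/p1"}}\n'
        '{"resourceType":"Observation","id":"o2","subject":{"reference":"Patient/p1"}}\n'
        '{"resourceType":"Observation","id":"o3","subject":{"reference":"Patient/p2"}}\n',
}


def parse_ndjson(text: str) -> list:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_export(manifest: dict, fetch) -> dict:
    """Group exported resources by type using the manifest's output entries."""
    by_type = {}
    for item in manifest["output"]:
        by_type.setdefault(item["type"], []).extend(parse_ndjson(fetch(item["url"])))
    return by_type


data = load_export(MANIFEST, DOWNLOADED.__getitem__)
gender = {f"Patient/{p['id']}": p.get("gender", "unknown") for p in data["Patient"]}
# Count observations per patient gender: one stratum a bias audit might check.
strata = Counter(gender[o["subject"]["reference"]] for o in data["Observation"])
print(strata)  # Counter({'female': 2, 'male': 1})
```

Passing `fetch` as a function keeps the parsing logic independent of how files are retrieved; in a real pipeline it would download each URL with the access token when `requiresAccessToken` is true.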
The TEFCA FHIR Roadmap: Phases and Their AI Architecture Implications
ONC and the Sequoia Project have structured TEFCA's FHIR integration as a phased progression. Each phase expands what AI systems can access and how they can access it. The phases are not simultaneous — they reflect a deliberate sequencing of technical complexity and governance readiness.
Understanding which phase is in production versus pilot versus planned is essential for AI teams making architectural decisions. Building a data pipeline that depends on QHIN-to-QHIN FHIR exchange before that capability is in full production creates deployment risk. The table below maps each roadmap phase to its AI capability implications and key architectural considerations.
| Roadmap Phase | Status (Q2 2026) | What It Enables for AI | Key Architectural Considerations |
|---|---|---|---|
| Facilitated FHIR | In production (2024) | FHIR-based individual patient queries through QHINs; supports point-of-care AI inference on current patient data | AI systems must implement SMART on FHIR authorization; query scope limited to individual patient context; latency constraints affect real-time inference use cases |
| QHIN-to-QHIN FHIR Exchange | Pilot (2025); full production timeline unconfirmed as of Q2 2026 | Cross-QHIN FHIR data access; enables broader population coverage for training datasets and multi-site validation | Data governance across QHIN boundaries requires careful Common Agreement compliance review; provenance tracking becomes more complex across network hops |
| End-to-End FHIR Exchange | Future; timeline not finalized | Seamless FHIR-native exchange from source EHR to AI consumer without document translation layers; supports longitudinal population-scale training pipelines | Eliminates CDA/C-CDA translation overhead; requires AI pipelines to be FHIR R4 or R5 native; opens pathway for continuous model retraining on real-world data streams |
The practical implication for AI teams is that the current production-ready infrastructure supports individual patient query patterns — appropriate for clinical decision support tools that need a patient's longitudinal record at the point of care. Population-scale training dataset assembly via TEFCA remains dependent on QHIN-to-QHIN capabilities reaching full production and on individual QHIN data access policies permitting bulk use cases.

USCDI as an AI Training Data Standard
The United States Core Data for Interoperability (USCDI) is ONC's versioned specification of the minimum data elements that certified health IT must be capable of exchanging. It is typically described as an interoperability standard. For AI developers, it functions as something more operationally significant: a de facto specification for what structured data will be reliably available across FHIR-compliant EHRs.
When ONC adds a data element to USCDI, it is signaling that certified EHRs will eventually be required to support that element in FHIR-based exchange; the requirement takes effect only when ONC adopts that USCDI version into its certification criteria through rulemaking. That signal has a direct consequence for AI training data: elements in the USCDI version that certification currently requires are the ones an AI model can reasonably expect to find across a broad range of source systems. Elements not yet in that version may exist in some EHRs but will be absent or inconsistently structured in others.
USCDI v6 was published in July 2025. Its notable addition for AI applications was non-implantable unique device identifiers (UDIs) — enabling AI models that analyze device-related outcomes to access structured device data rather than parsing narrative text for device references. Draft USCDI v7 was published for public comment in January 2026. Its proposed additions include adverse event data and healthcare information attributes. If finalized, these additions would meaningfully expand the structured data available for pharmacovigilance AI, safety signal detection, and patient-reported outcome models.
| USCDI Version | Publication Status | Key Additions Relevant to AI | AI Training Data Implication |
|---|---|---|---|
| USCDI v4 | Final (2023) | Expanded clinical notes, care team members, health insurance information | Supported where EHR vendors have adopted it beyond the USCDI v3 certification baseline set by HTI-1; verify availability per EHR before relying on v4-only elements |
| USCDI v5 | Final (2024) | Pregnancy status, implantable device data, expanded diagnostic imaging | Supports AI models in obstetrics, device surveillance, and radiology; availability varies by EHR upgrade cycle |
| USCDI v6 | Final (July 2025) | Non-implantable UDIs | Enables structured device data for device-outcome AI models; reduces reliance on NLP extraction from narrative text |
| USCDI v7 | Draft (January 2026); final status unconfirmed | Adverse events, healthcare information attributes | If finalized, expands structured data for pharmacovigilance and safety AI; not yet available in certified EHRs |
AI teams should treat USCDI version alignment as a model scoping decision. A model trained on USCDI v6 elements can be expected to perform across a broader range of real-world deployments than one that depends on v7 draft elements. Documenting which USCDI version a model's training data corresponds to is also a defensible practice for regulatory submissions and clinical validation transparency.
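One way to make that scoping decision explicit in code is sketched below, assuming hypothetical feature names. The minimum versions for pregnancy status, non-implantable UDIs, and adverse events follow the table above; lab results have been in USCDI since v1. Draft v7 is deliberately excluded as a declarable target.

```python
from dataclasses import dataclass

# USCDI versions treated as final for scoping; v7 was only a draft as of the
# date of this article, so it is deliberately excluded.
FINAL_USCDI_VERSIONS = range(1, 7)  # v1 through v6


@dataclass(frozen=True)
class Feature:
    name: str        # hypothetical model input name
    min_uscdi: int   # first USCDI version containing the underlying element


# Lab results have been in USCDI since v1; the other versions follow the table above.
FEATURES = [
    Feature("serum_creatinine", 1),
    Feature("pregnancy_status", 5),
    Feature("non_implantable_udi", 6),
    Feature("adverse_event", 7),  # draft v7 element
]


def scope_features(features: list, target_version: int) -> tuple:
    """Split model inputs into usable and excluded for a declared USCDI version."""
    if target_version not in FINAL_USCDI_VERSIONS:
        raise ValueError(f"USCDI v{target_version} is not a final version")
    usable = [f.name for f in features if f.min_uscdi <= target_version]
    excluded = [f.name for f in features if f.min_uscdi > target_version]
    return usable, excluded


usable, excluded = scope_features(FEATURES, target_version=5)
print(usable)    # ['serum_creatinine', 'pregnancy_status']
print(excluded)  # ['non_implantable_udi', 'adverse_event']
```

Recording the declared version and the excluded list alongside the model artifact gives reviewers the documentation trail described above.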
Emerging AI-Native FHIR Patterns: CDS Hooks, SMART on FHIR, and Model Context Protocol
Three integration patterns define how AI systems currently connect to FHIR infrastructure at the point of care. Each serves a different function in the AI deployment architecture, and each carries different implications for product design and regulatory positioning.
CDS Hooks is the HL7 standard for triggering clinical decision support at defined workflow moments within an EHR session. When a clinician opens a patient chart, selects or signs an order, or discharges a patient, the EHR fires a hook to an external CDS service (which can be an AI model) and receives a structured response (a "card") that surfaces in the clinician's workflow. CDS Hooks is the primary mechanism for point-of-care AI inference that needs to operate within the EHR rather than as a standalone application. It is supported by major EHR vendors and is referenced in ONC certification criteria.
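A minimal sketch of an AI-backed CDS Hooks service, assuming a hypothetical service id (`creatinine-trend`), prefetch key (`creat`), and a placeholder "model" that simply measures the relative rise in creatinine. It shows the discovery document the service publishes, how a `patient-view` request's prefetched Bundle is read, and the card response shape (`summary`, `indicator`, `source`) the EHR renders.

```python
import uuid

# Discovery document a CDS service returns from GET {base}/cds-services.
# The service id, title, and prefetch key ("creat") are hypothetical.
DISCOVERY = {
    "services": [{
        "hook": "patient-view",
        "id": "creatinine-trend",
        "title": "Creatinine trend check",
        "description": "Flags rising serum creatinine when a chart is opened.",
        "prefetch": {
            "creat": "Observation?patient={{context.patientId}}&code=http://loinc.org|2160-0",
        },
    }]
}
ALERT_THRESHOLD = 0.5  # hypothetical: alert on a 50% or greater relative rise


def model_score(values: list) -> float:
    """Placeholder for a real model: relative rise from first to last value."""
    if len(values) < 2 or values[0] <= 0:
        return 0.0
    return (values[-1] - values[0]) / values[0]


def handle_patient_view(request: dict) -> dict:
    """Turn a patient-view hook request into a CDS Hooks card response."""
    bundle = request.get("prefetch", {}).get("creat") or {}
    results = [e["resource"] for e in bundle.get("entry", [])
               if "valueQuantity" in e.get("resource", {})]
    # Assumes ISO 8601 dates in one format, so string order is chronological.
    results.sort(key=lambda r: r.get("effectiveDateTime", ""))
    score = model_score([r["valueQuantity"]["value"] for r in results])
    if score < ALERT_THRESHOLD:
        return {"cards": []}  # an empty card list means "nothing to show"
    return {"cards": [{
        "uuid": str(uuid.uuid4()),
        "summary": f"Serum creatinine rose {score:.0%} across recent results",
        "indicator": "warning",
        "source": {"label": "Creatinine trend check (example model)"},
    }]}


request = {
    "hook": "patient-view",
    "hookInstance": str(uuid.uuid4()),
    "context": {"userId": "Practitioner/123", "patientId": "p1"},
    "prefetch": {"creat": {"resourceType": "Bundle", "entry": [
        {"resource": {"resourceType": "Observation", "effectiveDateTime": "2026-04-01",
                      "valueQuantity": {"value": 1.6, "unit": "mg/dL"}}},
        {"resource": {"resourceType": "Observation", "effectiveDateTime": "2026-03-01",
                      "valueQuantity": {"value": 1.0, "unit": "mg/dL"}}},
    ]}},
}
response = handle_patient_view(request)
print(response["cards"][0]["summary"])  # Serum creatinine rose 60% across recent results
```

Declaring a prefetch template lets the EHR send the needed data with the hook call, which helps with the latency constraints that point-of-care inference faces.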
SMART on FHIR (Substitutable Medical Applications, Reusable Technologies) provides the authorization and launch framework for applications that embed within or connect to EHR sessions. It handles OAuth 2.0-based access token flows, scoping of FHIR resource access, and EHR-embedded app launch sequences. For AI applications that need to access a patient's FHIR data during a clinical encounter — rather than in batch — SMART on FHIR is the authorization layer that makes that access both technically possible and auditable.
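For an EHR-launched AI app, the authorization step can be sketched as building the SMART authorization request with a deliberately narrow scope list and PKCE. This assumes SMART App Launch v2 scope syntax (`patient/Observation.rs` meaning read and search). The endpoint URLs, client id, and launch value are placeholders; a real app reads its endpoints from the server's `.well-known/smart-configuration` document.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Minimal scopes in SMART App Launch v2 syntax ("r" = read, "s" = search).
# "launch" accepts the EHR launch context; the app asks for Patient and Observation only.
SCOPES = ["launch", "openid", "fhirUser", "patient/Patient.r", "patient/Observation.rs"]


def pkce_pair() -> tuple:
    """Create a PKCE code_verifier and its S256 code_challenge."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe characters (43-128 allowed)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorize_url(authorize_endpoint: str, client_id: str, redirect_uri: str,
                  fhir_base: str, launch: str) -> tuple:
    """Build an EHR-launch authorization request; keep verifier and state server-side."""
    verifier, challenge = pkce_pair()
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "launch": launch,          # opaque launch id the EHR passed to the app
        "scope": " ".join(SCOPES),
        "state": state,
        "aud": fhir_base,          # FHIR server the token will be used against
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}", verifier, state


# Placeholder URLs; a real app reads the endpoint from {fhir_base}/.well-known/smart-configuration.
url, verifier, state = authorize_url(
    "https://ehr.example.org/oauth2/authorize", "example-ai-app",
    "https://app.example.org/callback", "https://ehr.example.org/fhir", "launch-abc")
query = parse_qs(urlparse(url).query)
print(query["scope"][0])  # launch openid fhirUser patient/Patient.r patient/Observation.rs
```

Keeping the scope list in one constant makes it easy to review against what the model actually consumes, which is the "minimally permissive" check listed later in this article.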
The third pattern is newer and less settled. ONC's HTI-5 proposed rule, published in December 2025, explicitly referenced Model Context Protocol (MCP) as an example of emerging standards that "standardize how applications provide context to LLMs." MCP is an open protocol, introduced by Anthropic in late 2024, that lets AI agents and LLM applications reach external tools and data sources, FHIR APIs among them, in a structured, permissioned way. ONC's acknowledgment of MCP in proposed rulemaking is significant as a directional signal: it indicates that the agency is actively considering how agentic AI systems should interact with FHIR infrastructure, and that standards complementing or succeeding CDS Hooks may eventually be formally recognized.
- CDS Hooks: Use for AI inference triggered at specific EHR workflow moments (chart open, medication order, discharge). Best suited for models that produce actionable outputs within a clinical session.
- SMART on FHIR: Use for AI applications that need scoped, authorized access to patient FHIR data during or between encounters. Required for any app that embeds within an EHR or accesses patient data via FHIR APIs.
- Model Context Protocol (MCP): Monitor as an anticipated direction for agentic AI systems that need to query FHIR resources as part of multi-step reasoning workflows. Not yet formally adopted in ONC rulemaking.
HTI-5's Proposed FHIR-Forward Reset: What Removing 34 Certification Criteria Means for AI Developers
ONC published the Health Data, Technology, and Interoperability (HTI-5) proposed rule in December 2025. One of its most consequential provisions for AI developers is the proposed removal of 34 certification criteria from the ONC Health IT Certification Program.
The 34 criteria proposed for removal are primarily non-FHIR transport methods and document-based exchange requirements that were built into the certification program when CDA and C-CDA were the dominant exchange formats. The specific categories targeted include Direct Project messaging criteria, CDA creation performance criteria, and C-CDA exchange criteria.
For AI developers, the significance is architectural. Currently, EHR vendors certified under the ONC program must maintain compliance with both FHIR-based exchange requirements and legacy document-based exchange requirements simultaneously. This dual compliance burden has two effects on the AI ecosystem: it slows EHR vendor investment in FHIR capabilities (because engineering resources are split), and it means that AI products integrating with certified EHRs must handle both exchange paradigms or accept limited connectivity.
If HTI-5 is finalized as proposed, the removal of these criteria would create a cleaner FHIR-native baseline. EHR vendors would no longer need to maintain certified Direct Project or CDA-based exchange capabilities alongside FHIR. AI products designed as FHIR-native applications would face fewer integration edge cases and could assume a more consistent data exchange environment across certified EHR deployments.
| Criteria Category Proposed for Removal | Legacy Function | AI Architecture Implication if Removed |
|---|---|---|
| Direct Project messaging criteria | Secure email-based clinical message exchange between providers | Eliminates a non-FHIR exchange pathway that AI systems have had to accommodate; simplifies integration surface |
| CDA creation performance criteria | EHR ability to generate Clinical Document Architecture documents on demand | Reduces pressure to support CDA-based data extraction; FHIR Bulk Data becomes the de facto population data access method |
| C-CDA exchange criteria | Consolidated CDA document exchange for transitions of care and referrals | Removes the requirement for AI systems to parse C-CDA documents for structured data; FHIR resources become the primary structured data source |
The information blocking and agentic AI dimensions of HTI-5 are covered in depth in the site's existing analysis of the ONC information blocking rule. This section focuses specifically on the FHIR architectural implications of the certification criteria changes.
Data Access Compliance Essentials for AI Systems on TEFCA and FHIR
Accessing clinical data through TEFCA and FHIR APIs does not exempt AI systems from the compliance obligations that govern all health data use. AI teams building on this infrastructure need to navigate three overlapping compliance frameworks before data reaches a training pipeline or inference endpoint.
Information blocking rules prohibit actors (health care providers, developers of certified health IT, and health information networks or exchanges) from engaging in practices likely to interfere with access, exchange, or use of electronic health information, unless the practice is required by law or covered by a regulatory exception. For AI systems, the relevant question is whether the workflows through which AI accesses, filters, or uses FHIR data could be characterized as information blocking, particularly when AI outputs determine which information reaches clinicians or patients. The site's existing analysis of the ONC information blocking rule covers the agentic AI dimensions of this question in depth; AI teams should review that analysis before designing data access architectures.
HIPAA's minimum necessary standard applies to uses and disclosures of protected health information for AI training or inference, whether that access occurs through a FHIR API or any other mechanism; requests by or disclosures to a health care provider for treatment are the main exception. Unless a use falls under another HIPAA permission, such as health care operations or research conducted under an authorization, an IRB or privacy board waiver, or a limited data set agreement, PHI must be de-identified under HIPAA's Safe Harbor or Expert Determination method before it is used for AI development. FHIR Bulk Data exports contain PHI by default, so de-identification must be applied at the pipeline level, not assumed from the exchange format.
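To make "applied at the pipeline level" concrete, here is a partial, illustrative sketch of a Safe Harbor-style transform for a single FHIR Patient resource. It is not a complete or compliant Safe Harbor implementation: it covers only a few Patient elements, and other resources, free text, extensions, and ids inside references all need their own handling. The function name and the choice of elements are assumptions for illustration.

```python
import copy

# Patient elements that carry direct identifiers (names, phone/email, MRNs and
# other identifiers, photos, next-of-kin contacts, and human-readable narrative).
DIRECT_IDENTIFIER_KEYS = ("name", "telecom", "identifier", "photo", "contact", "text")


def deidentify_patient(patient: dict, reference_year: int) -> dict:
    """Partial, illustrative Safe Harbor-style transform of one FHIR Patient.

    Not a complete de-identification: other resources, free text elsewhere,
    extensions, and resource ids in references all need their own handling.
    """
    out = copy.deepcopy(patient)
    for key in DIRECT_IDENTIFIER_KEYS:
        out.pop(key, None)
    out.pop("id", None)  # server ids are re-keyed downstream, never reused

    # Geography smaller than a state is removed; only the state survives.
    states = [{"state": a["state"]} for a in patient.get("address", []) if "state" in a]
    if states:
        out["address"] = states
    else:
        out.pop("address", None)

    # Dates: keep only the year, and drop it entirely when age could exceed 89
    # (the pipeline should record a separate "90 or older" category instead).
    if "birthDate" in out:
        year = int(out["birthDate"][:4])
        if reference_year - year >= 90:
            del out["birthDate"]
        else:
            out["birthDate"] = str(year)
    if "deceasedDateTime" in out:
        out["deceasedDateTime"] = out["deceasedDateTime"][:4]
    return out


patient = {
    "resourceType": "Patient", "id": "p1", "name": [{"family": "Doe"}],
    "birthDate": "1980-06-15", "telecom": [{"system": "phone", "value": "555-0100"}],
    "address": [{"line": ["1 Main St"], "city": "Atlanta", "postalCode": "30301", "state": "GA"}],
}
print(deidentify_patient(patient, reference_year=2026))
# {'resourceType': 'Patient', 'birthDate': '1980', 'address': [{'state': 'GA'}]}
```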
TEFCA's Common Agreement imposes data use obligations on all participating entities. These include restrictions on secondary use of data obtained through TEFCA exchange, requirements for purpose-of-use documentation, and audit logging obligations. AI systems that access data through TEFCA-connected QHINs inherit these obligations — they cannot treat TEFCA-sourced data as freely available for any downstream use.
- Verify that the intended AI data use case falls within a permitted TEFCA purpose-of-use category before initiating any QHIN data access.
- Apply HIPAA de-identification to all FHIR Bulk Data exports before data enters AI training pipelines. Document the de-identification method (Safe Harbor or Expert Determination) and retain the documentation.
- Implement audit logging for all FHIR API queries made by AI systems, including the requesting entity, query scope, timestamp, and data elements accessed.
- Review TEFCA Common Agreement data use restrictions with legal counsel before using TEFCA-sourced data for AI model training, as distinct from treatment-purpose inference.
- Assess whether AI outputs that restrict or filter data access to clinicians could constitute information blocking under ONC's exception framework — particularly relevant for AI triage or prioritization tools.
- Confirm SMART on FHIR authorization scopes are minimally permissive — request only the FHIR resource types and patient populations the AI application genuinely requires.
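The audit-logging and purpose-of-use items in the checklist above can be combined in a small gate, sketched here under stated assumptions. The allow-list labels are hypothetical placeholders, not official TEFCA codes, and would come out of the legal review the checklist calls for. Every attempt, including a refused one, is written as one JSON line with the fields the checklist names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical allow-list of purposes this AI workload has been cleared for
# after legal review; the labels are placeholders, not official TEFCA codes.
ALLOWED_PURPOSES = {"TREATMENT"}


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FhirQueryAudit:
    requesting_entity: str   # AI system or OAuth client id making the call
    purpose_of_use: str      # declared purpose for this request
    query: str               # FHIR request, e.g. "Observation?patient=p1"
    scopes: list             # scopes on the access token used
    resource_types: list     # data elements (resource types) requested
    timestamp: str = field(default_factory=_utc_now)


def check_and_log(entry: FhirQueryAudit, sink: list) -> bool:
    """Log every attempt as one JSON line, then report whether it may proceed."""
    allowed = entry.purpose_of_use in ALLOWED_PURPOSES
    sink.append(json.dumps({**asdict(entry), "allowed": allowed}, sort_keys=True))
    return allowed


audit_log = []
ok = check_and_log(FhirQueryAudit(
    "cds-creatinine-service", "TREATMENT",
    "Observation?patient=p1&code=http://loinc.org|2160-0",
    ["patient/Observation.rs"], ["Observation"]), audit_log)
denied = check_and_log(FhirQueryAudit(
    "training-job", "MODEL_TRAINING", "Group/cohort-1/$export",
    ["system/*.rs"], ["Patient", "Observation"]), audit_log)
print(ok, denied, len(audit_log))  # True False 2
```

Logging before the allow/deny decision is returned means refused attempts leave the same trail as permitted ones, which is usually what an auditor wants to see.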
Practical Architecture Guidance for AI Teams Building on This Infrastructure
The following principles are intended for health system AI teams and clinical AI developers making architecture decisions in the current TEFCA/FHIR environment. They are not vendor recommendations — they are structural considerations grounded in the regulatory and technical landscape as of Q2 2026.
Design FHIR data pipelines against a declared USCDI version
Rather than building training pipelines that assume all FHIR data elements will be present, scope pipelines to the USCDI version that corresponds to the EHR certification cohort you are targeting. For health systems running EHRs certified to ONC's current criteria (the successor to the 2015 Edition Cures Update), USCDI v3, which HTI-1 made the required baseline as of January 1, 2026, is the realistic floor for cross-site data availability; treat v4 and later elements as present only where a vendor has adopted them. Documenting this scoping decision makes model applicability claims auditable and defensible.
Use CDS Hooks for inference, SMART on FHIR for authorization — not the reverse
CDS Hooks is the right pattern for AI models that need to surface outputs at a defined workflow moment. SMART on FHIR handles the authorization layer that governs what FHIR data the AI application can access. These are complementary, not interchangeable. AI products that conflate them, for example by using a SMART on FHIR app launch as the inference trigger rather than as the authorization mechanism, create brittle integrations: inference then runs only when a clinician remembers to open the app, rather than at the chart-open, order, or discharge events that CDS Hooks delivers automatically.
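The division of labor can be seen inside a CDS Hooks call itself: the EHR may pass a `fhirAuthorization` object (an OAuth 2.0 bearer token with its granted scope) alongside `fhirServer`, and the service uses that token rather than inventing its own access path. A minimal sketch follows, assuming placeholder server URLs and token values; it builds the request without sending it, and refuses when the token lacks the scope the service needs.

```python
from urllib.request import Request


def fhir_request_from_hook(hook_request: dict, relative_url: str,
                           required_scope: str) -> Request:
    """Build (but do not send) a FHIR GET using the token the EHR sent with the hook."""
    auth = hook_request.get("fhirAuthorization")
    if not auth or auth.get("token_type") != "Bearer":
        raise PermissionError("EHR did not grant FHIR access for this hook call")
    if required_scope not in auth.get("scope", "").split():
        raise PermissionError(f"token lacks scope {required_scope!r}")
    base = hook_request["fhirServer"].rstrip("/")
    return Request(f"{base}/{relative_url}",
                   headers={"Authorization": f"Bearer {auth['access_token']}",
                            "Accept": "application/fhir+json"})


hook_request = {
    "hook": "order-sign",
    "fhirServer": "https://ehr.example.org/fhir",  # placeholder
    "fhirAuthorization": {"access_token": "example-token", "token_type": "Bearer",
                          "expires_in": 300, "scope": "patient/Observation.rs",
                          "subject": "creatinine-trend"},
    # draftOrders and other order-sign context fields omitted for brevity
    "context": {"userId": "Practitioner/123", "patientId": "p1"},
}
req = fhir_request_from_hook(hook_request, "Observation?patient=p1", "patient/Observation.rs")
print(req.full_url)  # https://ehr.example.org/fhir/Observation?patient=p1
```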
Evaluate QHIN participation as a data access strategy, not a marketing claim
Connecting to a QHIN provides access to the TEFCA exchange network, but the practical data access it enables depends on which organizations are participating through that QHIN, what purpose-of-use categories they support, and whether QHIN-to-QHIN exchange is in production for your use case. AI teams evaluating QHIN participation should request specific documentation of participating organization coverage and data availability for their target patient population before treating QHIN connectivity as a solved data access problem.
Handle FHIR version transitions explicitly in model development
FHIR R4 is the current dominant version in certified EHRs. FHIR R5 introduced breaking changes in several resource definitions. AI models trained on R4-structured data may encounter schema differences when deployed against R5 sources. Version-aware data ingestion — explicitly declaring the FHIR version of source data and validating resources against the correct schema before training — prevents silent data quality degradation as the EHR ecosystem migrates.
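A minimal sketch of version-aware ingestion, using one concrete R4-to-R5 change: R4 and R4B MedicationRequest carry the drug in the choice element `medication[x]` (`medicationCodeableConcept` or `medicationReference`), while R5 replaced it with a single `medication` element of type CodeableReference. The version is read from the source's CapabilityStatement `fhirVersion`. The RxNorm code below is a placeholder, and `medicationReference` is not resolved here (it would require fetching the referenced Medication).

```python
# FHIR releases keyed by the version string in CapabilityStatement.fhirVersion.
RELEASES = {"4.0.1": "R4", "4.3.0": "R4B", "5.0.0": "R5"}


def release_of(capability_statement: dict) -> str:
    """Fail loudly when a source does not declare a FHIR version we handle."""
    version = capability_statement.get("fhirVersion", "")
    if version not in RELEASES:
        raise ValueError(f"unsupported or undeclared FHIR version: {version!r}")
    return RELEASES[version]


def medication_codings(med_request: dict, release: str) -> list:
    """Return MedicationRequest medication codings for the given release.

    R4 and R4B use the choice element medication[x]
    (medicationCodeableConcept or medicationReference); R5 replaced it with a
    single 'medication' element of type CodeableReference.
    """
    if release in ("R4", "R4B"):
        concept = med_request.get("medicationCodeableConcept", {})
    elif release == "R5":
        concept = med_request.get("medication", {}).get("concept", {})
    else:
        raise ValueError(f"unknown release {release!r}")
    return concept.get("coding", [])


RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"
r5_request = {"resourceType": "MedicationRequest", "status": "active",
              "medication": {"concept": {"coding": [
                  {"system": RXNORM, "code": "0000000"}]}}}  # placeholder RxCUI
capability = {"resourceType": "CapabilityStatement", "fhirVersion": "5.0.0"}

print(len(medication_codings(r5_request, release_of(capability))))  # 1
print(len(medication_codings(r5_request, "R4")))  # 0: the silent failure to avoid
```

The second call is the degradation the paragraph warns about: an R5 resource read with R4 assumptions raises no error, it just yields no medication data.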
Monitor the EHIgnite Challenge as a product opportunity signal
ONC announced the EHIgnite Challenge in February 2026 as an initiative to catalyze AI tools that transform raw FHIR export data into clinically actionable outputs. The challenge's existence signals that ONC has identified a specific gap: FHIR Bulk Data exports are technically accessible but not yet routinely converted into usable clinical intelligence at the point of care. AI teams with capabilities in FHIR data transformation and clinical NLP should confirm the challenge's current status and scope, as it may represent both a funding opportunity and a product validation pathway.
Open Questions and the Regulatory Horizon
The infrastructure described in this article is real and growing, but it is not settled. AI teams and health systems building on TEFCA and FHIR should maintain active monitoring across several open questions that will materially affect architectural decisions and compliance obligations in the next 12 to 24 months.
- HTI-5 finalization and scope. The proposed rule's removal of 34 certification criteria, its provisions on algorithm transparency, and its treatment of AI-native exchange patterns are all subject to change before finalization. The comment period response and ONC's final rule publication — timeline unknown as of June 2026 — will determine which proposals become binding requirements.
- USCDI v7 final publication. Draft USCDI v7 (January 2026) proposed adverse event data and healthcare information attributes. If finalized, these additions will expand the structured data available for pharmacovigilance and patient safety AI. The final publication date and any changes from the draft should be tracked against ONC's official USCDI versioning record.
- QHIN-to-QHIN production deployment. Piloted in 2025, full production deployment of QHIN-to-QHIN FHIR exchange remains the critical infrastructure milestone for population-scale AI training data access through TEFCA. Production timelines should be verified against current Sequoia Project milestone reports.
- ONC's evolving position on agentic AI definitions. ONC issued a FAQ in December 2025 addressing when agentic AI systems constitute actors subject to information blocking rules. This definitional question will shape how AI orchestration systems, multi-step reasoning agents, and AI-driven data access workflows are regulated. The existing site analysis of the information blocking rule covers this in depth.
- Model Context Protocol in formal rulemaking. ONC's HTI-5 citation of MCP as an example of AI-native successor standards is a directional signal, not a regulatory adoption. Whether MCP or a successor protocol enters formal ONC rulemaking — potentially in HTI-6 or a subsequent rule — will determine whether AI-FHIR integration patterns become certification requirements rather than voluntary best practices.
- TEFCA participation breadth and equity implications. TEFCA's exchange volume is growing, but participation is not geographically or demographically uniform. AI models trained on TEFCA-sourced data inherit the representativeness gaps of the participating network. As ONC and HHS publish data on TEFCA participation demographics, AI teams should assess whether their training data reflects the populations their models will be deployed against.
The TEFCA and FHIR infrastructure described here represents a genuine architectural shift in what is possible for clinical AI — not a policy aspiration but an operational reality that is expanding in scope quarter by quarter. The teams that will build durable AI systems on this infrastructure are those that treat the regulatory roadmap as a design constraint, not an afterthought.