Federated Learning in Healthcare AI: Definition and Privacy Guide

Definition and Origin

Federated learning (FL) is a distributed machine learning paradigm in which a shared model is trained across multiple data-holding institutions without any raw data leaving its originating site. Instead of pooling patient records in a central repository, each participating institution trains on its local dataset and transmits only the resulting model parameter updates — weights or gradients — to a central aggregator, which combines them into an improved global model.

The term was introduced by McMahan et al. at Google in 2016 in the paper "Communication-Efficient Learning of Deep Networks from Decentralized Data", which proposed the Federated Averaging (FedAvg) algorithm as its core aggregation method. In healthcare, the one-sentence framing is: FL allows hospitals and research institutions to train shared AI models across distributed patient datasets without any raw patient data leaving its originating institution.

How a Federated Training Round Works

Each training cycle in a federated system follows a defined sequence. The process repeats across many rounds until the global model converges to acceptable performance.

Global model initialization. The central aggregator initializes a global model — typically a neural network with a defined architecture — and distributes identical copies to all participating sites.
Local training. Each site trains its local copy of the model using only its own patient data for a defined number of iterations. The raw data never leaves the institution.
Transmission of model updates. Each site sends its locally updated model weights or gradients — not any patient records — back to the aggregator over a secure channel.
Aggregation via Federated Averaging. The aggregator combines the incoming updates using FedAvg: a weighted average of all updates, with each site's contribution weighted in proportion to its local dataset size. The result is a new global model that incorporates learning from all participating sites.
Redistribution. The updated global model is redistributed to all sites, beginning the next training round. The cycle repeats until convergence criteria are met.
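
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production framework: the model is a plain linear regression trained by gradient descent, and the function names (local_train, fedavg, run_federation) and hyperparameters are hypothetical choices made for this example. The "sites" are simulated in one process, so the secure channel is not modeled.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Local training: a few gradient-descent steps on one site's own data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def fedavg(site_weights, site_sizes):
    """Federated Averaging: mean of site models weighted by local dataset size."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def run_federation(sites, dim, rounds=50):
    """sites is a list of (X, y) pairs; only weights cross the site boundary."""
    global_w = np.zeros(dim)  # global model initialization
    for _ in range(rounds):
        # Local training and transmission of updates (weights only, no records).
        updates = [local_train(global_w, X, y) for X, y in sites]
        # Aggregation; the result is redistributed at the start of the next round.
        global_w = fedavg(updates, [len(y) for _, y in sites])
    return global_w

# Illustrative run with three simulated sites of different sizes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
sites = []
for n in (50, 200, 1000):
    X = rng.normal(size=(n, 3))
    sites.append((X, X @ true_w + 0.01 * rng.normal(size=n)))
print(run_federation(sites, dim=3))
```

In this sketch the largest site dominates the average, which is the intended behavior of size-weighted FedAvg.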

Figure: Radial infographic showing a central aggregator node connected to four hospital institution nodes, with model update arrows flowing inward and locked data icons anchored at each site. Caption: In federated learning, only model parameter updates travel to the central aggregator; patient data remains locked at each institutional node.

Why Healthcare Specifically Needs Federated Learning

The data-silo problem in healthcare is not primarily a technical limitation — it is a legal and operational one. HIPAA in the United States and GDPR in the European Union impose significant restrictions on the cross-institutional transfer of protected health information. Centralizing patient data from multiple hospitals into a single training dataset requires data use agreements, de-identification procedures, and legal bases for processing that are difficult to establish at scale, particularly across international boundaries.

FL is a structural response to these constraints. By keeping data local and sharing only model knowledge, it enables multi-site training that was previously legally infeasible in many configurations. Several additional healthcare-specific factors reinforce the need for this approach:

Rare disease sample-size problem. For conditions with low prevalence — rare cancers, uncommon genetic disorders, atypical presentations — no single institution accumulates sufficient cases to train a well-generalizing model. FL allows aggregation of model knowledge across dozens of sites without pooling their data.
Scanner and protocol heterogeneity. Medical imaging data varies substantially across institutions due to different scanner manufacturers, acquisition protocols, and post-processing pipelines. Training on diverse local datasets via FL can improve model robustness to this variation.
Data sovereignty requirements. National regulations in many jurisdictions require that patient data remain within specific geographic or jurisdictional boundaries. FL enables international research collaboration while respecting these constraints.
Institutional data governance. Many health systems have internal policies prohibiting or heavily restricting external data sharing, independent of legal requirements. FL reduces the governance burden by eliminating the need for data transfer agreements in many configurations.

Privacy Enhancement Layers Beyond Baseline FL

Baseline FL — the standard McMahan et al. architecture with no additional privacy mechanisms — does not provide a formal privacy guarantee. The four principal privacy-enhancement techniques that can be layered onto FL each address different threat categories and carry distinct trade-offs in computational cost, communication overhead, and model utility.

Differential Privacy

Differential privacy (DP) adds calibrated statistical noise — typically Gaussian or Laplace — to model updates before they are transmitted to the aggregator. This provides a mathematically bounded privacy guarantee expressed as an epsilon-delta (ε, δ) parameter pair: smaller epsilon values indicate stronger privacy protection but greater degradation of model utility.
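
For reference, the (ε, δ) guarantee mentioned above is usually stated as follows. The symbols are the standard ones rather than notation from this article: M is the randomized training or aggregation mechanism, D and D' are any two datasets differing in a single record, and S is any set of possible outputs.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

When ε is close to zero, the two output distributions are nearly indistinguishable, which is why smaller ε corresponds to stronger protection.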

Published healthcare FL benchmarks indicate that models trained with differential privacy at ε=4 can reach within approximately 5% of the performance achievable without DP, representing a modest utility cost for a meaningful formal guarantee. Global DP (often called central DP), where the aggregation step itself is made differentially private, generally gives better utility than local DP, where each institution privatizes its own updates, because noise is added once after many contributions have been combined rather than to each contribution separately. The trade-off is that global DP requires the aggregator to be trusted with un-noised updates, or to be protected by one of the cryptographic or hardware mechanisms described below.
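
A common way to apply DP to model updates is to clip each update to a fixed L2 norm and then add Gaussian noise scaled to that norm. The sketch below shows this for the global (central) setting, assuming NumPy; the function names and the clip_norm and noise_multiplier parameters are hypothetical. It does not compute ε: translating a noise multiplier and a number of rounds into an (ε, δ) value requires a separate privacy accountant, which is omitted here.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Clip each site's update, sum, add Gaussian noise once, then average."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is proportional to the clipping bound, because
    # that bound is the most any single site can change the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)

# Illustrative call with three simulated site updates.
rng = np.random.default_rng(0)
site_updates = [rng.normal(size=4) for _ in range(3)]
print(dp_aggregate(site_updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Moving the noise step into each site (one noise draw per update before transmission) would turn this into the local variant, at the cost of more total noise in the average.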

Homomorphic Encryption

Homomorphic encryption (HE) allows the aggregator to perform mathematical operations — specifically, computing the weighted average of model updates — directly on encrypted ciphertext without ever decrypting the individual contributions. The aggregator sees only the aggregated result, never the individual site's update in plaintext.

The principal limitation is computational cost: HE operations are orders of magnitude more expensive than equivalent plaintext operations, making it impractical for large models or high-frequency training rounds without substantial infrastructure investment. Combining HE with transfer learning to reduce model weight size is one approach that has been explored in the literature.
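
To make the idea of averaging on ciphertext concrete, here is a toy additively homomorphic scheme (textbook Paillier) in pure Python. It is an illustration only: the primes are tiny and fixed, so it offers no security, and real deployments use vetted libraries and often different schemes. All function names, the fixed-point SCALE, and the example weights are assumptions made for this sketch. It also assumes the decryption key is held by someone other than the aggregator, since an aggregator holding the key could decrypt individual contributions.

```python
import math
import secrets

SCALE = 10_000  # fixed-point scale used to encode real-valued weights

def keygen(p=104729, q=1299709):
    """Toy key generation from two small known primes (insecure by design)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def encrypted_weighted_sum(n, ciphertexts, sizes):
    """Aggregator side: sum of size * weight, computed without decrypting."""
    n2, acc = n * n, 1  # 1 is a valid encryption of zero
    for c, k in zip(ciphertexts, sizes):
        acc = (acc * pow(c, k, n2)) % n2  # exponentiation multiplies plaintext by k
    return acc

def encode(n, x):
    return round(x * SCALE) % n  # negatives wrap around modulo n

def decode(n, m):
    return (m - n if m > n // 2 else m) / SCALE

# One scalar weight per site, weighted by local dataset size.
n, priv = keygen()
weights, sizes = [0.25, -0.5, 1.0], [100, 300, 600]
ciphertexts = [encrypt(n, encode(n, w)) for w in weights]
total = encrypted_weighted_sum(n, ciphertexts, sizes)
print(decode(n, decrypt(n, priv, total)) / sum(sizes))
```

Even in this toy form, every aggregated parameter costs several big-integer modular exponentiations, which gives a sense of why the overhead grows quickly with model size.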

Secure Multi-Party Computation

Secure multi-party computation (SMPC) enables a group of parties to jointly compute a function — such as the aggregate of their model updates — without any party exposing its individual input to any other party or to the aggregator. Cryptographic protocols distribute the computation so that no single node holds enough information to reconstruct another's contribution.

SMPC's primary cost is communication overhead: the protocol requires multiple rounds of encrypted message exchange between participants, which scales poorly with the number of sites and the size of the model.
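
One of the simplest SMPC building blocks is additive secret sharing, sketched below in pure Python. Each party splits its private value into random shares that sum to the value modulo a prime, so no single share reveals anything on its own. The names (share, secure_sum) and the choice of modulus are assumptions for this example; model updates are represented as single integers, and real protocols add dropout handling, authentication, and defenses against malicious parties that are not shown here.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (a Mersenne prime)

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Compute the sum of all inputs without any party seeing another's input."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    all_shares = [share(v % PRIME, n) for v in private_values]
    # Party j adds the shares it received; each partial sum looks random alone.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    # Only the partial sums are published and combined.
    total = sum(partial_sums) % PRIME
    return total if total <= PRIME // 2 else total - PRIME  # map back to signed

print(secure_sum([12, -7, 30]))
```

In this scheme every party sends a share to every other party, so the number of messages grows quadratically with the number of sites and linearly with the number of model parameters, which matches the scaling concern described above.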

Confidential Computing

Confidential computing uses hardware-based trusted execution environments (TEEs) — secure enclaves on modern processors — to protect model updates while they are being processed in memory. TEEs provide both data-in-use confidentiality (the operating system and other processes cannot read enclave memory) and execution integrity (the computation cannot be tampered with). This approach offers broader threat coverage than software-only methods because it addresses threats from privileged software, including the cloud provider's own infrastructure.

Comparison of privacy-enhancement layers for federated learning in healthcare. These mechanisms are additive: no single layer addresses all threat categories.
Mechanism	Privacy Guarantee Type	Threat Category Addressed	Primary Cost	Healthcare Deployment Maturity
Differential Privacy	Formal (ε-δ bound)	Inference from model outputs	Utility reduction (~5% at ε=4)	Used in published clinical studies
Homomorphic Encryption	Computational (encrypted aggregation)	Inspection of individual updates by aggregator	Extreme computational overhead	Prototype/research stage
Secure Multi-Party Computation	Cryptographic (joint evaluation)	Input exposure during aggregation	High communication overhead	Prototype/research stage
Confidential Computing (TEE)	Hardware-enforced execution integrity	Privileged software and infrastructure threats	Hardware dependency and cost	Early deployment in select consortia

Residual Privacy Threats: What Baseline FL Does Not Prevent

The most important misconception about federated learning is that keeping raw data local is equivalent to keeping patient information private. It is not. Baseline FL — without differential privacy, homomorphic encryption, SMPC, or confidential computing — provides no formal privacy guarantee. The model updates transmitted between sites carry information about the local training data, and that information can be exploited by adversaries with access to those updates.

There is currently no formal guarantee of privacy in the baseline federated learning model. — Sadilek et al., npj Digital Medicine, 2021

The documented threat categories are specific and serious:

Gradient inversion and model inversion attacks. An adversary with access to local model updates, such as a compromised aggregator or a malicious federation participant, can under some conditions reconstruct training data from those updates. State-of-the-art gradient inversion attacks have demonstrated pixel-level reconstruction of medical images from model updates when the attacker has access to updates computed from small local batches. This is not a theoretical risk: it has been demonstrated empirically in peer-reviewed research.
Membership inference attacks. These attacks determine, with statistical confidence, whether a specific patient record was included in a site's training data — effectively revealing that a particular individual was a patient at that institution for a particular condition.
Data attribute inference attacks. Beyond membership, adversaries can recover subsets of patient attributes — such as demographic characteristics or comorbidities — from model update patterns, even without reconstructing full records.
Model poisoning attacks. A malicious participant in the federation can deliberately manipulate its local model updates to corrupt the global model's behavior — introducing targeted misclassifications or degrading performance on specific subpopulations — while appearing to cooperate normally.
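
The gradient inversion item above can be illustrated with a well-known special case. For a single linear layer trained on one sample with squared-error loss, the weight gradient is the outer product of the output error and the input, and the bias gradient is the output error itself, so dividing one by the other recovers the input exactly. The sketch below assumes NumPy and uses hypothetical function names. It is far simpler than the optimization-based attacks on deep networks reported in the literature, but it shows why updates computed from very small batches are risky.

```python
import numpy as np

def single_sample_gradients(W, b, x, y):
    """Gradients of 0.5 * ||W @ x + b - y||^2 with respect to W and b."""
    err = W @ x + b - y
    return np.outer(err, x), err

def reconstruct_input(grad_W, grad_b):
    """Recover x from the gradients alone: row i of grad_W equals err[i] * x."""
    i = int(np.argmax(np.abs(grad_b)))  # use the row with the largest error
    return grad_W[i] / grad_b[i]

# A "patient record" of 16 features, seen only through its gradient.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 16)), rng.normal(size=4)
x, y = rng.normal(size=16), rng.normal(size=4)
grad_W, grad_b = single_sample_gradients(W, b, x, y)
print(np.allclose(reconstruct_input(grad_W, grad_b), x))
```

Averaging gradients over larger batches, clipping and noising updates, or hiding individual updates behind secure aggregation all weaken this direct relationship.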

FL Topology Types: Cross-Silo and Cross-Device

Not all federated learning deployments have the same structure. The two principal topologies differ substantially in scale, participant characteristics, and operational requirements — and only one is typically relevant for clinical AI development.

Cross-silo FL is the dominant topology for clinical AI development. Cross-device FL is relevant for patient-generated data applications but introduces additional challenges around device heterogeneity and participant trust.
Characteristic	Cross-Silo FL	Cross-Device FL
Number of participants	Small to moderate (2–100 institutions)	Very large (thousands to millions of devices)
Participant type	Hospitals, health systems, research consortia, regulatory agencies	Patient smartphones, wearables, IoT health devices
Connectivity	Stable, high-bandwidth institutional networks	Intermittent, variable-bandwidth consumer devices
Compute capacity	High (institutional servers, GPUs)	Low (mobile CPUs, embedded processors)
Trust model	Known, accountable participants with legal agreements	Largely anonymous, heterogeneous participants
Data per participant	Large (thousands to millions of patient records)	Small (single patient's longitudinal data)
Healthcare relevance	Hospital consortia, multi-center trials, regulatory collaboration	Patient-generated health data, remote monitoring, consumer health apps

Cross-silo FL — a small number of large, trusted institutional participants with stable connectivity — is the frame relevant to hospital consortia, multi-center research networks, and regulatory science collaborations. The majority of published clinical FL studies and all documented regulatory-authority deployments operate in this topology.

Documented Real-World Healthcare Deployments

The peer-reviewed evidence base for FL in clinical practice has grown substantially since 2021, with several landmark studies demonstrating feasibility at meaningful scale. The deployments below represent the most cited and methodologically documented examples as of mid-2026.

Selected peer-reviewed FL healthcare deployments. All studies are cross-silo topology. Performance claims reflect conditions specific to each study's dataset, population, and privacy configuration.
Study / Deployment	Clinical Task	Scale	Key Finding	Source
Dayan et al., Nature Medicine, 2021	COVID-19 oxygen requirement prediction	20 institutes across the globe	FL model outperformed all but one local model and generalized across diverse hospital populations without raw data sharing	Nature Medicine
Pati et al., Nature Communications, 2022	Rare brain tumor (glioblastoma) segmentation	71 sites internationally	FL model matched or exceeded centralized training performance on rare tumor subtypes; DP at ε=4 reached within ~5% of non-private scores	Nature Communications
Sadilek et al., npj Digital Medicine, 2021	Eight diverse health studies (diabetes, heart disease, SARS-CoV-2, MERS-CoV, EHR mortality)	Multiple institutions, cross-silo and cross-device	Federated models with explicit DP reproduced the same clinical conclusions as centralized models across all eight studies	npj Digital Medicine
Choudhury et al., JMIR AI, 2025	Lung cancer gross tumor volume (GTV) segmentation	12 hospitals across 8 nations, 4 continents	Personal Health Train infrastructure demonstrated international FL with secure aggregation server preventing model inversion attacks	JMIR AI
RACOON Network, 2024	Lung pathology segmentation in radiology	6 German university hospitals	FL outperformed less complex alternatives across all evaluation scenarios; documented real-world infrastructure challenges in scanner heterogeneity	PubMed
Horst et al. (TRICIA), Frontiers in Drug Safety, 2025	Medical device incident triage (regulatory science)	Swissmedic, FDA, DKMA (synthetic data proof-of-concept)	First known FL deployment by regulatory authorities; cross-silo configuration tested at GSRS 2024 with heterogeneous regulatory data	Frontiers in Drug Safety

Federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections. — Sadilek et al., npj Digital Medicine, 2021

These positive findings should be read alongside the methodological critique literature. A 2025 systematic review of FL healthcare studies published through May 2024 concluded that the vast majority were not appropriate for clinical use because of methodological flaws. The review found that, across the published literature as a whole, the effectiveness of FL in healthcare is significantly compromised by unresolved privacy concerns, generalization issues, and communication costs.

Limitations and Open Challenges

The gap between FL's theoretical promise and its clinical deployment readiness is real and documented. The following challenges represent the principal unresolved barriers as of mid-2026:

Non-IID data heterogeneity. This is the dominant unresolved technical challenge. Hospital populations, scanner protocols, clinical workflows, and documentation practices differ systematically across institutions. When local datasets are not independent and identically distributed (non-IID), FedAvg convergence degrades: the global model may perform well on average but poorly for specific site populations. This heterogeneity can also introduce or amplify representational bias in the federated model; see the Algorithmic Bias in Healthcare AI article for the broader bias framing.
Communication overhead. Each training round requires transmitting full model weight updates across the network. For large models — including modern deep learning architectures — this creates substantial bandwidth requirements that scale with model size and the number of training rounds. Adding SMPC or HE multiplies this overhead further.
Compute requirements and access equity. Participating in a federated consortium requires local GPU infrastructure capable of training large models. Under-resourced institutions — community hospitals, safety-net systems, facilities in low- and middle-income countries — may lack the infrastructure to participate, potentially reproducing the same representational gaps that FL was intended to address.
Reproducibility gaps. The 2025 systematic review identified significant methodological variability across published FL healthcare studies, with inconsistent reporting of privacy configurations, data heterogeneity measures, and external validation procedures. This limits comparability across studies and the ability to assess real-world generalizability.
Institutional trust and incentive alignment. Technical feasibility does not resolve the governance question of why institutions should contribute their data-derived model knowledge to a shared model that may disproportionately benefit competitors. Federated consortia require legal frameworks, data use agreements, and aligned incentive structures that are often more difficult to establish than the technical infrastructure.
Model drift in continuously updated federated models. FL models that are updated continuously as new local data arrives face the same drift risks as centrally trained models — with the added complexity that drift may originate from changes at any participating site and may not be detectable without coordinated monitoring across the consortium. See Building an Institutional Monitoring Program for Clinical AI Model Drift for monitoring frameworks applicable in this context.
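
The non-IID problem described in the list above is often studied in simulation before any real consortium is formed. One widely used approach, shown here as a sketch and not as a method taken from the studies cited in this article, is to split a labeled dataset across simulated sites using a Dirichlet distribution over class proportions. The function name dirichlet_partition and the alpha parameter are assumptions for this example; smaller alpha produces more skewed sites.

```python
import numpy as np

def dirichlet_partition(labels, n_sites, alpha, rng):
    """Assign sample indices to sites with Dirichlet-distributed label skew."""
    labels = np.asarray(labels)
    site_indices = [[] for _ in range(n_sites)]
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them into n_sites chunks
        # whose relative sizes are drawn from Dirichlet(alpha, ..., alpha).
        idx = rng.permutation(np.flatnonzero(labels == cls))
        proportions = rng.dirichlet(alpha * np.ones(n_sites))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for site, chunk in enumerate(np.split(idx, cuts)):
            site_indices[site].extend(chunk.tolist())
    return site_indices

# Four diagnostic classes, 50 cases each, split across five simulated hospitals.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 50)
for site, idx in enumerate(dirichlet_partition(labels, 5, alpha=0.5, rng=rng)):
    print(site, np.bincount(labels[idx], minlength=4))
```

Running a federated algorithm on partitions generated with different alpha values gives a rough sense of how much performance degrades as sites become less alike, though label skew is only one of the heterogeneity sources listed above.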

Regulatory and Compliance Context

Federated learning supports compliance with HIPAA and GDPR by reducing the need to transfer protected health information across institutional or jurisdictional boundaries. However, FL does not automatically guarantee compliance under either framework. Institutions must independently establish a valid legal basis for any processing of personal data that occurs in the course of FL — including the local training step, the transmission of model updates, and the use of any aggregated model that may retain information about training subjects.

Under HIPAA, the question of whether model updates constitute protected health information has not been definitively resolved. Neural networks can unintentionally memorize training data, which has led some legal analysts to argue that a model trained on PHI should itself be classified as PHI. Institutions should obtain legal guidance specific to their configuration rather than assuming FL eliminates HIPAA obligations.

The EU AI Act adds a further regulatory layer for FL-based medical devices. While federated learning itself is not explicitly regulated by the Act, FL-based AI systems used in clinical decision support or medical device contexts may qualify as high-risk AI under the Act's provisions — triggering conformity assessment, transparency, and post-market monitoring requirements. The precise scope of these obligations for FL configurations was not fully adjudicated as of mid-2026, and regulatory guidance in this area is still developing.

The most concrete regulatory-authority FL deployment to date is the TRICIA proof-of-concept, documented by Horst et al. in Frontiers in Drug Safety (2025). This project extended Swissmedic's AI tool for medical device incident triage to partner regulatory agencies — the FDA and the Danish Medicines Agency (DKMA) — using a cross-silo FL configuration tested with synthetic data at the 14th Global Summit on Regulatory Science (GSRS 2024). The authors noted that, to their knowledge, no FL project had previously been implemented by regulatory authorities, making this a significant first proof-of-concept for regulatory science collaboration.

Related Terms and Distinctions

Several related paradigms are sometimes conflated with federated learning. Each is distinct in architecture and use case:

Split learning. In split learning, the neural network is divided between the client and a server at a designated "cut layer": the client processes data through the first portion of the network and sends intermediate activations (smashed data) to the server, which completes the forward and backward pass. Unlike FL, the full model is never resident on the client, and the client does not perform a complete local training step. Split learning has distinct privacy properties — including vulnerability to label inference attacks — and is not interchangeable with FL.
Swarm learning. Swarm learning replaces the central aggregator with a blockchain-governed peer-to-peer network in which participating nodes collectively manage model aggregation without any single coordinating authority. This eliminates the central aggregator as a single point of failure or trust, but introduces blockchain governance complexity. It is a variant of the FL paradigm, not a synonym for it.
Distributed learning (broader category). Distributed learning is the broader class of techniques in which model training is distributed across multiple nodes or datasets. Federated learning is a specific form of distributed learning characterized by data locality, heterogeneous participants, and communication efficiency. Not all distributed learning is federated.
Transfer learning. Transfer learning involves adapting a pre-trained model to a new task or domain, typically through fine-tuning on a smaller local dataset. It is sometimes combined with FL — for example, distributing a pre-trained foundation model to sites for federated fine-tuning — but the two concepts address different problems. See Transfer Learning and Fine-Tuning in Clinical AI for a dedicated treatment.
Foundation models. Large pre-trained models trained on broad datasets are increasingly being explored as candidates for federated fine-tuning across institutions. FL and foundation models are complementary rather than competing concepts. See Foundation Models in Healthcare: Definition, Architecture, and Clinical Scope for the foundation model framing.
