Algorithmic Bias in Healthcare AI: Definition and Mitigation

Canonical Definition and Scope

In healthcare AI, algorithmic bias refers to any systematic and unfair difference in how AI predictions are generated for different patient populations that could lead to disparate care delivery. This definition, grounded in Hasanzadeh et al. (npj Digital Medicine, 2025), distinguishes algorithmic bias from two adjacent but distinct phenomena: random error and AI bias in non-clinical contexts.

Random error is stochastic — it affects all predictions unpredictably and does not systematically disadvantage any particular group. Algorithmic bias, by contrast, is directional and patterned: it consistently produces worse predictions, higher false-negative rates, or lower-quality outputs for identifiable subpopulations defined by race, ethnicity, sex, age, socioeconomic status, or geographic origin.

This entry should also be distinguished from two related but separate glossary concepts. Model drift describes performance degradation over time as the statistical distribution of incoming data shifts away from the training distribution — a temporal phenomenon that may or may not produce demographic disparities. AI hallucination describes the generation of plausible-sounding but factually incorrect outputs by generative models — a distinct failure mode unrelated to fairness across population subgroups. Algorithmic bias, as defined here, is specifically a fairness and equity problem: it concerns whether an AI system performs differently — and worse — for some patients than for others.

Why Algorithmic Bias in Healthcare Is Distinctively Consequential

Bias in a consumer recommendation algorithm may produce a suboptimal product suggestion. Bias in a clinical AI system can delay a diagnosis, exclude a patient from care management, or systematically underestimate disease severity in an already-disadvantaged population. The stakes are categorically different — and the documented scale of the problem makes this a patient safety issue, not merely an ethical abstraction.

Two large-scale reviews quantify the scope. Kumar et al., cited in the Hasanzadeh 2025 review, found that 50% of sampled healthcare AI studies demonstrated high risk of bias, frequently attributable to absent sociodemographic data, imbalanced datasets, or weak algorithm design; only 20% were rated low risk. Chen et al., also cited in that review, found that 83% of 555 neuroimaging-based AI models were rated at high risk of bias, with 97.5% of included subjects drawn from high-income regions — a data dominance pattern sometimes described as WEIRD (Western, Educated, Industrialized, Rich, Democratic) bias.

These findings carry consequences in four areas:
Patient safety: Biased AI outputs can cause clinicians to under-triage, misclassify, or deprioritize patients from underrepresented groups, leading to delayed or inappropriate care.
Health equity: AI systems trained predominantly on data from high-income, predominantly White populations may perform measurably worse for Black, Hispanic, Indigenous, female, older, or low-socioeconomic-status patients — amplifying pre-existing disparities rather than reducing them.
Regulatory compliance: The FDA's January 2025 draft guidance on AI-enabled device software functions explicitly identifies bias as a safety and effectiveness concern requiring active control across the total product lifecycle.
Institutional liability: Health systems deploying biased AI tools carry operational and legal risk if those tools produce systematically disparate outcomes across protected demographic groups.

Taxonomy of Bias by Origin Category

Algorithmic bias in healthcare AI does not originate from a single source. It enters and propagates across every phase of the model lifecycle, from the initial framing of a clinical problem through post-deployment surveillance. A four-origin taxonomy — human, data, algorithmic, and deployment — provides the most useful organizational structure for both detection and mitigation.

Figure: The four origin categories of algorithmic bias in healthcare AI (human-origin bias, data bias, algorithmic bias, and deployment bias). Each quadrant represents a distinct source domain requiring different detection and mitigation strategies.

Human-Origin Biases

Human-origin biases enter the AI pipeline through the decisions, assumptions, and blind spots of the people who design, fund, label, and validate clinical AI systems. Five subtypes are well-documented in the literature:

Implicit bias: Subconscious attitudes held by developers, clinicians, or data labelers that influence model design choices, annotation decisions, and feature selection without conscious awareness.
Systemic bias: Institutional norms and structural inequities embedded in the healthcare system — differential access to care, historical underrepresentation in clinical trials, racially stratified diagnostic criteria — that are encoded into training data and perpetuated by models trained on that data.
Confirmation bias: The tendency of developers to weight or select data that confirms pre-existing hypotheses about how a model should behave, potentially excluding disconfirming evidence from underrepresented populations.
Training-serving skew: A mismatch between the population distribution in historical training data and the population distribution in the current deployment environment. A model trained on data from a tertiary academic medical center may perform poorly when deployed in a rural community hospital.
Concept shift: Changes in the meaning of clinical concepts over time — evolving diagnostic criteria, updated ICD coding conventions, shifting disease prevalence — that cause a model's training-time labels to no longer accurately reflect the current clinical reality.

Data Biases

Data biases arise from how training datasets are assembled, sampled, and measured. They are among the most commonly identified bias sources in published healthcare AI literature, but they are not the only source — a point this taxonomy is designed to make explicit.

Representation bias: Training data that underrepresents certain demographic groups, producing models that generalize poorly to those groups. A canonical example: convolutional neural networks trained on chest X-ray datasets from academic medical centers have been shown to underdetect disease in Black, Hispanic, female, and low-socioeconomic-status patients — populations underrepresented in those training corpora.
Selection and sampling bias: Non-random data collection that systematically over- or under-samples specific groups. The UK Biobank — a widely used genomics and imaging research dataset — exhibits healthy volunteer bias: participants are disproportionately healthy, older, White, and from higher socioeconomic backgrounds, making models trained on it poorly calibrated for sicker or more diverse populations.
Measurement bias: Systematic differences in how data is collected or recorded across sites, devices, or protocols. In medical imaging, variation in scanner manufacturers, field strengths, slice thicknesses, and contrast protocols across acquisition sites can introduce measurement bias that degrades model performance at sites with different equipment. In pathology, staining protocol variation across laboratories produces analogous effects.

Algorithmic Biases

Algorithmic biases arise from modeling choices made during development — choices about how to preprocess data, which features to include, and what objective function to optimize. Two subtypes are particularly consequential in clinical AI:

Aggregation bias: Applying a single model or preprocessing pipeline uniformly across heterogeneous subgroups, obscuring meaningful within-group variation. A model that predicts average glycemic control across a population may perform acceptably on aggregate metrics while systematically underperforming for specific ethnic subgroups with different disease trajectories.
Feature selection and proxy variable bias: Using a measurable variable as a proxy for a clinically meaningful outcome when that proxy is itself inequitably distributed across demographic groups. The Obermeyer et al. 2019 study (discussed in the case studies section below) is the canonical illustration: using healthcare cost as a proxy for health need introduced systematic racial bias because unequal access to care meant less money was spent on Black patients than on equally sick White patients.

Deployment Biases

Deployment biases emerge after a model is released into live clinical environments. They are structurally distinct from training-phase biases: a model can be developed with rigorous fairness controls and still produce inequitable outcomes in practice due to how clinicians interact with its outputs. This category is systematically underaddressed in the literature and in institutional AI governance frameworks.

Automation bias: The tendency of clinicians to over-rely on AI outputs, either accepting incorrect recommendations without sufficient independent review (commission errors) or failing to act when the AI does not flag a concern that the clinician would otherwise have identified (omission errors). Automation bias can amplify the clinical impact of any residual model bias.
Feedback loop bias: When clinicians routinely accept AI-generated labels or recommendations, those accepted outputs may be incorporated into future training cycles — reinforcing and potentially amplifying the original model's biases over successive retraining iterations. This is particularly concerning in systems with continuous learning or frequent model updates.
Dismissal bias and alert fatigue: High false-positive rates — often themselves a product of model bias against certain demographic groups — produce alert fatigue, leading clinicians to systematically ignore or dismiss AI warnings. If a model generates disproportionately more false positives for one population subgroup, clinicians may habituate to dismissing those alerts, resulting in missed true positives for that group.

Four-origin taxonomy of algorithmic bias in healthcare AI, with lifecycle entry points and illustrative examples.
Origin Category	Subtype	Entry Point in Lifecycle	Example
Human	Implicit bias	Conception, annotation	Labeler assumptions influencing ground truth labels
Human	Systemic bias	Conception, data selection	Historical underrepresentation in clinical trials encoded into training data
Human	Confirmation bias	Data selection, validation	Excluding disconfirming evidence from model evaluation
Human	Training-serving skew	Deployment	Academic center model deployed in community hospital with different case mix
Human	Concept shift	Post-deployment	ICD coding changes causing label-reality mismatch over time
Data	Representation bias	Data collection	CNN trained on predominantly White chest X-ray datasets underperforming for Black patients
Data	Selection/sampling bias	Data collection	UK Biobank healthy volunteer bias
Data	Measurement bias	Data collection	Multi-site MRI acquisition parameter variation
Algorithmic	Aggregation bias	Preprocessing, training	Single model applied uniformly across heterogeneous subgroups
Algorithmic	Proxy variable bias	Feature selection	Healthcare cost as proxy for health need (Obermeyer 2019)
Deployment	Automation bias	Clinical deployment	Clinician accepts AI output without independent review
Deployment	Feedback loop bias	Retraining	Clinician-accepted AI labels reinforcing future training cycles
Deployment	Dismissal bias / alert fatigue	Clinical deployment	False-positive fatigue causing ignored warnings for specific subgroups

Fairness Metrics and the Accuracy–Fairness Tradeoff

Detecting algorithmic bias requires operationalizing what fairness means for a given clinical application. Four principal fairness metrics are used in clinical AI evaluation, each reflecting a different conception of equitable model behavior. Critically, these metrics are mutually non-equivalent: satisfying one can mathematically preclude satisfying another, and optimizing any single metric can reduce overall model accuracy or worsen performance on a different fairness dimension.

Principal fairness metrics used in clinical AI evaluation. No single metric resolves the accuracy–fairness tradeoff.
Fairness Metric	Definition	Clinical Interpretation	Key Limitation
Demographic parity	The AI produces positive predictions at equal rates across demographic groups, regardless of true outcome rates.	A sepsis alert fires at the same rate for Black and White patients.	Does not account for true differences in disease prevalence; may require predicting positive outcomes for patients who do not have the condition.
Equalized odds	The AI achieves equal true positive rates and equal false positive rates across groups.	The model correctly identifies sepsis, and incorrectly flags non-sepsis, at the same rates across demographic groups.	When base rates differ across groups, no classifier whose predictions carry information about the outcome can satisfy equalized odds and demographic parity simultaneously.
Equal opportunity	The AI achieves equal true positive rates across groups (false positive rates may differ).	Patients with sepsis are identified at the same rate regardless of race or sex.	Permits disparate false positive rates, which may produce differential alert burden across groups.
Counterfactual fairness	The AI would produce the same prediction for an individual if their protected attribute (e.g., race) were different, holding all else constant.	A risk score does not change if a patient's recorded race changes in the input data.	Difficult to implement in practice; requires causal modeling and may be sensitive to how protected attributes are defined.
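The first three metrics in the table can be computed directly from binary labels, binary predictions, and a group attribute. The following is a minimal sketch using only the Python standard library; the function names (`group_rates`, `fairness_report`) and the choice to summarize each metric as a max-minus-min gap across groups are illustrative assumptions, not a standard API. Counterfactual fairness is omitted because it requires a causal model rather than a confusion matrix.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and false positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct/incorrect, "p"/"n" = predicted class
        key = ("t" if yt == yp else "f") + ("p" if yp == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[g] = {
            "selection_rate": (c["tp"] + c["fp"]) / n,
            "tpr": c["tp"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference for one metric (0 means parity)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


def fairness_report(y_true, y_pred, groups):
    rates = group_rates(y_true, y_pred, groups)
    return {
        # Demographic parity: equal positive-prediction rates
        "demographic_parity_gap": max_gap(rates, "selection_rate"),
        # Equal opportunity: equal true positive rates
        "equal_opportunity_gap": max_gap(rates, "tpr"),
        # Equalized odds: equal TPR and equal FPR (report the worse of the two)
        "equalized_odds_gap": max(max_gap(rates, "tpr"), max_gap(rates, "fpr")),
    }
```

A gap of zero means the metric is satisfied exactly on the evaluated sample; in practice a tolerance would be chosen per application, and small subgroups make the estimates noisy.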

The accuracy–fairness tradeoff is an acknowledged open challenge in clinical AI, not a solved problem. A JMIR scoping review of bias mitigation in primary care AI found that attempts to improve calibration for specific demographic groups sometimes exacerbated false positive and negative rate differences between groups, or led to overall model miscalibration. Improving fairness for one subgroup along one metric can degrade performance for another subgroup or on a different metric.
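The incompatibility between equalized odds and demographic parity noted in the table above follows from a short calculation. The sketch below assumes a binary outcome Y, a binary prediction, and two groups a and b; the symbols are introduced here for illustration.

```latex
% p_a denotes the base rate P(Y = 1 | A = a) in group a.
\begin{align*}
P(\hat{Y}=1 \mid A=a) &= \mathrm{TPR}_a \, p_a + \mathrm{FPR}_a \, (1 - p_a) \\
% Equalized odds: TPR_a = TPR_b = TPR and FPR_a = FPR_b = FPR, so
P(\hat{Y}=1 \mid A=a) - P(\hat{Y}=1 \mid A=b) &= (\mathrm{TPR} - \mathrm{FPR})\,(p_a - p_b)
\end{align*}
```

Demographic parity requires the left-hand side of the second line to be zero. When the base rates differ, that is possible only if TPR equals FPR, which means the prediction carries no information about the outcome.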

Mitigation Frameworks Mapped to the AI Lifecycle

Because algorithmic bias enters the AI pipeline at multiple points, mitigation must be applied across the full model lifecycle — not only at the data preprocessing stage. The following framework maps specific, evidence-supported mitigation techniques to six lifecycle phases.

Figure: Bias entry points and mitigation interventions across the AI model lifecycle (Conception, Data Collection, Pre-processing, Training, Deployment, and Surveillance). Warm indicators mark stages where bias commonly enters; the green band represents the fairness mitigation layer that must span all phases.

Phase 1: Conception

The framing of a clinical AI problem — what outcome to predict, which population to target, what constitutes ground truth — embeds assumptions that propagate through every subsequent phase. Mitigation at this stage is structural:

Assemble diverse development teams that include clinicians, patients, ethicists, and representatives from communities likely to be affected by the system.
Apply explicit DEI (diversity, equity, and inclusion) principles to problem scoping, ensuring that the populations most likely to be harmed by bias are centered in the design process.
Build in confirmation bias vigilance: require that development teams explicitly identify and document assumptions they are making about the target population and the ground truth definition.

Phase 2: Data Collection

Source training data from diverse institutions, geographic regions, and patient populations — not only from large academic medical centers with predominantly White, high-SES patient populations.
Apply the STANDING Together initiative standards (published September 2022) for dataset diversity reporting, which provide a structured framework for documenting the demographic composition and limitations of AI training datasets.
Plan prospective external validation datasets that include underrepresented groups from the outset of data collection, rather than treating external validation as an afterthought.

Phase 3: Pre-processing

Oversampling techniques: SMOTE (Synthetic Minority Oversampling Technique) and ADASYN (Adaptive Synthetic Sampling) generate synthetic examples for underrepresented groups to rebalance training distributions.
Reweighting and relabeling: Assign higher loss weights to underrepresented groups during training, or correct known labeling errors that disproportionately affect specific subpopulations. The JMIR scoping review identified these as among the most promising preprocessing interventions.
Robust imputation: Handle missing data with methods that do not introduce additional demographic disparities — for example, by imputing separately within demographic subgroups rather than using population-wide mean imputation.
NLP extraction of social determinants: Use natural language processing to extract social determinants of health (housing instability, food insecurity, transportation barriers) from unstructured EHR notes, enabling models to incorporate equity-relevant contextual factors that are absent from structured data fields.
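Two of the pre-processing techniques above, reweighting and subgroup-aware imputation, can be sketched in a few lines. This is an illustrative sketch only: the inverse-frequency weighting rule, the use of the subgroup mean for imputation, and all function names are assumptions made for the example, not the methods evaluated in the JMIR review.

```python
import math
from collections import Counter, defaultdict
from statistics import mean


def inverse_frequency_weights(groups):
    """Weight each sample by n / (k * n_g) so every group carries equal total weight."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy in which each sample's loss is scaled by its weight."""
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


def impute_within_group(values, groups):
    """Replace None with the mean of observed values in the same group.

    Falls back to the overall observed mean when a group has no observed values.
    """
    observed = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            observed[g].append(v)
    overall = mean(v for v in values if v is not None)
    return [
        v if v is not None else (mean(observed[g]) if observed[g] else overall)
        for v, g in zip(values, groups)
    ]
```

Equal total weight per group is one of several reasonable targets; weights can also be set per group-and-label cell. Imputing within subgroups assumes the group attribute itself is recorded reliably.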

Phase 4: In-processing (Model Training and Validation)

Adversarial training: Train the model alongside an adversarial component that attempts to predict protected attributes from the model's representations, penalizing the model for encoding demographic information in its internal features.
Fairness-aware loss functions: Incorporate fairness constraints directly into the optimization objective, penalizing prediction disparities across demographic subgroups during training.
Federated learning: Train models across distributed datasets held at multiple institutions without centralizing patient data, enabling exposure to more diverse populations while preserving data privacy.
Stratified cross-validation: Ensure that validation folds preserve demographic distributions, so that subgroup performance is evaluated on representative samples rather than pooled averages that may mask disparities.
Counterfactual testing and red teaming: Systematically test model behavior by modifying protected attributes in input data and observing prediction changes; use red team exercises to actively probe for failure modes in underrepresented subgroups.
Explainability tools (SHAP, LIME): Apply SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) to audit which features drive predictions for different demographic subgroups, identifying proxy variables or spurious correlations.
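The counterfactual testing step above can be sketched as a simple attribute-flip audit. The example assumes records are dictionaries and that the model is exposed as a `predict(record)` callable returning a numeric score; `toy_risk_model` and its coefficients are hypothetical and exist only to make the sketch runnable. Flipping the recorded attribute detects direct dependence only. It does not detect proxy variables and is not a substitute for a causal analysis.

```python
def counterfactual_flip_audit(predict, records, attribute, values, tolerance=0.0):
    """Flag records whose score changes when only `attribute` is swapped.

    Returns a list of (record_index, original_value, swapped_value, score_delta).
    """
    flagged = []
    for idx, record in enumerate(records):
        base = predict(record)
        for alt in values:
            if alt == record[attribute]:
                continue
            variant = {**record, attribute: alt}  # copy with one field changed
            delta = predict(variant) - base
            if abs(delta) > tolerance:
                flagged.append((idx, record[attribute], alt, delta))
    return flagged


def toy_risk_model(record):
    """Hypothetical model that (wrongly) lowers the score for group 'B'."""
    score = 0.02 * record["age"] + 0.3 * record["lactate"]
    if record["race"] == "B":
        score -= 0.1
    return score


if __name__ == "__main__":
    patients = [{"age": 50, "lactate": 2.0, "race": "A"}]
    for finding in counterfactual_flip_audit(toy_risk_model, patients, "race", ["A", "B"]):
        print(finding)
```

An empty result means the score does not depend directly on the audited attribute for the tested records; it says nothing about indirect dependence through correlated features.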

Phase 5: Post-processing and Deployment

Subgroup calibration: Calibrate model outputs separately for demographic subgroups so that predicted probabilities reflect true event rates within each group, not only across the pooled population.
Threshold adjustment: Apply group-specific decision thresholds to equalize a chosen fairness metric (e.g., equal opportunity) across subgroups, with explicit documentation of the tradeoffs this introduces.
Shadow deployment: Run the model in parallel with existing clinical workflows before full deployment, monitoring subgroup performance in the actual clinical environment without yet acting on outputs.
Human-in-the-loop review: For high-stakes decisions, require human clinician review before acting on AI outputs. Design interfaces that present AI outputs as decision aids — with uncertainty estimates and demographic performance context — rather than as directives, to counteract automation bias.
Saliency maps and visual explanations: In imaging AI, provide clinicians with saliency maps showing which image regions drove a prediction, enabling detection of cases where the model is attending to artifacts or demographic proxies rather than clinically relevant features.
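The threshold adjustment item above can be illustrated with a sketch that picks a separate decision threshold per group so that each group reaches at least the same true positive rate (the equal opportunity criterion). It assumes a held-out tuning set with scores, binary labels, and group membership; the function names and the rule "predict positive when score >= threshold" are assumptions of this example.

```python
import math


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold at which at least `target_tpr` of true positives score >= it."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("group has no positive cases in the tuning set")
    needed = max(1, math.ceil(target_tpr * len(positives)))
    return positives[needed - 1]


def group_thresholds(scores, labels, groups, target_tpr):
    """Map each group to the threshold that reaches the target TPR within that group."""
    by_group = {}
    for s, y, g in zip(scores, labels, groups):
        group_scores, group_labels = by_group.setdefault(g, ([], []))
        group_scores.append(s)
        group_labels.append(y)
    return {
        g: threshold_for_tpr(sc, lb, target_tpr)
        for g, (sc, lb) in by_group.items()
    }
```

Equalizing the true positive rate this way generally leaves false positive rates unequal, and it requires the group attribute at inference time. Both tradeoffs belong in the documentation that accompanies the chosen thresholds.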

Phase 6: Post-deployment Surveillance

Implement real-time monitoring dashboards that track model performance metrics (sensitivity, specificity, positive predictive value) stratified by demographic subgroup — not only for the aggregate population.
Set automated threshold alerts that trigger review when subgroup performance metrics diverge beyond pre-specified bounds from baseline.
Monitor for concept drift: track whether the relationship between input features and clinical outcomes is shifting over time in ways that may disproportionately affect specific subgroups.
Implement feedback loop detection: audit whether clinician-accepted AI outputs are being incorporated into retraining datasets, and assess whether doing so is reinforcing or correcting existing demographic performance gaps.
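The stratified monitoring and automated alerting described above can be sketched as a periodic check of subgroup sensitivity against a validation-time baseline. The default bound of 0.05, the minimum of 20 positive cases, and the function name are illustrative assumptions; real bounds would be pre-specified per application.

```python
from collections import defaultdict


def subgroup_sensitivity_alerts(window, baseline, max_drop=0.05, min_positives=20):
    """Compare each subgroup's sensitivity in a monitoring window with its baseline.

    window:   iterable of (group, y_true, y_pred) with binary labels and predictions
    baseline: dict mapping group -> sensitivity measured at validation time
    Returns a dict mapping group -> "ok", "alert", or "insufficient_data".
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in window:
        if y_true == 1:
            positives[group] += 1
            true_positives[group] += int(y_pred == 1)

    status = {}
    for group, reference in baseline.items():
        # Too few positive cases: the estimate is too noisy to act on either way
        if positives[group] < max(min_positives, 1):
            status[group] = "insufficient_data"
        elif reference - true_positives[group] / positives[group] > max_drop:
            status[group] = "alert"
        else:
            status[group] = "ok"
    return status
```

Reporting "insufficient_data" separately from "ok" keeps a small subgroup from being read as unaffected merely because it had too few cases in the window.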

Mitigation techniques mapped to six AI lifecycle phases. Effective bias control requires interventions at every phase, not only at data preprocessing.
Lifecycle Phase	Primary Bias Risk	Key Mitigation Techniques
Conception	Human-origin biases (implicit, systemic, confirmation)	Diverse team composition, DEI principles, explicit assumption documentation
Data collection	Representation bias, selection bias, measurement bias	Diverse sourcing, STANDING Together standards, prospective validation datasets
Pre-processing	Representation imbalance, missing data bias, proxy variables	SMOTE, ADASYN, reweighting, relabeling, robust imputation, NLP extraction of social determinants
Training / validation	Aggregation bias, proxy variable bias, overfitting to majority groups	Adversarial training, fairness-aware loss functions, federated learning, stratified cross-validation, counterfactual testing, SHAP/LIME
Post-processing / deployment	Automation bias, residual subgroup calibration gaps	Subgroup calibration, threshold adjustment, shadow deployment, human-in-the-loop, saliency maps
Post-deployment surveillance	Feedback loop bias, concept drift, alert fatigue	Real-time subgroup performance dashboards, automated alerts, drift monitoring, feedback loop audits

Canonical Case Studies

Case Study 1: Proxy Variable Bias in Health Risk Stratification (Obermeyer et al., 2019)

In 2019, Obermeyer and colleagues published a landmark analysis in Science demonstrating that a widely used commercial health risk-stratification algorithm contained substantial racial bias arising from a feature selection choice — not from any explicitly discriminatory design intent.

The algorithm predicted healthcare costs as a proxy for health need, on the reasonable assumption that sicker patients generate higher costs. However, because structural inequities in the U.S. healthcare system mean that Black patients have historically received less care than equally sick White patients, the cost proxy systematically underestimated the health needs of Black patients. At the 97th percentile of risk score, the threshold for automatic identification for the care management program, Black patients had approximately 26.3% more chronic conditions than White patients with the same score, meaning the algorithm rated them as less sick than they actually were.

The clinical consequence was direct: Black patients were enrolled in care management programs at substantially lower rates than their health needs warranted. The authors estimated that remedying the disparity would raise the percentage of Black patients receiving additional care management support from 17.7% to 46.5%. They also showed that the bias could be reduced substantially by predicting measures of health status rather than cost, a remediation that required no new data collection, only a change in the outcome variable being predicted.
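The mechanism in this case study, ranking patients by a cost label when cost understates need for one group, can be reproduced on a toy cohort. Everything in the sketch below is synthetic and hypothetical: the two groups "X" and "Y", the assumption that group Y incurs 75% of the cost of group X at the same number of chronic conditions, and the program size of six are chosen only to make the effect visible and are not figures from Obermeyer et al.

```python
def synthetic_cohort():
    """Twenty synthetic patients: conditions 1..10 in each of two groups.

    Group Y is assumed to generate 750 cost units per condition versus
    1000 for group X, standing in for unequal access to care.
    """
    cost_per_condition = {"X": 1000, "Y": 750}
    cohort = []
    for conditions in range(1, 11):
        for group, unit_cost in cost_per_condition.items():
            cohort.append(
                {"group": group, "conditions": conditions, "cost": unit_cost * conditions}
            )
    return cohort


def top_k_share(patients, label, k, group):
    """Share of `group` among the k patients ranked highest on `label`."""
    ranked = sorted(patients, key=lambda p: p[label], reverse=True)
    return sum(1 for p in ranked[:k] if p["group"] == group) / k


if __name__ == "__main__":
    cohort = synthetic_cohort()
    print("share of Y when ranking by cost:", top_k_share(cohort, "cost", 6, "Y"))
    print("share of Y when ranking by need:", top_k_share(cohort, "conditions", 6, "Y"))
```

Both groups are equally sick by construction, so any difference between the two shares comes entirely from the choice of label.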

Case Study 2: Representation Bias in Cardiac MRI Segmentation

A second case study, reported in the Hasanzadeh et al. 2025 review, illustrates representation bias and its remediation in a medical imaging context — a domain where aggregate performance metrics frequently obscure demographic disparities.

An nnU-Net cardiac MRI segmentation model trained primarily on UK Biobank data achieved a Dice Similarity Coefficient (DSC) of 93.5% for White subjects — a result that would typically be considered clinically acceptable. However, DSC for Black and Mixed-race subjects was as low as 84.5% — a gap of approximately nine percentage points driven by underrepresentation of those groups in the UK Biobank training data.
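For reference, the Dice Similarity Coefficient compares a predicted segmentation mask P with a reference mask T as DSC = 2|P ∩ T| / (|P| + |T|). The sketch below computes it for binary masks stored as nested lists and averages it per demographic group; the data layout and function names are assumptions for illustration, and the convention of returning 1.0 when both masks are empty is one common choice rather than a universal rule.

```python
from collections import defaultdict


def dice_coefficient(pred_mask, true_mask):
    """DSC = 2|P ∩ T| / (|P| + |T|) for two binary masks of identical shape."""
    flat_pred = [v for row in pred_mask for v in row]
    flat_true = [v for row in true_mask for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_true) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_true if t)
    return 1.0 if total == 0 else 2 * intersection / total


def mean_dice_by_group(cases):
    """cases: iterable of (group, pred_mask, true_mask). Returns group -> mean DSC."""
    scores = defaultdict(list)
    for group, pred_mask, true_mask in cases:
        scores[group].append(dice_coefficient(pred_mask, true_mask))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}
```

Reporting the per-group mean alongside the pooled mean is what exposes a gap like the one described above; a pooled average dominated by the majority group would hide it.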

Three mitigation strategies were evaluated, each with measurable but distinct outcomes:

Stratified batch sampling: Rebalancing training batches to include proportionally more examples from underrepresented groups improved Black subject DSC from 85.88% to 93.07% — closing most of the gap without requiring racial information at inference time.
Fair meta-learning: A meta-learning approach optimized for cross-group fairness produced intermediate improvements, with performance gains distributed more evenly across all demographic subgroups.
Protected group models: Training separate models for specific racial groups also raised DSC for Black subjects (to 92.15%) but required racial information to be available at inference time, a practical and ethical constraint that limits deployability in many clinical settings.
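The idea behind stratified batch sampling, the first strategy above, can be sketched as a sampler that gives every group the same number of slots in each training batch. This is one simple variant under stated assumptions: equal shares per group and sampling with replacement are choices made for this example, and the exact scheme used in the reported study is not described here.

```python
import random


def stratified_batches(groups, batch_size, num_batches, seed=0):
    """Yield lists of sample indices in which each group gets an equal share.

    groups: list where groups[i] is the group label of training sample i.
    Small groups are sampled with replacement so they can fill their quota.
    """
    rng = random.Random(seed)
    by_group = {}
    for idx, g in enumerate(groups):
        by_group.setdefault(g, []).append(idx)
    names = sorted(by_group)
    per_group, remainder = divmod(batch_size, len(names))

    for _ in range(num_batches):
        batch = []
        for g in names:
            batch.extend(rng.choices(by_group[g], k=per_group))
        # Hand any leftover slots to randomly chosen groups
        for g in rng.sample(names, remainder):
            batch.append(rng.choice(by_group[g]))
        rng.shuffle(batch)
        yield batch
```

Because group labels are needed only to build the batches, a model trained this way does not need the attribute at inference time, which matches the advantage noted for this strategy.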

Regulatory and Governance Context

Three regulatory and governance frameworks are directly relevant to algorithmic bias in clinical AI. Each operates at a different level of authority and specificity.

FDA Draft Guidance on AI-Enabled Device Software Functions (January 2025)

On January 7, 2025, the FDA issued the draft guidance "Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations" (docket FDA-2024-D-4488). This is the most specific U.S. regulatory document to date addressing algorithmic bias in AI-enabled medical devices.

The guidance explicitly states that its goal is to address transparency and bias and to ensure that device benefits extend to all relevant demographic groups, including age, sex, race, and ethnicity. It frames bias as a safety and effectiveness concern — not merely an ethical one — and recommends that bias be controlled by considering the representativeness of data when developing, testing, and monitoring AI-enabled devices.

Recommends demographic subgroup testing across the total product lifecycle (TPLC), including post-market settings.
Recommends transparency measures so that clinicians and patients understand the demographic groups for which a device has been validated.
Encourages early FDA engagement through the Q-submission process for AI-enabled devices with potential demographic performance gaps.
Covers post-market performance monitoring, including ongoing surveillance for bias in real-world deployment.

Separately, the FDA's December 2024 Final Guidance on Predetermined Change Control Plans (PCCPs) for AI-enabled device software functions provides a mechanism for sponsors to manage planned AI model modifications — including bias-correcting retraining — without requiring additional marketing submissions for each update. For a full treatment of the PCCP mechanism, see Predetermined Change Control Plan (PCCP): The FDA Mechanism for Iterative AI/ML Medical Device Updates.

NIST AI Risk Management Framework (AI RMF 1.0, January 2023)

The NIST AI Risk Management Framework (AI RMF 1.0), released January 26, 2023, provides a voluntary framework for managing AI risk across the full AI system lifecycle. While not healthcare-specific, its three-category bias taxonomy is directly applicable to clinical AI governance:

Systemic bias: Bias arising from historical, social, and institutional inequities embedded in data and system design — corresponding to the human-origin and data bias categories in the four-origin taxonomy above.
Computational and statistical bias: Bias introduced through modeling choices, optimization objectives, and algorithmic design — corresponding to the algorithmic bias category.
Human-cognitive bias: Bias arising from human interpretation and use of AI outputs — corresponding to the deployment bias category, including automation bias and dismissal bias.

The NIST AI RMF is intended for voluntary adoption and provides a companion Playbook with specific practices for each framework function (Govern, Map, Measure, Manage). Health systems and AI developers may use it as a governance reference alongside FDA requirements.

WHO Ethical AI Principles

The World Health Organization's ethical principles for AI in health include equity and inclusiveness as core requirements, explicitly calling for AI systems that do not perpetuate or amplify existing health disparities. The WHO framework is non-binding but provides an internationally recognized reference point for health equity considerations in AI governance, particularly relevant for organizations operating across multiple jurisdictions.

Limitations and Open Challenges

Despite substantial progress in understanding and categorizing algorithmic bias in healthcare AI, several structural challenges remain unresolved as of mid-2026. These are not gaps that better engineering alone can close.

The accuracy–fairness tradeoff has no clean resolution. Optimizing a model for any single fairness metric typically degrades performance on at least one other fairness metric or reduces overall accuracy. Clinical teams must make explicit, documented choices about which tradeoffs are acceptable for a given application — and accept that no choice is neutral.
Data sparsity for underrepresented groups limits subgroup validation power. Even when diverse training data is available, sample sizes for specific demographic subgroups are often too small to support statistically reliable subgroup performance estimates. A model may appear unbiased in validation simply because the subgroup sample was too small to detect a real performance gap.
WEIRD data dominance remains pervasive. The finding that 97.5% of neuroimaging AI training subjects come from high-income regions is not an outlier — it reflects a structural pattern across clinical AI development. The populations most likely to be harmed by biased AI are the populations least represented in the data used to build and validate it.
Concept drift erodes post-deployment fairness gains over time. A model that achieves equitable subgroup performance at deployment may develop demographic performance gaps as population distributions shift, coding conventions change, or clinical practice evolves. Post-deployment fairness is not a one-time achievement; it requires continuous monitoring.
Standardized bias reporting requirements are absent. There is no mandatory, standardized framework for reporting demographic subgroup performance in clinical AI publications or regulatory submissions comparable to CONSORT for clinical trials. The result is that bias is frequently undisclosed, making cross-study comparison and systematic review of bias patterns difficult.
Algorithmic bias is a symptom of structural inequity, not only a technical engineering problem. As Walker and colleagues argue in a Nursing Outlook analysis, bias in healthcare AI reflects power imbalances in who creates, develops, and deploys health technologies. Technical mitigation strategies are necessary but not sufficient; addressing algorithmic bias at its roots requires redistributing decision-making power toward the communities most likely to be affected.
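The data sparsity point above can be made concrete with a confidence interval for subgroup sensitivity. The sketch below uses the Wilson score interval, a standard interval for a binomial proportion; the 95% level (z = 1.96) and the example counts in the usage section are illustrative assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same observed sensitivity (90%), very different certainty
    for detected, positives in [(18, 20), (900, 1000)]:
        low, high = wilson_interval(detected, positives)
        print(f"{detected}/{positives}: {low:.3f} to {high:.3f}")
```

With only 20 positive cases the interval spans well over twenty percentage points, so a real gap of several points between subgroups could go undetected in validation.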
