AI for Health: A Cross-Specialty Landscape of FDA-Cleared Tools, Clinical Evidence, and Equity Gaps

A structured synthesis of where AI currently stands across major clinical specialties — covering FDA-cleared device concentrations, peer-reviewed evidence quality, active clinical trials, and the bias and equity concerns that cut across all of them.

The phrase "AI for health" covers tools that behave very differently in practice. A radiology AI that flags suspected pulmonary emboli on CT operates under different evidence standards, regulatory requirements, and failure modes than an LLM-based clinical documentation tool or a sepsis prediction model embedded in an ICU EHR. Treating them as a single category obscures the differences that drive evaluation and deployment decisions.

This page organizes the current state of AI across five specialties where FDA-cleared devices and peer-reviewed literature are concentrated enough to support structured comparison. The goal is to help readers locate relevant device records, evidence appraisals, and regulatory context without having to reconstruct the landscape from scratch.

Where FDA-Cleared AI Is Concentrated

As of Q2 2026, radiology accounts for the largest share of FDA-authorized AI/ML-enabled medical devices by a significant margin. Cardiology and pathology follow at a distance. Gastroenterology and primary care have smaller but growing authorization counts, with primary care's AI tools skewing toward administrative and documentation functions rather than diagnostic SaMD.

Relative FDA authorization concentration by specialty as of Q2 2026. Counts are approximate; see individual FDA device records for verified authorization status.

| Specialty | Authorization Concentration | Primary Use Categories | Dominant Pathway |
|---|---|---|---|
| Radiology | Highest: majority of total FDA AI/ML device authorizations | Triage, detection, measurement, workflow prioritization | 510(k) |
| Cardiology | High: second largest concentration | ECG interpretation, arrhythmia detection, imaging analysis | 510(k) |
| Pathology | Moderate: growing rapidly with digital pathology adoption | Slide analysis, cancer grading, cell counting | 510(k) and De Novo |
| Gastroenterology | Lower: concentrated in endoscopy AI | Polyp detection during colonoscopy | 510(k) and De Novo |
| Primary Care | Lower: skews toward administrative AI | Documentation (AI scribe), risk stratification, prior auth | 510(k) for clinical tools; many admin tools are not SaMD |

Radiology

Radiology is the most saturated specialty for AI deployment — and also the one with the most documented performance heterogeneity. Chest X-ray AI, CT triage tools, and mammography CAD systems have accumulated the longest post-market track records. The evidence base is real but uneven: retrospective studies vastly outnumber prospective ones, and external validation on demographically diverse populations remains the exception rather than the rule.

What's Cleared and What It Does

The bulk of cleared radiology AI falls into three functional categories: detection and flagging (e.g., intracranial hemorrhage, pulmonary embolism, pneumothorax on CT), measurement and quantification (e.g., nodule sizing, organ volume), and workflow triage (prioritizing worklist order based on suspected findings). Most of these are cleared as 510(k) devices using predicate comparisons to earlier CAD tools.

A smaller number of tools — particularly in mammography screening and chest CT lung nodule management — have gone through De Novo authorization, establishing new device classifications. These tend to have more detailed performance data in the FDA submission record.

Evidence Quality and Known Gaps

  • Most published performance studies use retrospective, single-institution datasets. Performance figures from these studies often don't replicate in multi-site prospective deployments.
  • External validation — testing on a dataset entirely separate from the training and tuning set — is present in a minority of published radiology AI studies.
  • Demographic composition of training datasets is inconsistently reported. Known gaps include underrepresentation of darker skin tones in dermatology-adjacent imaging, and limited data from lower-resource healthcare settings outside the U.S. and Europe.
  • Model drift — performance degradation as scanner hardware, imaging protocols, or patient populations shift — has been documented in deployed radiology AI but is rarely reported systematically post-market.
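The drift concern in the last bullet can be made concrete by tracking sensitivity per time window against a baseline. The following is a minimal sketch, not a monitoring protocol from any vendor or regulator: the `Case` record, the monthly window, the 0.90 baseline, the 0.05 tolerance, and all case data are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    month: str        # hypothetical reporting window, e.g. "2025-01"
    ai_flagged: bool  # whether the AI tool flagged the study
    confirmed: bool   # whether the reference standard confirmed the finding


def monthly_sensitivity(cases):
    """Return {month: sensitivity}, computed over confirmed-positive cases only."""
    true_pos, positives = {}, {}
    for c in cases:
        if not c.confirmed:
            continue
        positives[c.month] = positives.get(c.month, 0) + 1
        if c.ai_flagged:
            true_pos[c.month] = true_pos.get(c.month, 0) + 1
    return {m: true_pos.get(m, 0) / n for m, n in positives.items()}


def drift_alerts(sens_by_month, baseline, tolerance=0.05):
    """Months whose sensitivity falls more than `tolerance` below `baseline`."""
    return sorted(m for m, s in sens_by_month.items() if s < baseline - tolerance)


# Invented data: 9 of 10 positives caught in January, 7 of 10 in February.
cases = (
    [Case("2025-01", True, True)] * 9 + [Case("2025-01", False, True)] * 1
    + [Case("2025-02", True, True)] * 7 + [Case("2025-02", False, True)] * 3
)
sens = monthly_sensitivity(cases)
alerts = drift_alerts(sens, baseline=0.90)
print(sens, alerts)
```

In practice the window size and tolerance would need to reflect case volume, since small monthly counts make sensitivity estimates noisy.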

Active Clinical Trials

Several prospective trials are evaluating radiology AI in real workflow conditions rather than on retrospective image sets. NCT05046886 (AI-assisted chest X-ray reading in emergency settings) and NCT04936776 (AI triage for CT pulmonary angiography) represent the type of prospective design the field needs more of. Additional radiology AI trials can be found on ClinicalTrials.gov by searching the intervention term "artificial intelligence" combined with radiology condition terms.
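For readers who prefer to script the ClinicalTrials.gov search, the sketch below only builds a query URL and makes no network call. It assumes the ClinicalTrials.gov API v2 `studies` endpoint and its `query.intr`, `query.cond`, and `pageSize` parameters; these should be checked against the current API documentation before use. The function name and the example condition term are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the ClinicalTrials.gov API v2 studies endpoint.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(intervention, condition, page_size=20):
    """Build a search URL filtering by intervention and condition terms."""
    params = {
        "query.intr": intervention,  # intervention search term (assumed parameter name)
        "query.cond": condition,     # condition search term (assumed parameter name)
        "pageSize": page_size,       # number of results per page (assumed parameter name)
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("artificial intelligence", "pulmonary embolism")
print(url)
```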

Cardiology

Cardiology AI has two distinct clusters: ECG-based tools and imaging-based tools. These have different evidence profiles, different regulatory histories, and different deployment realities.

ECG AI

AI algorithms applied to 12-lead ECG data represent one of the more clinically mature applications in the field. Several tools have demonstrated prospective validation across large, multi-site datasets. The Mayo Clinic-developed ECG AI work — detecting conditions like low ejection fraction, atrial fibrillation, and hypertrophic cardiomyopathy from ECG waveforms — has been among the most rigorously published. Importantly, some of this work has moved from publication to prospective randomized trials, which remains rare in healthcare AI.

Consumer-grade ECG AI (single-lead, wearable) has a different evidence profile. FDA clearance exists for atrial fibrillation detection in consumer devices, but the performance data in real-world populations — particularly older adults with comorbidities, who are the highest-risk group — is thinner than the device clearance record suggests.

Cardiac Imaging AI

AI tools for echocardiography (automated chamber measurement, ejection fraction calculation) and cardiac CT (coronary artery calcium scoring, stenosis detection) are cleared and in deployment at major health systems. The echocardiography AI space has seen particularly rapid adoption because it addresses a real workflow bottleneck: manual measurement of cardiac function is time-consuming and has known inter-reader variability. Automated measurement tools can reduce that variability — but the degree to which they improve clinical outcomes, rather than just workflow efficiency, is less established.
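Agreement between an automated measurement and a human reader is often summarized with a Bland-Altman analysis: the mean difference (bias) and the 95% limits of agreement. The sketch below shows the arithmetic only. The ejection fraction values are invented, and the choice of Bland-Altman here is an illustration rather than the method used in any specific device submission.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical ejection fraction readings (%), one pair per patient.
reader_ef = [55, 60, 45, 35, 65, 50]
ai_ef = [53, 61, 47, 36, 62, 51]

bias, lower, upper = bland_altman(reader_ef, ai_ef)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Narrow limits of agreement indicate consistent measurement, but as the paragraph above notes, they say nothing about whether clinical outcomes improve.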

Pathology

Digital pathology AI is at an inflection point. The underlying infrastructure requirement — whole-slide imaging scanners and the digital workflow to support them — limited adoption for years. As more pathology labs complete the transition to digital workflows, AI tools for slide analysis are moving from research settings into clinical deployment.

FDA-cleared pathology AI tools cover applications including prostate cancer grading (Gleason scoring assistance), breast cancer detection on core needle biopsy, and mitotic figure counting. Some of these have gone through De Novo authorization rather than 510(k), which means they established new regulatory classifications and required more detailed performance characterization in their submissions.

Unresolved Questions in Pathology AI

  • Staining variability: AI models trained on slides from one lab's staining protocol can show degraded performance on slides from another lab. This is a known generalization problem with limited standardized solutions.
  • Rare cancer types: Most validated pathology AI tools target high-prevalence cancers. Performance on rare histologic subtypes is largely uncharacterized.
  • Pathologist-AI interaction: Studies show that when AI provides a concurrent read, pathologist behavior changes — sometimes improving accuracy, sometimes anchoring to the AI output even when it's wrong. The net clinical effect depends heavily on how the tool is integrated into the workflow.
  • Regulatory scope: Some pathology AI tools are positioned as "decision support" rather than diagnostic devices, which affects both the FDA pathway and the evidentiary standard expected.

Gastroenterology

Gastroenterology AI is narrower in scope than radiology or cardiology, but the concentration in one application — polyp detection during colonoscopy — has produced some of the most rigorous clinical trial data in the entire healthcare AI field.

Computer-aided detection (CADe) tools for colonoscopy have been evaluated in multiple randomized controlled trials, with several reporting statistically significant improvements in adenoma detection rate (ADR) compared to unassisted colonoscopy. This is a harder evidentiary bar than most healthcare AI studies attempt. The RCT results are not uniformly positive — some trials show ADR improvements while others show no significant difference — and the clinical significance of detecting additional small adenomas remains an open debate among gastroenterologists.
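A difference in ADR between two trial arms is typically tested as a difference in proportions. The sketch below uses a pooled two-proportion z-test with invented counts (200 of 500 with CADe versus 150 of 500 without); it is a simplified illustration and not the analysis plan of any particular trial, which may use different tests or adjust for clustering by endoscopist.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (difference, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value


# Hypothetical arms: patients with at least one adenoma detected / patients examined.
diff, z, p_value = two_proportion_z(200, 500, 150, 500)
print(f"ADR difference={diff:.2f}, z={z:.2f}, p={p_value:.4f}")
```

A statistically significant ADR difference does not settle the clinical-significance question raised above, since the additional adenomas detected may be small and low-risk.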

Beyond Polyp Detection

AI applications in upper GI endoscopy (Barrett's esophagus surveillance, gastric cancer detection) are earlier in their regulatory and evidence trajectory. Several tools have CE marking in Europe; FDA authorizations in this sub-area are more limited. Capsule endoscopy AI, which automates the reading of small bowel capsule studies, addresses a genuine workflow bottleneck (a single capsule study can generate 50,000+ frames) and has received FDA clearance, though prospective outcome data remain limited.

Primary Care

Primary care AI doesn't fit neatly into the diagnostic imaging paradigm that dominates the other specialties on this page. The applications are more heterogeneous: risk stratification models embedded in EHRs, AI-generated clinical documentation (AI scribes), chronic disease management tools, and administrative automation for prior authorization and scheduling.

AI Scribes and Ambient Documentation

Ambient AI documentation tools, which listen to a clinical encounter and generate a draft note, have seen the fastest adoption of any AI category in primary care. Several major health systems have deployed them at scale. The driver is not clinical efficacy in the traditional sense but physician burnout: documentation burden is a well-established contributor to burnout, and tools that reduce it have strong adoption incentives regardless of whether they have been evaluated in RCTs.

Most ambient documentation tools are not regulated as SaMD because they generate draft notes for physician review rather than making autonomous clinical decisions. This means they don't appear in FDA device records — which creates a regulatory gap. Hallucination risk (AI-generated text that sounds plausible but contains factual errors) is a real concern in this category. A note that misattributes a medication, documents a finding that wasn't discussed, or fabricates a patient statement could reach the permanent medical record if the reviewing physician doesn't catch it.

Risk Stratification in EHRs

Predictive models embedded in EHR platforms — flagging patients at risk for deterioration, readmission, or specific conditions — have been deployed widely but studied inconsistently. The Epic Deterioration Index and similar tools have real-world deployment data, but independent prospective validation is limited. Several published studies have documented algorithmic bias in commercial risk stratification tools: a widely cited 2019 study in Science found that a commercial algorithm used across health systems systematically underestimated the care needs of Black patients relative to equally sick white patients, because it used healthcare cost as a proxy for health need.
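The cost-as-proxy mechanism can be illustrated with a toy example. In the sketch below, two groups have identical numbers of chronic conditions, but group B generates lower costs (standing in for reduced access to care). All patient records are invented, and the use of chronic-condition count as the measure of need is an assumption made for the illustration.

```python
# Each record: (patient_id, group, chronic_conditions, annual_cost_usd). All values invented.
patients = [
    ("a1", "A", 5, 12000),
    ("a2", "A", 3, 8000),
    ("a3", "A", 1, 3000),
    ("b1", "B", 5, 7000),
    ("b2", "B", 3, 4500),
    ("b3", "B", 1, 1500),
]


def top_k(records, key_index, k):
    """IDs of the k highest-ranked patients by the field at key_index."""
    ranked = sorted(records, key=lambda r: r[key_index], reverse=True)
    return [r[0] for r in ranked[:k]]


# Ranking by cost selects only group A patients for extra care.
selected_by_cost = top_k(patients, key_index=3, k=2)
# Ranking by condition count selects the sickest patient from each group.
selected_by_need = top_k(patients, key_index=2, k=2)
print(selected_by_cost, selected_by_need)
```

Patient `b1` is as sick as `a1` by condition count but is excluded under the cost ranking, which is the pattern the study described above reported at scale.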

Cross-Specialty Bias and Equity Concerns

Algorithmic bias in healthcare AI is not a hypothetical risk — it has been documented in deployed tools across multiple specialties. The mechanisms vary: training data that underrepresents certain populations, proxy variables that encode historical inequities, and performance metrics that don't disaggregate results by race, sex, age, or socioeconomic status.

Known bias and equity concerns by specialty. Evidence status reflects published literature as of Q2 2026. Absence of documented bias does not indicate absence of bias; it may indicate absence of study.

| Specialty | Documented Bias Concern | Mechanism | Evidence Status |
|---|---|---|---|
| Radiology | Lower sensitivity for certain findings in non-white patients on chest X-ray AI | Training dataset skewed toward academic medical center populations | Published retrospective analyses; limited prospective confirmation |
| Cardiology | ECG AI trained on predominantly white, male populations; performance gaps in women and non-white patients for some conditions | Historical ECG dataset composition | Published; flagged in multiple validation studies |
| Pathology | Staining protocol variation disproportionately affects labs in lower-resource settings | Model brittleness to domain shift; not a demographic bias per se | Documented in multi-site validation studies |
| Gastroenterology | Polyp detection ADR improvements less consistent in lower-volume endoscopists; limited data on diverse patient populations | Most RCTs conducted in high-volume academic centers | Partially documented; ongoing trials addressing this |
| Primary Care | Risk stratification models underestimate care needs of Black patients; documented in commercial EHR-embedded tools | Cost as proxy for need; historical utilization data encodes access inequity | Published peer-reviewed evidence (Obermeyer et al., Science 2019) |

One structural problem: most FDA submissions do not require disaggregated performance data by race, ethnicity, or sex as a condition of clearance. Some submissions include it voluntarily; many don't. This means the FDA device record often cannot answer the question of whether a tool performs equally across patient populations.
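Disaggregated reporting is straightforward to compute when subgroup labels are available. The sketch below, using invented data and hypothetical group labels "X" and "Y", shows how a single aggregate sensitivity figure can hide a large gap between subgroups.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, predicted_positive, truly_positive) tuples.

    Returns {group: sensitivity}, computed over truly positive cases only.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, predicted, truth in records:
        if not truth:
            continue
        if predicted:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Invented data: 50 true positives per group, detected at different rates.
records = (
    [("X", True, True)] * 45 + [("X", False, True)] * 5
    + [("Y", True, True)] * 30 + [("Y", False, True)] * 20
)
by_group = sensitivity_by_group(records)
# Relabel every record as one group to get the aggregate figure.
overall = sensitivity_by_group(("all", p, t) for _, p, t in records)["all"]
print(by_group, overall)
```

Here the aggregate sensitivity is 0.75, while the subgroup values are 0.90 and 0.60; a submission reporting only the aggregate would not reveal the difference.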

How to Use This Landscape Page

This page is a navigation layer, not a standalone reference. The structured information lives in the linked records.

  • To verify whether a specific tool is FDA-cleared and for what indication, go to the FDA AI Device Records section and filter by specialty.
  • To evaluate the quality of evidence behind a specific tool or application, see the Evidence Appraisals section, which covers study design, dataset composition, and limitations.
  • To track how FDA regulatory policy on AI/ML devices has changed over time, the Regulatory Tracker maintains a chronological record of guidance documents and policy actions.
  • For real-world deployment accounts — how tools actually perform outside controlled study conditions — see Clinical Deployment Reports.
