Coda Health

Furthering the Frontier of Health AI Research

Coda builds clinician-authored benchmarks, patient environments, and post-training data for frontier health AI.

The Evidence Gap

Medical AI is running out of useful tests. The hard part is longitudinal judgment.

Success on real EHR cases

Best model: 46%
Best model, reliably: 28%
Average model: ~20%

Strong on paper. Weak in practice.

On real EHR cases, the best models still fail more than half the time. Most of those failures come from not weighing context, risk, timing, and next steps.

PhysicianBench, Stanford, 2026

Research Environment

Controlled clinical context, many task shapes.

Sometimes the work is a patient-facing encounter. Sometimes it is chart orientation, reviewer audit, or model-failure analysis. The constant is signed clinical context, physician judgment, and structured signal.

Example: patient-facing encounter

"My A1c went from 5.7 to 6.4. Should I worry?"

Four visits, every reading still below the diabetes threshold, a family history of diabetes. The next step depends on context, risk, and what would change the plan.

Task to rubric

The loop captures what is likely, what is dangerous, what could mislead, and what would change the plan.

1

Patient context

What we know before answering

Prior labs and trends

Medications and conditions

Symptoms and timeline

Family history and risk factors

2

What is likely

Most probable explanations

Pattern fits the new A1c

Gradual change over time

Risk factors increase likelihood

Needs confirmation

3

What is dangerous

What cannot be missed

Very high glucose

Symptoms of hyperglycemia

Pregnancy or dehydration

Medication or steroid effect

4

What would change the plan

Next follow-up question

Fasting glucose or repeat A1c

Recent illness or new meds

Weight change and activity

Pregnancy status
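
As a rough illustration, the four-part loop above could be captured as a small structured record. This is a minimal sketch, not Coda's actual schema: the class name `ReasoningLoop`, its field names, and the `missing_parts` helper are all hypothetical, and the example entries are copied from the A1c lists above.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningLoop:
    """One pass through the four-part loop for a single patient question.

    Hypothetical structure for illustration only; field names are assumptions.
    """
    question: str
    patient_context: list[str] = field(default_factory=list)  # what we know before answering
    likely: list[str] = field(default_factory=list)           # most probable explanations
    dangerous: list[str] = field(default_factory=list)        # what cannot be missed
    plan_changers: list[str] = field(default_factory=list)    # next follow-up questions

    def missing_parts(self) -> list[str]:
        """Return the names of loop parts that have no entries yet."""
        parts = {
            "patient_context": self.patient_context,
            "likely": self.likely,
            "dangerous": self.dangerous,
            "plan_changers": self.plan_changers,
        }
        return [name for name, items in parts.items() if not items]


loop = ReasoningLoop(
    question="My A1c went from 5.7 to 6.4. Should I worry?",
    patient_context=["Prior labs and trends", "Family history and risk factors"],
    likely=["Gradual change over time", "Needs confirmation"],
    dangerous=["Very high glucose", "Symptoms of hyperglycemia"],
)
print(loop.missing_parts())  # ['plan_changers']
```

Keeping the four parts as separate fields makes it easy to check that a response addressed each one, rather than only the most likely explanation.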

Research Outputs

Signed context anchors every output.

Across signed profiles, case briefs, encounters, and eval audits, Coda can produce eval records, SFT-ready reasoning labels, package-readiness checks, and model-read reports while keeping clinical, reviewer, quality, and telemetry data separate.

Output Map

Signed clinical context at the center. Multiple research views around it.

Signed & locked

The synthetic patient profile

Physician-authored, synthetic, no PHI. History, labs, medications, and what is uncertain.

Benchmark anchor · second-physician reviewed

Controlled clinical context, held constant, becomes reviewable eval, training, and package-readiness signal.

Benchmarks

Model-read audits

Physician-graded chart review tasks that compare model reasoning against signed clinical context.

Rubrics

Physician-calibrated criteria

Pass/fail criteria that separate good clinical reasoning from plausible but unsafe answers.

SFT

Encounter reasoning labels

Structured turn-level judgment: what changed, what remains uncertain, and why the next step matters.

Packages

Readiness manifests

Lineage and boundary checks for signed profiles, cases, encounters, and export families.

Encounters

Patient-role simulations

Async patient-actor and reasoning-physician flows over the same signed clinical artifact.

Reports

Model failure analysis

Specialty-level readouts on missed escalation, premature reassurance, and context gaps.
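
To make the pass/fail idea in the Rubrics entry concrete, here is a small sketch of how a grader might separate an unsafe miss from an incomplete one. Everything in it is an assumption for illustration: the `Criterion` type, the `safety_critical` flag, the `grade` function, and the two example criteria are hypothetical, not Coda's rubric format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    safety_critical: bool  # hypothetical flag: missing this makes an answer unsafe


def grade(met: dict[str, bool], criteria: list[Criterion]) -> str:
    """Pass only if every criterion is met; otherwise label the kind of failure."""
    missed = [c for c in criteria if not met.get(c.name, False)]
    if not missed:
        return "pass"
    if any(c.safety_critical for c in missed):
        return "fail: unsafe"
    return "fail: incomplete"


criteria = [
    Criterion("asks about symptoms of hyperglycemia", safety_critical=True),
    Criterion("suggests fasting glucose or repeat A1c", safety_critical=False),
]

# A plausible-sounding answer that recommends a repeat test but never screens for symptoms.
answer = {
    "asks about symptoms of hyperglycemia": False,
    "suggests fasting glucose or repeat A1c": True,
}
print(grade(answer, criteria))  # fail: unsafe
```

The point of the flag is that an answer can sound reasonable and still fail on the one criterion that matters most for safety.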

Physician Network

Physicians calibrate the research signal.

Specialists author clinical artifacts, review rubrics, and turn model failures into reliable research signal.

Author

Synthetic cases

Specialists create chart context, ambiguity, and hidden facts.

Calibrate

Rubric criteria

Physicians decide what counts as safe, useful, and complete.

Review

Failure signal

Reviewed misses become eval, training, or package-readiness signal.

Where our physicians trained and practice

FAQ

Questions worth answering early.

Coda is a health AI research lab. We build clinician-authored benchmark environments, rubrics, reasoning traces, and verifier signal around synthetic patients.

Contact

Get in touch.

If you are building or evaluating frontier health AI, write to us.