Coda builds clinician-authored benchmarks, patient environments, and post-training data for frontier health AI.
The Evidence Gap
Success on real EHR cases
Best model: 46%
Best model, reliably: 28%
Average model: ~20%
On real EHR cases, even the best models fail more than half the time. Most of those misses are failures to weigh context, risk, timing, and next steps.
PhysicianBench, Stanford, 2026
Research Environment
Sometimes the work is a patient-facing encounter. Sometimes it is chart orientation, reviewer audit, or model-failure analysis. What stays constant is signed clinical context, physician judgment, and structured signal.
Example: patient-facing encounter
"My A1c went from 5.7 to 6.4. Should I worry?"
Four visits, every reading in range, a family history of diabetes. The next step depends on context, risk, and what would change the plan.
Task to rubric
The loop captures what is likely, what is dangerous, what could mislead, and what would change the plan.
What we know before answering
Prior labs and trends
Medications and conditions
Symptoms and timeline
Family history and risk factors
Most probable explanations
Pattern fits the new A1c
Gradual change over time
Risk factors increase likelihood
Needs confirmation
What cannot be missed
Very high glucose
Symptoms of hyperglycemia
Pregnancy or dehydration
Medication or steroid effect
Next follow-up question
Fasting glucose or repeat A1c
Recent illness or new meds
Weight change and activity
Pregnancy status
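As a rough illustration, the four panels above could be held in one structured record. This is a minimal sketch, not Coda's actual schema: the class name `HypothesisMap`, its field names, and the `is_complete` check are all hypothetical, and only the list entries come from the A1c example.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisMap:
    """Hypothetical container for the four panels of the A1c example."""
    known_context: list[str] = field(default_factory=list)
    probable_explanations: list[str] = field(default_factory=list)
    cannot_miss: list[str] = field(default_factory=list)
    follow_up_questions: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Assumption: a map counts as complete only when every panel
        # has at least one entry.
        return all([
            self.known_context,
            self.probable_explanations,
            self.cannot_miss,
            self.follow_up_questions,
        ])


a1c_map = HypothesisMap(
    known_context=[
        "Prior labs and trends",
        "Medications and conditions",
        "Symptoms and timeline",
        "Family history and risk factors",
    ],
    probable_explanations=[
        "Pattern fits the new A1c",
        "Gradual change over time",
        "Risk factors increase likelihood",
        "Needs confirmation",
    ],
    cannot_miss=[
        "Very high glucose",
        "Symptoms of hyperglycemia",
        "Pregnancy or dehydration",
        "Medication or steroid effect",
    ],
    follow_up_questions=[
        "Fasting glucose or repeat A1c",
        "Recent illness or new meds",
        "Weight change and activity",
        "Pregnancy status",
    ],
)
```

Keeping the panels as separate fields makes it easy to check that a response addressed each one, rather than only the most probable explanation.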
Research Outputs
Across signed profiles, case briefs, encounters, and eval audits, Coda can produce eval records, SFT-ready reasoning labels, package-readiness checks, and model-read reports while keeping clinical, reviewer, quality, and telemetry data separate.
Output Map
Physician-authored, synthetic, no PHI. History, labs, medications, and what is uncertain.
Benchmark anchor · second-physician reviewed
Controlled clinical context, held constant, becomes reviewable eval, training, and package-readiness signal.
Physician-graded chart review tasks that compare model reasoning against signed clinical context.
Pass/fail criteria that separate good clinical reasoning from plausible but unsafe answers.
Structured turn-level judgment: what changed, what remains uncertain, and why the next step matters.
Lineage and boundary checks for signed profiles, cases, encounters, and export families.
Async patient-actor and reasoning-physician flows over the same signed clinical artifact.
Specialty-level readouts on missed escalation, premature reassurance, and context gaps.
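One way to picture pass/fail criteria that separate good reasoning from plausible but unsafe answers is a rubric in which any missed safety-critical item fails the response, however many other items it satisfies. The sketch below assumes that rule for illustration only. `Criterion`, `grade`, and the example criteria are hypothetical names, not Coda's rubric format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical rubric item; safety_critical marks must-not-miss checks."""
    name: str
    safety_critical: bool


def grade(criteria: list[Criterion], met: set[str]) -> bool:
    """Return True (pass) only if every safety-critical criterion was met.

    Assumption: non-critical criteria do not affect pass/fail in this sketch.
    """
    return all(c.name in met for c in criteria if c.safety_critical)


rubric = [
    Criterion("asks about symptoms of hyperglycemia", safety_critical=True),
    Criterion("recommends a repeat A1c or fasting glucose", safety_critical=True),
    Criterion("acknowledges family history", safety_critical=False),
]

# A reassuring answer that covers context but skips the symptom check fails.
plausible_but_unsafe = {
    "recommends a repeat A1c or fasting glucose",
    "acknowledges family history",
}
# An answer that covers both safety-critical items passes.
safe_answer = {
    "asks about symptoms of hyperglycemia",
    "recommends a repeat A1c or fasting glucose",
}

unsafe_result = grade(rubric, plausible_but_unsafe)
safe_result = grade(rubric, safe_answer)
```

Under this assumed rule, an answer that meets two of three criteria can still fail, which is the distinction the rubric is meant to draw.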
Physician Network
Specialists author clinical artifacts, review rubrics, and turn model failures into reliable research signal.
Author
Specialists create chart context, ambiguity, and hidden facts.
Calibrate
Physicians decide what counts as safe, useful, and complete.
Review
Reviewed misses become eval, training, or package-readiness signal.
FAQ
Coda is a health AI research lab. We build clinician-authored benchmark environments, rubrics, reasoning traces, and verifier signal around synthetic patients.
Contact
If you are building or evaluating frontier health AI, write to us.