You've been a care coordinator at CamMed for eight months. You know most of your patients by name — their medications, their family situations, which ones call in just to chat and which ones only call when something's wrong.
It's 4:47 PM on a Tuesday. Your shift ends in 13 minutes. You're finishing your last chart note when the phone rings.
You pick up. The caller sounds urgent.
"Hi — I'm calling about my father, Robert Meadows. He called me about an hour ago, and he sounded really confused. He thinks he took his blood pressure medication twice this morning, but he can't remember. I need to know what he's on and what the dose is — please, I don't have a lot of time here."
How do you respond?
"You just shared that you're accessing Robert's chart with an unverified caller. You have no way to confirm this is his daughter — or that she's authorized to receive his information. HIPAA requires identity verification before disclosing PHI, even when the urgency feels real. Especially then."
The urgency was real. But verification had to come first.
The pressure you felt was real — that's by design. Try the call again with the sequence in mind: safety first, then identity.
"You protected Robert's privacy — but you left his daughter without direction in a potential emergency. A possible medication overdose isn't a reason to come in. It's a reason to call 911. Privacy protection and patient safety aren't in conflict here. They both have an answer. You only gave one."
You were half right. See what a complete response looks like.
"You did two things in the right order. You redirected to emergency services first — removing urgency from the information request. Then you moved to verify identity before accessing any record. That sequence is the principle: Safety. Verify. Inform. In that order, every time — whether the caller is a worried daughter or a stranger with a story."
You used the correct sequence — Safety, then Verify, then Inform. Restart to build the habit, or go back and explore the other paths.
Single-column layout. Robert Meadows's daughter calls about his blood pressure medication. Three response paths: share information without verifying (HIPAA violation), refuse to help entirely (incomplete), or route to safety first then verify identity (correct). Narration via Artlist.io AI voice. No captions, no coach, no behavioral scaffolding.
The point of v1 was never polish — it was to make the scenario real enough to react to. Stakeholders can't tell you what they want in the abstract. They can tell you what's wrong with something concrete. V1 exists so someone can say "yes, that's how the call goes" or "no, that wording's off." The prototype is the question, not the answer.
MED: Minimum Effective Dose. Borrowed from pharmacology: the smallest dose that produces a real response. V1 is the MED for a stakeholder conversation. A written description of the scenario would produce a nod. A broken interactive prototype produces a reaction, and reactions are data.
Three tools. No production team. No narration booth. Claude built the interaction layer, ChatGPT generated the scene images, Artlist.io provided the AI voice. Working prototype in a single session.
Right column added. As narration plays, the transcript highlights word-by-word in real time, synchronized to an SRT subtitle track — the same way YouTube captions work, adapted for a training scenario.
Multimodal: dual-channel input. Audio through the ears, text through the eyes, both carrying the same content at the same time. For learners who grew up reading YouTube captions, this is the natural mode. The modality principle in multimedia learning theory (Mayer, 2001) predicts better retention when information is presented through complementary channels rather than redundant ones.
The version-switching buttons in the header were also added here — a meta-design choice. Switching versions without a page reload means the demo itself is the argument for iterative development.
The YouTube generation premise: a large share of adult learners have watched thousands of hours of captioned video. The reading-along habit is already trained. V2 leans into that instead of fighting it.
The SRT parser is custom-built: it reads a standard subtitle file format, syncs cue timings to the audio element, and drives text highlighting via a requestAnimationFrame loop. No external library. The same engine powers the caption bar in v3.
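The parse-and-sync step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the names `Cue`, `parseSrt`, and `activeCueIndex` are hypothetical, and it assumes each SRT cue holds one word or short phrase to highlight.

```typescript
// Hypothetical sketch of an SRT parser and cue lookup; names are illustrative.
interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function srtTimeToSeconds(stamp: string): number {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse a whole SRT file into cues. Blocks without a timing line are skipped.
function parseSrt(source: string): Cue[] {
  const cues: Cue[] = [];
  // Cues are separated by blank lines; normalise Windows line endings first.
  const blocks = source.replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue;
    const [startStamp, endStamp] = lines[timingIndex].split("-->");
    cues.push({
      start: srtTimeToSeconds(startStamp),
      end: srtTimeToSeconds(endStamp),
      text: lines.slice(timingIndex + 1).join(" ").trim(),
    });
  }
  return cues;
}

// Return the index of the cue active at `time` (seconds), or -1 between cues.
function activeCueIndex(cues: Cue[], time: number): number {
  return cues.findIndex((cue) => time >= cue.start && time < cue.end);
}

// In the browser, a loop along these lines would drive the highlight
// (`audio` and `highlight` are assumed to exist; not executed here):
//   const tick = () => {
//     highlight(activeCueIndex(cues, audio.currentTime));
//     requestAnimationFrame(tick);
//   };
//   requestAnimationFrame(tick);
```

Keeping the lookup a pure function of the audio's current time means the same code can feed either a transcript column or a caption bar; only the `highlight` callback would differ.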
Text highlighting moved from the right column to a caption bar overlaid directly on the scenario image. The learner watches the scene — not a transcript. The image becomes immersive; the text is supporting, not competing.
Bjork: the generation effect. After the caller audio ends, the response choices are withheld for two seconds before appearing. That pause is not a loading state; it's a deliberate learning mechanism. Robert Bjork (UCLA) found that when learners are prompted to formulate a response before seeing options, retention improves even if the mental answer is wrong. Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In Metcalfe & Shimamura (Eds.), Metacognition: Knowing about Knowing. MIT Press.
Desirable Difficulty: audio-gated UI. The Continue button and the choice options are both hidden until audio completes. You can't skip ahead. Bjork calls constraints like this "desirable difficulties": conditions that slow apparent progress but improve durable learning.
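One way to express the audio gate and the two-second pause is as a pure function of when the audio ended. This is a sketch under stated assumptions, not the demo's actual code: `gateState` and its field names are hypothetical, and it assumes the Continue button unlocks as soon as audio completes while the choices wait the extra two seconds.

```typescript
// Hypothetical sketch of the audio gate plus generation pause.
interface GateState {
  continueVisible: boolean;
  choicesVisible: boolean;
}

// From the text: choices are withheld for two seconds after audio ends.
const GENERATION_PAUSE_MS = 2000;

// audioEndedAtMs is null while audio is still playing (or has not started).
function gateState(
  audioEndedAtMs: number | null,
  nowMs: number,
  pauseMs: number = GENERATION_PAUSE_MS,
): GateState {
  if (audioEndedAtMs === null) {
    // Nothing can be skipped while narration is playing.
    return { continueVisible: false, choicesVisible: false };
  }
  return {
    // Assumption: Continue appears immediately once audio completes.
    continueVisible: true,
    // Choices appear only after the deliberate pause has elapsed.
    choicesVisible: nowMs - audioEndedAtMs >= pauseMs,
  };
}
```

In a browser, `audioEndedAtMs` would typically be recorded in the audio element's `ended` event handler, and the UI re-evaluated on a timer or animation frame.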
A right-edge gradient also appears after audio ends: a slow white pulse at the edge of the image that signals forward navigation without cluttering the scenario during playback.
Friction is a feature. Every constraint added in v3 slows the learner down slightly. Each one is intentional. The two-second pause before choices isn't a UX transition — it's the generation effect at work. The audio gate isn't an accessibility decision — it's a desirable difficulty. The distinction matters for how you explain the design to stakeholders who want to know why you didn't "just make it faster."
On wrong-answer screens, the narrator is replaced by a live AI coach. The coach knows which screen the learner is on, what choice they made, and the correct protocol sequence (Safety → Verify → Inform).
Socratic: the coach never gives the answer. One question at a time. It waits for a response, then asks the next question. This is a direct implementation of Socratic questioning: the learner must reconstruct the correct reasoning rather than receive it. Elaborative interrogation research (Pressley et al., 1992) consistently shows that self-generated explanations produce better transfer than reading a correct answer.
AI: backed by Claude Haiku via the Anthropic API. The system prompt instructs the model to emit a [READY] token once the learner has understood, typically within 2–3 exchanges. At that point, input locks and a "Try the call again" button appears. The learner goes back to practice, not to read more.
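On the client side, handling the [READY] signal can be as simple as checking each coach reply for the token and stripping it before display. The sketch below is illustrative only: `interpretCoachReply` and `CoachTurn` are hypothetical names, and the API call that produces the raw reply is deliberately left out.

```typescript
// Hypothetical sketch: detect the [READY] signal in a coach reply.
const READY_TOKEN = "[READY]";

interface CoachTurn {
  message: string;       // text to show the learner, token removed
  learnerReady: boolean; // true when the model signalled understanding
}

function interpretCoachReply(raw: string): CoachTurn {
  const learnerReady = raw.includes(READY_TOKEN);
  // Remove every occurrence of the token so it never reaches the learner.
  const message = raw.split(READY_TOKEN).join("").trim();
  return { message, learnerReady };
}
```

When `learnerReady` is true, the UI would lock the input and reveal the "Try the call again" button; otherwise the conversation continues with the next question.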
Chips: context-sensitive quick-reply buttons. The coach offers "Tell me more" and "Hint?" from the first exchange. The "I get it" chip is deliberately withheld until the learner has completed at least two full exchanges, so it cannot be used as an early escape from the reflection. Every third exchange in Challenge Mode requires a typed or spoken response; chips are disabled for that round.
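The chip rules above reduce to a small function of how many exchanges the learner has completed. This is a sketch with assumptions labeled: `availableChips` is a hypothetical name, the exchange counter is assumed to start at zero, and Challenge Mode is assumed to offer the same chips as the standard coach outside its typed-response rounds.

```typescript
// Hypothetical sketch of the quick-reply chip rules.
type Chip = "Tell me more" | "Hint?" | "I get it";

function availableChips(completedExchanges: number, challengeMode: boolean): Chip[] {
  const upcomingExchange = completedExchanges + 1;
  // Every third exchange in Challenge Mode needs a typed or spoken reply.
  if (challengeMode && upcomingExchange % 3 === 0) return [];
  const chips: Chip[] = ["Tell me more", "Hint?"];
  // "I get it" is withheld until at least two full exchanges are done.
  if (completedExchanges >= 2) chips.push("I get it");
  return chips;
}
```

For example, a learner on their first exchange sees only "Tell me more" and "Hint?", while a Challenge Mode learner about to start exchange three sees no chips at all.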
Voice input via the Web Speech API — browser-native, no infrastructure cost. Speaking a rationale more closely mirrors how a real coaching conversation would feel than typing one.
UX: auto-hide controls. The play button and autoplay toggle fade in when the mouse enters the image and disappear after 3 seconds of idle or on mouseleave, the same behavior YouTube uses. Controls appear only when needed; the image stays fully immersive. The toggle turns green when autoplay is on.
SFX: phone ring sound effect. A single ring plays 400 ms after the intro narration ends, at the moment the script says the phone rings. A brief beat of tension before the learner clicks through to the call.
Debrief: win screen with takeaways and streak. Getting the answer right no longer drops the learner immediately back to the start. A dedicated debrief screen surfaces three key takeaways, a 🔥 streak counter (consecutive correct calls), and a "Complete your training" CTA. Completion should feel like something.
Preview: Challenge Mode, password-gated. A second AI coach mode lives behind the debrief screen. Instead of corrective Socratic questioning, it runs escalating scenario variants: harder caller types, ambiguous authority claims, edge cases with no clean answer. Difficulty rises across five levels over nine exchanges. Currently in development and password-protected. Want early access? Email cameron@cameronstewart.com.
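The auto-hide rule can be modeled as a visibility check over the pointer's state. This is a rough sketch rather than the demo's implementation: `PointerState` and `controlsVisible` are hypothetical names, and it assumes "idle" means no mouse movement over the image.

```typescript
// Hypothetical sketch of the auto-hide rule for playback controls.
const IDLE_HIDE_MS = 3000; // from the text: hide after 3 seconds of idle

interface PointerState {
  inside: boolean;        // is the pointer currently over the image?
  lastActivityMs: number; // timestamp of the last mouse movement
}

function controlsVisible(
  pointer: PointerState,
  nowMs: number,
  idleMs: number = IDLE_HIDE_MS,
): boolean {
  // Hidden immediately on mouseleave; otherwise hidden once idle long enough.
  return pointer.inside && nowMs - pointer.lastActivityMs < idleMs;
}
```

In the browser, `mouseenter` and `mousemove` handlers would update `lastActivityMs`, `mouseleave` would set `inside` to false, and a CSS opacity transition would handle the fade.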
Why Socratic and not corrective? When a learner makes the wrong call, most training tools show them the right answer. This sim asks a question instead. That's harder for the learner, intentionally. A learner who reconstructs the correct reasoning under a bit of pressure is more likely to apply it under real pressure than a learner who read a feedback screen.
The coach persona is precise, non-judgmental, and never gives the answer away. It's closer to a clinical supervision model than a quiz answer key — which fits the healthcare compliance context.
On the tool stack: Claude built the interaction logic and powers the coach. ChatGPT generated the scene images. Artlist.io provided the AI voice. No production team. No narration booth. No video shoot. Four versions with live AI, behavioral science scaffolding, and voice input — built in days, not months.