If you work in pharma, you already know the problem. Due to data lag, claims-based alerts arrive weeks after the event. Lab-based alerts get closer, but often the campaign’s goal is to increase testing, so you’re chasing the same signal you’re trying to create. EMR alerts sometimes come in earliest, but coverage is thin and much of what matters sits in unstructured notes.
Buying another data source can trim a few weeks of cycles, but you’re still reacting. By the time the rep gets a meeting, the treatment decision has often been made.
The fix isn’t another data source. It’s a different question: not “what just happened?” but “what’s about to?”
That’s the shift from alerting to anticipating: seeing the decision coming, months before it’s visible in the data. When we applied this approach to a rare oncology indication, the results were hard to ignore:
- ~3x more alert volume than existing sources
- ~2.8x the accuracy of the prevailing ML model
- ~3 months of additional lead time before the key treatment decision
- ~10% market share lift among the doctors we engaged
A better ML model isn’t the answer
Predictive models in pharma aren’t new. But the build cycle is structurally constrained.
Each model takes six to nine months to stand up, requires a hypothesis from a Chief Medical Officer, and relies on a data scientist to manually engineer features. This inherently caps what the model can see at what humans already thought to look for.
These models work. They just don’t scale, and they’re blind to subtle signals that unfold over time. To see what no one thought to look for, you need a model that learns directly from the journey itself.
Teaching a model to read the patient’s timeline
We took the Transformer architecture, the same architecture that powers today’s large language models, and taught it to read patient journeys the way an LLM reads language.
Because the model learns directly from the full sequence, there’s no need to manually engineer features. You simply give the model examples of the kinds of patients you’re looking for, and it works out the patterns that predict them. That collapses deployment from six-to-nine months down to about one, and the model surfaces signals no analyst would have thought to pre-specify.
Two things make patient journeys harder than language, and the architecture handles both. Timing matters — how far apart two events are, whether they happen in parallel, the gap between a diagnosis and the first claim for treatment. Context matters even more. The same clinical event can mean very different things depending on what surrounds it, and the model reads all of it at once.
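To make “the model reads the journey itself” concrete, here is a minimal sketch of how a patient’s history might be turned into a token sequence a transformer can consume, with the gaps between events encoded as tokens of their own so timing survives the conversion. The event codes and gap buckets here are illustrative assumptions, not Prospection’s production vocabulary:

```python
from datetime import date

def gap_token(days):
    """Bucket the interval between two events into a coarse time token.
    Thresholds are illustrative, not learned."""
    if days == 0:
        return "<SAME_DAY>"
    if days <= 30:
        return "<GAP_1M>"
    if days <= 90:
        return "<GAP_3M>"
    return "<GAP_LONG>"

def journey_to_tokens(events):
    """Turn [(date, event_code), ...] into a token sequence,
    interleaving time-gap tokens so the model sees *when*, not just *what*."""
    events = sorted(events)
    tokens = []
    prev = None
    for when, code in events:
        if prev is not None:
            tokens.append(gap_token((when - prev).days))
        tokens.append(code)
        prev = when
    return tokens

# A hypothetical journey for the rare-ocular-cancer example discussed below.
journey = [
    (date(2023, 1, 10), "DX:UVEAL_MELANOMA"),
    (date(2023, 2, 2),  "PROC:ENUCLEATION"),
    (date(2023, 8, 15), "LAB:LIVER_FUNCTION"),
]
print(journey_to_tokens(journey))
# → ['DX:UVEAL_MELANOMA', '<GAP_1M>', 'PROC:ENUCLEATION', '<GAP_LONG>', 'LAB:LIVER_FUNCTION']
```

Once a journey is a sequence like this, the rest is standard sequence modeling: the same machinery that predicts the next word can predict the next clinical event.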
What’s a transformer model, actually?
Transformers are the architecture behind tools like ChatGPT and Claude. They read sequences — words in a sentence, sentences in a paragraph — and learn which combinations carry meaning in context.
We apply the same idea to patient journeys. The “words” aren’t English. They’re clinical events: a diagnosis, a test, a prescription, a switch. The “sentence” is the patient’s full longitudinal journey.
Just as a word means different things depending on the sentence around it, a clinical event means different things depending on the journey around it. The model learns those patterns directly from the data — no one has to tell it what to look for.
Consider a liver function test. On its own, it’s routine: a doctor checking for alcohol-related damage, say, or monitoring a medication. But in the context of a patient already diagnosed with a rare ocular cancer, with a history of enucleation and escalating surveillance, that same test means something very different: the doctor is starting to think about metastasis. Same test, different context, entirely different meaning. A traditional model would register the lab result and miss the story. A transformer reads the whole sequence and sees what the test is really telling you.
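The mechanism behind “same test, different meaning” is self-attention: each event’s representation becomes a weighted mix of every other event in the journey, so the same token comes out different depending on what surrounds it. The toy two-dimensional embeddings below are invented for illustration (real models learn hundreds of dimensions and their weights from data), but the arithmetic is a faithful, unparameterised miniature of one attention pass:

```python
import math

# Toy hand-set embeddings for a few clinical "tokens" (illustrative only).
EMB = {
    "ROUTINE_CHECKUP":    [1.0, 0.0],
    "DX:UVEAL_MELANOMA":  [0.0, 1.0],
    "PROC:ENUCLEATION":   [0.2, 0.9],
    "LAB:LIVER_FUNCTION": [0.7, 0.3],
}

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attend(tokens):
    """One round of self-attention: each token's new vector is an
    attention-weighted mix of every token in the journey."""
    vecs = [EMB[t] for t in tokens]
    out = []
    for q in vecs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in vecs]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, vecs))
                    for d in range(len(q))])
    return out

routine  = ["ROUTINE_CHECKUP", "LAB:LIVER_FUNCTION"]
oncology = ["DX:UVEAL_MELANOMA", "PROC:ENUCLEATION", "LAB:LIVER_FUNCTION"]

# The *same* liver-test token, contextualized by two different journeys.
lab_in_routine  = self_attend(routine)[-1]
lab_in_oncology = self_attend(oncology)[-1]
print(lab_in_routine, lab_in_oncology)
```

In the oncology journey, the liver test’s vector is pulled toward the cancer-related events; in the routine journey it isn’t. That divergence, stacked over many layers and learned dimensions, is what lets the model read intent rather than just the event.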
That’s what “anticipatory” means in practice. Not a faster alert on the same observable event, but a read on the decision forming upstream of the event itself.
No “why,” no action
Most “predictive” tools end up underused: commercial teams can’t reallocate budget or redirect reps on a number that arrives without a why.
Every prediction we generate is accompanied by the underlying clinical context and patient signals that informed it—not just a score, but a transparent view into why the model believes a patient is at risk.
The model infers likely clinical intent by learning from what physicians do—the sequence of tests, treatments, and interventions across similar patients. It incorporates the tumor’s primary location, the natural history of the disease, and typical patterns of metastasis—where it tends to spread, and when.
Then for each individual patient, the model provides a clear rationale: for example, increased liver testing (aligned to common metastatic pathways), combined with clinical history such as a prior enucleation indicating higher-risk disease, signals elevated risk. This explanation is generated for every patient, one by one—enabling decisions grounded in clinical context, not a black-box score.
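A rationale of that shape can be sketched as a simple mapping from observed signals to weighted, human-readable reasons. To be clear, the signal list, weights, and scoring below are invented for illustration; the production model derives its attributions from learned parameters, not a lookup table:

```python
# Hypothetical risk signals: (weight, human-readable reason).
# Both values are assumptions for illustration, not learned attributions.
RISK_SIGNALS = {
    "DX:UVEAL_MELANOMA":  (0.2, "confirmed rare ocular cancer diagnosis"),
    "PROC:ENUCLEATION":   (0.3, "prior enucleation indicating higher-risk disease"),
    "LAB:LIVER_FUNCTION": (0.4, "increased liver testing (common metastatic pathway)"),
}

def explain(journey):
    """Return a risk score plus the per-event rationale behind it,
    so the field team sees the 'why', not just the number."""
    score, reasons = 0.0, []
    for event in journey:
        if event in RISK_SIGNALS:
            weight, why = RISK_SIGNALS[event]
            score += weight
            reasons.append(why)
    return round(score, 2), reasons

journey = ["DX:UVEAL_MELANOMA", "PROC:ENUCLEATION", "LAB:LIVER_FUNCTION"]
print(explain(journey))
```

The point isn’t the arithmetic; it’s the output contract: every score ships with the ranked clinical reasons that produced it, which is what makes the prediction auditable.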
Validation works the same way. Before a single alert goes to the field, we run a retrospective time-travel validation: we hold out a window of historical data, pretend we deployed the model at that point, and check what it predicted against what actually happened. The output is an honest read on expected accuracy before a team commits resources. In live production, the model often performs as well as, or better than, the retrospective run, because predicting and promoting to the right doctors begins changing behavior in the market itself, which is exactly why the model retrains continuously.
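The time-travel logic itself is straightforward to sketch. In this toy backtest (patient IDs, events, and the stand-in `predict` rule are all illustrative assumptions), the model sees only events before the cutoff date and is scored against what actually happened after it:

```python
from datetime import date

CUTOFF = date(2023, 6, 1)

# (patient_id, event_date, event_code) — a toy historical record.
events = [
    ("p1", date(2023, 3, 1), "LAB:LIVER_FUNCTION"),
    ("p1", date(2023, 9, 1), "DX:METASTASIS"),
    ("p2", date(2023, 4, 1), "ROUTINE_CHECKUP"),
    ("p3", date(2023, 5, 1), "LAB:LIVER_FUNCTION"),
]

def predict(history):
    """Stand-in for the model: flag patients with pre-cutoff liver testing."""
    return any(code == "LAB:LIVER_FUNCTION" for _, code in history)

def time_travel_validate(events, cutoff):
    """Pretend we deployed at `cutoff`: predict from pre-cutoff history,
    score against post-cutoff outcomes."""
    patients = {p for p, _, _ in events}
    tp = fp = fn = 0
    for p in sorted(patients):
        before = [(d, e) for q, d, e in events if q == p and d < cutoff]
        after  = [e for q, d, e in events if q == p and d >= cutoff]
        flagged = predict(before)
        outcome = "DX:METASTASIS" in after
        tp += flagged and outcome
        fp += flagged and not outcome
        fn += (not flagged) and outcome
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(time_travel_validate(events, CUTOFF))
# → (0.5, 1.0)
```

Because the split is by time rather than by random sampling, the backtest can’t leak future information into the prediction, which is what makes the accuracy estimate honest.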
The result is a prediction you can explain, audit, and improve on — which is the only kind a commercial team will ever act on at scale.
The anticipation advantage
The opportunities teams ask us about most sit at the core of commercial performance:
- Who’s on a path to diagnosis before the claim appears?
- Who’s ready to initiate therapy, and on what?
- Who is about to switch therapies?
- Which patients are likely to churn, and when?
Each of those is, structurally, the same problem: a decision forming upstream of the signal a traditional system would wait for. Each is solvable by a model that reads the full journey.
Because the architecture doesn’t depend on manual feature engineering, none of these is a six-month rebuild. And every prediction sharpens as new data sources are added (e.g. genomics, social determinants, prescriber characteristics, lab testing).
Commercial success in pharma isn’t won after the decision has been made. It’s won by seeing it coming. That’s the anticipation advantage, and it’s what a transformer-based architecture, validated performance, and transparent reasoning finally make real, at a pace and depth traditional analytics can’t match.
—————
Eric Chung is co-founder of Prospection, an anticipatory intelligence company helping pharmaceutical teams identify and act on patient and provider decisions before they happen.
