Whole cell simulation modeling

do Daphne Koller’s PGMs work well in response to KGs of biology (Zitnik)? Or do they not mix at the same level of resolution? [the same thing as trying to model the exact sources/sinks/reaction rates in ALL parts of ALL cells of just H2O2, or full Perturb-seq analysis on everything that affects H2O2 + MG + every other major ROS, its formation, or its sinks?]. Is it too computationally intractable to be useful across most people?

But you can still compress the questions for every single ROS into a more compact format than the ROS textbooks use

4:03 AM

This is a really interesting question and I want to push back on some of the framing before agreeing with parts of it.

Where the Koller-Zitnik interface actually breaks down

The issue isn’t really that PGMs and biological KGs “don’t mix” — it’s that they operate at fundamentally different levels of epistemic commitment. Koller’s PGM work (and I mean the full scope, not just the textbook — her work on template models, relational Markov networks, the Daphne-specific contribution of making structure learning tractable via sufficient statistics) assumes you can specify the conditional independence structure. The whole machinery of d-separation, message passing, variational inference — it all presumes you know, or can learn, the graph topology with some fidelity.
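To make that commitment concrete, here is a minimal sketch (all probabilities invented) of what a PGM actually asserts: a three-node chain X → Y → Z with explicit conditional probability tables, where the factorization itself is the conditional-independence claim and the d-separation consequence can be checked numerically by brute-force enumeration.

```python
# Minimal sketch of the epistemic commitment a PGM makes: a three-node
# chain X -> Y -> Z with explicit CPTs. All numbers are invented.
import itertools

p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(x, y, z):
    # The factorization P(X) P(Y|X) P(Z|Y) *is* the independence claim.
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

def cond_z_given(x=None, y=None):
    """P(Z=1 | evidence) by brute-force enumeration."""
    num = den = 0.0
    for xx, yy, zz in itertools.product((0, 1), repeat=3):
        if (x is not None and xx != x) or (y is not None and yy != y):
            continue
        p = joint(xx, yy, zz)
        den += p
        if zz == 1:
            num += p
    return num / den

# d-separation: once Y is observed, X tells us nothing more about Z.
assert abs(cond_z_given(x=0, y=1) - cond_z_given(x=1, y=1)) < 1e-12
```

The whole inference machinery rests on being able to write those tables down; that is exactly the specification a biomedical KG edge does not supply.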

Zitnik’s KGs (and the broader biomedical KG ecosystem — SPOKE, Hetionet, Bioteque, etc.) are… honestly more like curated assertions with confidence-weighted edges? They encode “protein X interacts with protein Y” or “drug Z treats disease W” but the semantics of those edges are wildly heterogeneous. An edge in a KG might mean direct physical binding, or it might mean “someone observed a correlation in a GWAS.” Koller’s formalism needs those to be different things. A KG mostly doesn’t care.

So the mismatch is: PGMs need precise conditional probability specifications at the edges, and KGs give you topological proximity in a vaguely-defined relational space. You can’t just slap a factor graph on top of a KG and expect the inference to mean anything biologically precise.
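A toy illustration of that gap, with invented entity names and evidence labels: KG edges are typed triples whose evidence kinds are heterogeneous, and only a subset could even in principle back a factor with interventional semantics.

```python
# Toy illustration of the semantic gap: KG edges as typed triples whose
# evidence kinds are heterogeneous. Names and labels are invented.
kg_edges = [
    ("NFE2L2", "activates", "GCLC", {"evidence": "direct_binding"}),
    ("SOD1", "associated_with", "ALS", {"evidence": "gwas_correlation"}),
    ("NAC", "treats", "overdose", {"evidence": "clinical_trial"}),
]

def usable_as_causal_factor(edge):
    # A factor graph needs edges with interventional semantics; a bare
    # GWAS-correlation edge does not qualify without extra assumptions.
    return edge[3]["evidence"] in {"direct_binding", "clinical_trial"}

usable = [e for e in kg_edges if usable_as_causal_factor(e)]
# Only two of the three edges survive the stricter semantics.
```

The KG stores all three edges with equal standing; the PGM can only consume the ones whose semantics it can turn into a conditional distribution.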

But here’s where I think you’re being too pessimistic

The H2O2 analogy is instructive but maybe misleading. Yes — if you tried to write down the full PGM for every source, sink, reaction rate of H2O2 across every compartment of every cell type, you’d get something computationally intractable and also just… wrong, because the parameters would be underdetermined by like 6 orders of magnitude. Same deal if you tried to expand to all ROS + methylglyoxal + every downstream adduct.

But that’s the wrong aspiration. Nobody actually needs the full joint distribution P(every ROS species | every cellular context). What people need are conditional queries: “given this perturbation, what happens to this downstream readout, in this cell type.” And THAT is compressible.

The compression argument — where I think you’re right but maybe for the wrong reasons

You’re right that the textbook treatment of ROS biology is weirdly incompressible in its current form. Halliwell & Gutteridge is like 900 pages and yet somehow you still can’t answer basic quantitative questions from it. But I’d argue that’s a failure of representation, not a statement about the underlying complexity.

The thing is — most of the reaction network for ROS is actually pretty low-rank if you think about it right. There are maybe 5-10 major production pathways (mitochondrial complex I/III, NOX family, xanthine oxidase, peroxisomal oxidases, a few others), maybe 5-8 major scavenging systems (catalase, GPx family, Prx family, SOD isozymes feeding into those), and the rest is combinatorics of compartmentalization + regulation. The effective dimensionality is way lower than the textbook makes it seem.

So you could probably build something like a… structured latent variable model? Not a full PGM in the Koller sense, but something that learns a compressed representation of the regulatory logic from perturb-seq-scale data, with the KG providing structural priors on which latent dimensions should couple to which.
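As a rough numpy sketch of that idea (shapes, the mask, and all values invented): the KG supplies a binary sparsity mask on the decoder, so each latent module can only touch the readouts the KG says it should.

```python
# Sketch of a KG-structured latent model: the KG supplies a binary mask
# saying which latent "modules" may couple to which observed readouts.
# Shapes and the mask itself are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_modules = 20, 4

# KG-derived prior: gene g may load on module m only where mask == 1.
mask = (rng.random((n_genes, n_modules)) < 0.3).astype(float)

# Decoder weights are constrained to the KG's sparsity pattern.
W = rng.normal(size=(n_genes, n_modules)) * mask

z = rng.normal(size=n_modules)   # latent module activities
x = W @ z                        # predicted expression readout

# The structural prior guarantees: zeroing a module perturbs only the
# genes the KG says that module touches.
z2 = z.copy()
z2[0] = 0.0
changed = np.nonzero(W @ z - W @ z2)[0]
assert set(changed) <= set(np.nonzero(mask[:, 0])[0])
```

The point of the mask is that the KG never has to supply rates or CPTs, only topology; the quantitative coupling strengths get learned from perturbation data.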

This is basically what the Lotfollahi / Theis group has been circling around with CPA and scGen and their successors — learning perturbation response manifolds without trying to write down the full mechanistic model.

Where I’d push back on you specifically

You frame this as “too computationally intractable to be useful across most people” — but intractable for what? For a grad student trying to predict what happens when they knock out GPX4 in their specific cell line? That’s actually pretty tractable with modern foundation models trained on perturbation data. For someone trying to build a complete mechanistic model of redox biology? Yeah, intractable and also probably unnecessary.

The question is whether the compressed representation you’re imagining would actually help people ask better questions, or whether it’d be a neat intellectual object that doesn’t outperform just… training a large model on all the perturb-seq data and querying it directly. I genuinely don’t know the answer to that, and I think you should be honest about not knowing either rather than assuming the compression is inherently valuable.

What’s the actual downstream question you want this compression to answer?

does ANYONE mix Koller’s PGMs with KGs? or use them for modeling drug effects? PGMs are not super widely used because observability for them is bad. Michael Levin might model the diffusion rates of electroceuticals better; he has a kit of modules to do the REAL TIME PDEs, whereas KGs don’t even produce any implementation for PDE diffusion rates (+ sources and sinks) of molecules like Adderall

4:14 AM

Okay so let me think about this honestly rather than pattern-matching to what sounds smart.

Does anyone actually combine Koller-style PGMs with KGs in practice?

Honestly… not really? Not in the way you’d expect given how much both fields get cited. There’s some work that uses Bayesian networks on top of biological knowledge graphs — like the PARADIGM model from Josh Stuart’s group (which Koller was actually involved in!) for cancer pathway inference. That’s probably the closest real example. And some of the causal inference on GWAS stuff uses DAG-based reasoning with KG-derived structure. But it’s a surprisingly thin literature for two fields that should theoretically complement each other perfectly.

And I think the reason is what you’re gesturing at — the observability problem. PGMs need you to either observe the variables or have strong enough priors to marginalize over them. In biology, the ratio of latent to observed variables is just… brutal. You’re trying to infer a joint distribution where you can measure maybe 5% of the relevant quantities at any given time, and the measurement process itself perturbs the system. Koller’s machinery is beautiful but it was developed in a regime where you could at least imagine observing most of your variables (computer vision, NLP, medical diagnosis with defined symptom sets).

On drug effect modeling specifically

PGMs for drug effects are almost nonexistent in any serious pharmacology pipeline and I think that’s telling. The pharma industry tried Bayesian networks for adverse event prediction in like the 2010-2015 era and mostly abandoned it. What won out was either mechanistic ODE/PDE models (QSP — quantitative systems pharmacology) for specific pathways, or brute-force ML on omics readouts.

The reason is exactly your point: a drug like Adderall doesn’t just have a “state” in some graph, it has a spatiotemporal concentration profile that evolves according to actual PDEs with tissue-specific diffusion coefficients, transporter kinetics, and metabolism rates. A PGM node saying “Adderall: high/medium/low” is throwing away precisely the information you’d need to predict anything useful about dosing, timing, or tissue-specific effects.
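To show what that discarded spatial information looks like, here is a minimal 1D reaction-diffusion sketch with invented rates: a concentration profile evolving under a boundary source, first-order clearance, and diffusion, solved with an explicit finite-difference scheme.

```python
# Minimal 1D reaction-diffusion sketch of what a discrete
# high/medium/low node throws away: a concentration profile c(x, t)
# with a source, first-order clearance, and diffusion. All rates are
# invented; dt satisfies the explicit-scheme stability bound dt <= dx^2/(2D).
import numpy as np

nx, dx, dt = 50, 1.0, 0.1
D, k_clear = 1.0, 0.05          # diffusion coeff, clearance rate
c = np.zeros(nx)
source = np.zeros(nx)
source[0] = 1.0                 # dosing at one boundary

for _ in range(2000):
    lap = np.empty(nx)
    lap[1:-1] = (c[2:] - 2 * c[1:-1] + c[:-2]) / dx**2
    lap[0] = (c[1] - c[0]) / dx**2       # no-flux boundaries
    lap[-1] = (c[-2] - c[-1]) / dx**2
    c = c + dt * (D * lap - k_clear * c + source)

# Quasi-steady profile: concentration decays with distance from the
# source, which is exactly the information a single graph node loses.
assert c[0] > c[nx // 2] > c[-1] > 0
```

Nothing here is specific to amphetamine pharmacokinetics; it is the generic shape of the problem a KG edge like “Adderall → dopamine release” silently flattens.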

The Levin comparison is interesting but I want to complicate it

You’re right that Levin’s group (and really the broader bioelectricity modeling community — Pietak & Levin’s BETSE simulator specifically) actually does solve the real-time PDEs for ion diffusion, gap junction coupling, voltage gradients across tissues. That’s a genuine advantage — they produce quantitative predictions about what voltage pattern you’ll see at time T given intervention X.

But here’s my pushback: Levin’s models work because bioelectricity is a relatively low-dimensional system compared to full biochemistry. You’ve got maybe 10-20 ion species that matter, a handful of channel/pump types per cell, and gap junctions with measurable conductances. The PDE system is stiff but solvable. If you tried to do the equivalent for “every molecule affected by Adderall” — dopamine, norepinephrine, all their metabolites, every downstream signaling cascade, receptor desensitization kinetics, gene expression changes — you’d be writing down a PDE system with thousands of coupled species and you simply don’t have parameters for most of them.

So Levin’s approach works in his domain not because PDE-based modeling is universally better than KGs, but because bioelectricity happens to sit in a sweet spot where the relevant variables are mostly observable and the physics is well-characterized.

What I think is actually true here

KGs are honestly kind of a cop-out? They’re what you build when you can’t write down the dynamics but you still want to say something structural. They’re useful for hypothesis generation and link prediction but they don’t explain anything in the mechanistic sense. And PGMs in Koller’s sense are what you build when you have clean conditional independence structure and decent observability — which biology mostly doesn’t give you.

The gap you’re identifying is real: there’s no good formalism that bridges between “here’s a knowledge graph saying these things interact” and “here’s the actual PDE system governing how drug X diffuses and acts.” QSP tries to be that bridge for specific pathways but it doesn’t scale.

I’m less convinced than you seem to be that compression of ROS biology into some intermediate representation solves this though. What would you actually DO with a compressed representation that you can’t do with either the KG (for hypothesis generation) or a fitted PDE model (for quantitative prediction in a specific narrow context)? I keep coming back to this question and I don’t have a satisfying answer. Do you?

Both, annoyingly.

They do mix, but not at the same default resolution. Zitnik-style biomedical KGs are built to integrate heterogeneous entities and relations across many biological scales. Koller’s PGMs are built to express probabilistic dependence, and her own relational extensions, especially probabilistic relational models, explicitly add objects, attributes, relations, and even uncertainty over links. More recent biomedical work does the same kind of fusion with probabilistic KGs and causal KGs. So the sane framing is not “PGM versus KG.” It is “KG as scaffold and evidence layer, PGM or SCM as uncertainty-and-intervention layer.” (Zitnik Lab)

Where the marriage gets ugly is your “exact H2O2 sources/sinks/reaction rates in all parts of all cells” scenario. PrimeKG-like graphs are mostly qualitative and mesoscale. They tell you what is connected, associated, or perturbed across proteins, pathways, phenotypes, diseases, and drugs, not the full local kinetics of peroxide handling in each compartment over time. The causal-KG literature says this pretty directly: KGs and causal models remain partly disconnected because KGs encode facts, while causal models need formal probabilistic and interventional semantics. Current ROS modeling is still much narrower: recent H2O2 work is hepatocyte-specific ODE modeling, ROS reviews still talk about only a handful of mechanistic models, and whole-cell reviews say only a handful of whole-cell models exist and current simulators are slow enough that more complex systems need accelerated HPC. (PMC)

So no, they do not naturally mix at the same resolution if by “resolution” you mean exact source/sink/rate accounting everywhere. The bigger problem is not only compute. It is identifiability. Exact inference in arbitrary Bayesian networks is NP-hard, and even approximate inference is NP-hard in general. Whole-cell modeling reviews also emphasize the mismatch between available data and what is needed to parameterize and validate complex models. A universal model spanning H2O2, MG, the broader reactive-stress network, all compartments, all perturbations, and person-to-person variation would blow up in both state space and unmeasured parameters, because biology refuses to become tidy just to make our lives pleasant. (arXiv)

Perturb-seq points the same way. Exhaustive combinatorial screens are impossible with current methods because the number of possible interactions grows exponentially. The response from the field has been compressed Perturb-seq, which assumes effects are sparse and modular and tries to recover them from a much smaller number of composite samples. That is not some hacky compromise. It is the actual clue: tractability comes from factorization, not from pretending you can enumerate everything like an especially deranged spreadsheet. (Nature)
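The factorization point can be made concrete with a toy compressed-sensing sketch (sizes, sparsity level, and solver settings all invented): a sparse effect vector over 200 perturbations recovered from only 60 pooled composite measurements, via proximal gradient descent (ISTA) on the lasso objective.

```python
# Toy version of the compressed-sensing logic behind compressed
# Perturb-seq: sparse, modular effects recovered from far fewer
# composite samples than perturbations. All sizes and rates invented.
import numpy as np

rng = np.random.default_rng(1)
n_perturb, n_samples, k_sparse = 200, 60, 5

beta = np.zeros(n_perturb)            # true per-perturbation effects
idx = rng.choice(n_perturb, k_sparse, replace=False)
beta[idx] = rng.normal(3.0, 0.5, k_sparse)

A = rng.normal(size=(n_samples, n_perturb)) / np.sqrt(n_samples)
y = A @ beta                          # pooled composite readouts

# ISTA: proximal gradient for the lasso objective
# min_b 0.5*||A b - y||^2 + lam*||b||_1.
lam = 0.05
step = 0.9 / np.linalg.norm(A, 2) ** 2   # 1/Lipschitz of the gradient
b = np.zeros(n_perturb)
for _ in range(3000):
    b = b - step * (A.T @ (A @ b - y))
    b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)

# Sparsity plus incoherent pooling identifies 200 effects from 60 samples.
assert np.allclose(b[idx], beta[idx], atol=0.2)
```

This is the sense in which tractability comes from factorization: the design works only because the effect vector is assumed sparse, which is the same structural bet compressed Perturb-seq makes about biology.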

So your last sentence is basically right. You can compress ROS questions better than textbooks usually do. Textbooks are arranged around molecules, discovery history, and pathology chapters because humans adore narrative clutter. Models should be arranged around reusable modules and query types. Koller’s module networks are very relevant here: they group many variables into modules that share parents and parameters, which reduces complexity and often generalizes better than one giant flat Bayesian network. For redox biology, that suggests modeling modules like sources, sinks, regeneration capacity, sensors, damage outputs, and repair programs, instead of every reaction everywhere. (Stanford AI Lab)

A workable compressed representation would track, for each reactive-stress branch and compartment:

  • source module
  • sink or regeneration module
  • shared cofactors
  • regime
  • observable readouts
  • intervention handles
  • uncertainty

Then most useful questions collapse to four types:

  1. Dominance: which source or sink dominates here?
  2. Threshold: what saturates first?
  3. Coupling: which branches compete for shared cofactors?
  4. Control: what minimal perturbation moves the system from signaling to damage, or back?
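As a sketch of what that compressed representation could look like in code (field names and all example values invented): one record per (branch, compartment) carrying the modules above, plus query helpers for two of the four query types.

```python
# Sketch of the compressed representation as a data structure: one
# record per (branch, compartment). Field names and values are invented.
from dataclasses import dataclass, field

@dataclass
class RedoxModule:
    branch: str                   # e.g. "H2O2", "MG"
    compartment: str              # e.g. "cytosol", "mitochondria"
    sources: dict = field(default_factory=dict)  # pathway -> rel. flux
    sinks: dict = field(default_factory=dict)    # system -> rel. capacity
    shared_cofactors: tuple = ()  # e.g. ("GSH", "NADPH")
    regime: str = "signaling"     # "signaling" | "stress" | "damage"
    readouts: tuple = ()          # measurable proxies
    handles: tuple = ()           # available interventions

def dominant_sink(m: RedoxModule) -> str:
    """Query type 1 (dominance): which sink dominates in this module?"""
    return max(m.sinks, key=m.sinks.get)

def coupled(a: RedoxModule, b: RedoxModule) -> set:
    """Query type 3 (coupling): which cofactors do two branches share?"""
    return set(a.shared_cofactors) & set(b.shared_cofactors)

h2o2 = RedoxModule("H2O2", "cytosol",
                   sinks={"Prx": 0.7, "GPx": 0.2, "catalase": 0.1},
                   shared_cofactors=("GSH", "NADPH"))
mg = RedoxModule("MG", "cytosol",
                 sinks={"glyoxalase": 0.9},
                 shared_cofactors=("GSH",))
assert dominant_sink(h2o2) == "Prx"
assert coupled(h2o2, mg) == {"GSH"}
```

The relative fluxes and capacities here are placeholders; in the proposed stack they would be learned from perturbation data with the KG constraining which fields can be non-empty.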

That third one is where MG belongs. A good compressed question is not “simulate every MG and H2O2 reaction in every cell.” It is “under what cell states does MG handling start tying up enough GSH to impair peroxide buffering?” MG detox through the glyoxalase system uses GSH, and MG exposure can transiently deplete available GSH, so that is a biologically real coupling and a finite query rather than a doomed omniscience project. (PMC)
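A back-of-envelope steady-state sketch of that coupling query, with invented first-order rates (and deliberately ignoring that glyoxalase ultimately regenerates GSH; this captures only the transient sequestration described above): a shared GSH pool regenerated at a fixed rate and drawn on by both MG handling and peroxide buffering.

```python
# Toy steady-state sketch of the GSH coupling query. All rates are
# invented, and glyoxalase's regeneration of GSH is ignored so that
# only the transient sequestration effect is modeled.
def gsh_steady_state(mg_load, h2o2_load, regen=1.0, k_glo=1.0, k_gpx=1.0):
    """GSH level where fixed regeneration balances total consumption:
    regen = (k_glo*mg_load + k_gpx*h2o2_load) * gsh."""
    return regen / (k_glo * mg_load + k_gpx * h2o2_load)

def peroxide_buffering(mg_load, h2o2_load=1.0):
    # GPx-mediated H2O2 clearance available at steady state.
    return h2o2_load * gsh_steady_state(mg_load, h2o2_load)

# Rising MG load monotonically erodes peroxide buffering capacity.
caps = [peroxide_buffering(mg) for mg in (0.1, 1.0, 10.0)]
assert caps[0] > caps[1] > caps[2]
```

Even this crude form answers the compressed question qualitatively (which MG loads start to matter) without simulating a single individual reaction compartment-by-compartment.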

So the verdict is:

  • As an exact universal simulator across most people: no, too intractable and too underdetermined.
  • As a compressed probabilistic atlas: yes, very useful.

The best stack is usually: KG for structure and evidence, hierarchical PGM/SCM for uncertainty and interventions, mechanistic ODE or stochastic simulation only for a few subsystems where kinetics actually matter, and compressed perturbation experiments to fill the most informative gaps. That gets you something scientifically usable instead of a monument to computational masochism.