PGMs/causal models in biology

Short version: JEPA is not a probabilistic graphical model. It’s an energy-flavored predictive representation learner. Useful, yes. A PGM, no. But you can bolt PGM machinery on top of a JEPA backbone and get something that behaves like a causal graph instead of a vibes engine.

Where JEPA and PGMs actually differ

  • JEPA: learn embeddings by predicting masked targets from context in representation space; no normalized joint, no explicit conditional-independence semantics. Great invariances, terrible bedtime stories about likelihoods.
  • PGMs (à la Koller & Friedman): explicitly factorized distributions whose graph structure encodes conditional-independence assumptions, and inference you can explain without interpretive dance.
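
To make “predicting in representation space” concrete, here is a minimal sketch of a JEPA-style training step, assuming a PyTorch setup. Everything in it is illustrative: the layer sizes, the zero-out masking (a crude stand-in for real gene-set tokenization), and the names context_encoder, target_encoder, predictor are placeholders, not GeneJEPA’s actual architecture.

```python
import copy
import torch
import torch.nn.functional as F

# Hypothetical modules: any encoder from expression vectors to embeddings would do.
context_encoder = torch.nn.Sequential(torch.nn.Linear(2000, 256), torch.nn.GELU(), torch.nn.Linear(256, 128))
predictor       = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.GELU(), torch.nn.Linear(128, 128))
target_encoder  = copy.deepcopy(context_encoder)           # EMA copy, never trained by backprop
for p in target_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

def jepa_step(x, mask, ema=0.996):
    """One JEPA-style step: predict the embedding of masked genes from the visible ones.
    x: (batch, genes) expression; mask: (batch, genes) bool, True = hidden from the context."""
    ctx = context_encoder(x * (~mask))            # encode only the visible genes
    with torch.no_grad():
        tgt = target_encoder(x * mask)            # embedding of the masked gene set, no gradient
    loss = F.smooth_l1_loss(predictor(ctx), tgt)  # regression in representation space, not gene space
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # EMA update of the target encoder
        for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
            pt.mul_(ema).add_(pc, alpha=1 - ema)
    return loss.item()
```

The point is the loss: predicted embeddings regressed against EMA-encoder targets, never a reconstruction of raw counts and never a normalized likelihood.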

How to make JEPA play nice with PGMs on cell data

Use JEPA to learn a low-noise latent state, then fit a graphical model over that state using interventions to orient edges.

  1. Train a JEPA on scRNA-seq. GeneJEPA gives you cell embeddings by predicting masked gene-set representations from visible genes. You get a denoised, biology-respecting state space.
  2. Learn an undirected skeleton on the latent. Fit a sparse inverse covariance (graphical lasso or Gaussian copula) on JEPA embeddings within condition blocks to get a Markov network. This captures conditional independences without hallucinating directions. Koller would approve of the math if not the name. (Sketched in code after this list.)
  3. Add direction with interventions. Use interventional structure learning (e.g., GIES/IDA/ICP variants) where “do(·)” comes from CRISPR or drug perturbations. Orient edges leaving perturbed nodes in accordance with causal sufficiency assumptions. Do this per cell context, then reconcile. (JEPA didn’t give you causality; the perturbations did. A simplified invariance check is sketched after this list.)
  4. Make it dynamic if you must. If you have time courses or decent pseudotime, fit a dynamic Bayesian network on the JEPA latents. If your “time” is wishful thinking, don’t. Koller’s book has the DBN playbook; JEPA just gave you a saner state. (Minimal sketch after this list.)
  5. Optional: embed the graph into the model. Add a sparse graphical layer on top of the JEPA encoder and learn edges with an L1 penalty, or slap a CRF on protein/chromatin nodes when using multimodal data. You still won’t have a normalized joint unless you add one, but you’ll have factors with teeth. (Toy layer sketched after this list.)
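
Step 2, sketched below under the assumption that the JEPA embeddings and per-cell condition labels already sit in NumPy arrays; embeddings and condition_labels are hypothetical names.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def latent_skeleton(embeddings, condition_labels, threshold=1e-3):
    """Fit a sparse inverse covariance per condition block and return undirected adjacency
    matrices over the latent dimensions. Edges are conditional dependencies, not directions."""
    skeletons = {}
    for cond in np.unique(condition_labels):
        Z = embeddings[condition_labels == cond]           # cells from one condition only
        Z = (Z - Z.mean(0)) / (Z.std(0) + 1e-8)            # standardize latent dimensions
        model = GraphicalLassoCV().fit(Z)                  # cross-validated sparsity level
        adj = np.abs(model.precision_) > threshold         # nonzero partial correlation => edge
        np.fill_diagonal(adj, False)
        skeletons[cond] = adj
    return skeletons
```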
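
Step 3 is where the perturbations earn their keep. A real run would use a published GIES/ICP implementation; the sketch below is a simplified ICP-flavored invariance check, assuming an environment label per cell (control vs. each CRISPR/drug perturbation): a candidate parent set survives only if the child’s regression residuals look the same across environments.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def invariant_parent_set(Z, env, child, parents, alpha=0.05):
    """ICP-flavored check: regress the child latent on a non-empty candidate parent set, then
    test whether the residual distribution is invariant across environments.
    Z: (cells, latent_dims); env: (cells,) environment labels; child/parents: column indices."""
    X, y = Z[:, list(parents)], Z[:, child]
    resid = y - LinearRegression().fit(X, y).predict(X)
    groups = [resid[env == e] for e in np.unique(env)]
    _, p_scale = stats.levene(*groups)    # equality of residual spread across environments
    _, p_loc = stats.kruskal(*groups)     # equality of residual location across environments
    return min(p_scale, p_loc) > alpha    # True => parent set not rejected by the interventions
```

Intersecting the surviving parent sets across candidates gives conservative, ICP-style parents; treat this as a sanity tool, not a replacement for the real algorithms.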
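
Step 4, only if the time axis is real: a first-order linear-Gaussian DBN over the latents collapses to a sparse regression of z_{t+1} on z_t. A minimal sketch, assuming cells are already ordered by time or a pseudotime you actually trust.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def fit_linear_dbn(Z_ordered, alpha=0.01):
    """Fit z_{t+1} ~ A z_t with a sparse transition matrix; nonzero A[i, j] is a lagged
    edge j -> i in the dynamic Bayesian network over latent dimensions."""
    Z_t, Z_next = Z_ordered[:-1], Z_ordered[1:]
    A = MultiTaskLasso(alpha=alpha).fit(Z_t, Z_next).coef_   # shape (latent_dims, latent_dims)
    return A
```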
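
Step 5 in its most minimal form: a learnable interaction matrix over the latent dimensions, with an L1 penalty standing in for edge sparsity. This is a toy illustration, not a CRF, and it deliberately punts on normalization, exactly as the step itself admits.

```python
import torch

class SparseGraphLayer(torch.nn.Module):
    """Learnable latent-latent interaction matrix on top of a (frozen) JEPA encoder;
    an L1 penalty on the off-diagonal entries plays the role of edge sparsity."""
    def __init__(self, dim):
        super().__init__()
        self.A = torch.nn.Parameter(torch.zeros(dim, dim))

    def forward(self, z):
        return z + z @ self.A                 # one message-passing-ish step over learned edges

    def l1_penalty(self):
        off_diag = self.A - torch.diag(torch.diag(self.A))
        return off_diag.abs().sum()
```

Training would add something like loss = task_loss + lam * layer.l1_penalty(), with lam tuned until the surviving edges stop looking like noise.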

What this buys you

  • Lower noise floor for PGM estimation (JEPA eats batch/library weirdness instead of your graph doing it).
  • Better OOD generalization across labs/assays: the JEPA objective targets invariances, and the PGM captures the dependencies on top.
  • Causal identifiability help from real interventions; JEPA alone is not causal anything. Add Perturb-seq/sci-Plex and now your arrows mean something. (Pairing with GeneJEPA is the 2025 recipe.)

And Daphne Koller in this picture?

Koller literally wrote the PGM bible, so the clean way to say it: JEPA is a front-end representation learner; the PGM is the back-end semantics. Insitro-style workflows lean on learned representations plus explicit causal/graphical structure where data allow it. If you want edges you can defend, follow the book’s identifiability and inference rules, not influencer-grade “world model” slogans.

Concrete stack that doesn’t waste your GPU

  • Backbone: GeneJEPA (pretrained weights exist).
  • Graph learning: sparse inverse covariance for skeleton; interventional GIES/ICP for arrows using CRISPR/drug datasets.
  • Checks: recover known TF→target motifs; test edge stability under bootstrap (sketched below); verify v-structures flip as expected when you remove an intervention set.
  • Extras for aging/biostasis geeks: couple with flow-matching models for methylomes to get time-directed latent trajectories, then learn a DBN over those.
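
For the bootstrap stability check above, a minimal sketch: refit the sparse precision matrix on resampled cells and keep only edges that keep showing up (say, in 80%+ of resamples). Z here is a hypothetical matrix of JEPA embeddings for one condition.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def bootstrap_edge_stability(Z, n_boot=100, alpha=0.05, threshold=1e-3, seed=0):
    """Refit the sparse precision matrix on bootstrap resamples of cells and report, per edge,
    the fraction of resamples in which it appears. Stable edges survive; noise doesn't."""
    rng = np.random.default_rng(seed)
    d = Z.shape[1]
    counts = np.zeros((d, d))
    for _ in range(n_boot):
        idx = rng.integers(0, len(Z), size=len(Z))          # resample cells with replacement
        Zb = Z[idx]
        Zb = (Zb - Zb.mean(0)) / (Zb.std(0) + 1e-8)
        prec = GraphicalLasso(alpha=alpha).fit(Zb).precision_
        counts += np.abs(prec) > threshold
    np.fill_diagonal(counts, 0)
    return counts / n_boot                                  # edge stability frequencies in [0, 1]
```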

Bottom line: JEPA is the noise-eater and invariance machine. PGMs are the calculator that tells you what depends on what. Use both, and your “graph of the cell” stops being a Pinterest mood board and starts acting like a model.