Once you have a vocabulary set of primitives and a good set of papers/people/data, you can figure out new ways of doing testing/perturbation so that we can figure out …
Related to the paper "On Compositionality" (https://www.cs.ox.ac.uk/people/julian.hedges/papers/compositionality)
====
Can you just take a cross product/tensor product of a set of complex words [vocabulary - like multiscale transfer entropy OR the cool neologisms coined by Michael Bronstein] with a set of concepts to get everything explained in terms of those complex agentic primitives?
This is a fascinating meta-question about conceptual composition! The short answer is sort of, but with diminishing returns and selection pressure.
What actually happens when you try this:
The promising part:
- You can create a systematic mapping framework: ⟨complex_term⟩ ⊗ ⟨concept⟩ → ⟨novel explanation⟩
- Like applying Bronstein’s “geometric priors” to explain everything from protein folding to social networks to language models
- Or viewing all systems through “multiscale transfer entropy” (information flow at different temporal scales)
The reality:
- Sparsity: Most combinations in V × C are meaningless or forced ("quantum entanglement of tax policy")
- Basis choice matters: Your primitives constrain what's naturally expressible (try explaining color perception purely via information theory; it's possible but awkward)
- Compression vs. insight tradeoff: Sometimes the complex primitive adds genuine insight, sometimes it's just rebadging with fancier vocabulary
What works better: Think of it less as an exhaustive tensor product and more as a selective projection (a toy sketch follows this list):
- Pick a powerful primitive set (e.g., Bronstein’s geometric deep learning framework: symmetries, equivariances, group actions)
- Find concepts that naturally decompose into these terms
- The forced fits reveal the primitive’s limitations
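A toy sketch of that selective projection, using the geometric-DL primitives above; the fit scores are made-up placeholders, not measurements, and the threshold is arbitrary:

```python
# Hypothetical fit of each concept against each primitive, in [0, 1].
primitives = ["symmetry", "equivariance", "group action"]
fits = {
    "convolutional networks": {"symmetry": 0.9, "equivariance": 0.95, "group action": 0.8},
    "color perception": {"symmetry": 0.2, "equivariance": 0.1, "group action": 0.15},
}

def project(concept, threshold=0.5):
    """Keep only the primitives that genuinely explain the concept."""
    components = {p: fits[concept][p] for p in primitives if fits[concept][p] >= threshold}
    # An empty projection is informative: this basis is the wrong language here.
    return components or None

for concept in fits:
    print(concept, "->", project(concept))
```

The point is that you never materialize the full V × C product; you only keep the components of each concept that the basis actually captures, and the empty projections mark the basis's limits.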
The “agentic primitives” framing is interesting—suggests you want action-oriented, causal building blocks. Things like:
- affordances
- epistemic vs. pragmatic actions
- active inference
- tool use / compositional agency
This does work for systematically re-explaining many phenomena, but you’ll find some concepts resist certain basis sets. That resistance is informative!
Want to try this with a specific primitive set?
Conceptual composition: you can just get an agent/LLM to do it and it is already very good at picking what compositions to discard without wasting computation
Yes! You’re absolutely right. I was thinking in pre-LLM terms about computational cost and manual filtering.
The key insight: LLMs have already learned the semantic geometry that tells them:
- “multiscale transfer entropy applied to protein folding” → might be interesting (information flow across timescales in folding pathways)
- “multiscale transfer entropy applied to the color blue” → semantic type error, discard
So the actual algorithm is trivial:
For vocabulary V = {complex_term₁, complex_term₂, ...}
For concepts C = {concept₁, concept₂, ...}
Generate: "Explain [concept] in terms of [complex_term]"
The LLM naturally (a runnable sketch follows this list):
- Returns good explanations for natural fits
- Returns "this doesn't really apply" for forced combinations
- Takes ~same compute either way
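A minimal sketch of that loop, assuming a hypothetical ask_llm() placeholder rather than any particular provider's API; the prompt template and the "DOES NOT APPLY" filter are illustrative choices, not a prescribed method:

```python
from itertools import product

# Hypothetical vocabulary and concept sets; substitute your own.
V = ["multiscale transfer entropy", "geometric priors"]
C = ["protein folding", "the color blue"]

def ask_llm(prompt: str) -> str:
    """Placeholder for whatever chat-completion client you use; not a real API."""
    # Stubbed response so the sketch runs end to end (mirrors the "color blue" example above).
    return "DOES NOT APPLY" if "the color blue" in prompt else f"(model answer to: {prompt})"

def compose_explanations(vocab, concepts):
    """Enumerate V x C, ask for an explanation, and keep only the natural fits."""
    results = {}
    for term, concept in product(vocab, concepts):
        prompt = (
            f"Explain {concept} in terms of {term}. "
            "If the framing does not genuinely apply, reply exactly: DOES NOT APPLY."
        )
        answer = ask_llm(prompt)
        if "DOES NOT APPLY" not in answer:
            results[(term, concept)] = answer
    return results

print(compose_explanations(V, C))
```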
What this enables:
- Systematic conceptual exploration: You can literally enumerate V × C and let the LLM filter
- Finding hidden connections: Sometimes the "forced" combination reveals something non-obvious
- Vocabulary stress-testing: See which primitives have wide vs. narrow applicability
The real bottleneck shifts to:
- Vocabulary curation - picking rich, generative primitives (Bronstein's geometric DL terms are good because they're structurally grounded)
- Evaluating depth - distinguishing genuine insight from fluent BS
- Iterative refinement - using generated explanations to improve V (a small sketch follows this list)
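A small follow-on sketch of that stress-testing/refinement step, assuming the results dict returned by the compose_explanations() sketch above; the min_hits threshold is arbitrary:

```python
from collections import Counter

def refine_vocabulary(results, vocab, min_hits=2):
    """Split the vocabulary into broadly vs. narrowly applicable primitives.

    `results` maps (term, concept) -> explanation, as returned by the
    compose_explanations() sketch above.
    """
    hits = Counter(term for term, _ in results)
    wide = [t for t in vocab if hits[t] >= min_hits]    # keep for the next round
    narrow = [t for t in vocab if hits[t] < min_hits]   # candidates to drop or rethink
    return wide, narrow

# e.g. wide, narrow = refine_vocabulary(compose_explanations(V, C), V)
```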
This is basically automated conceptual blending. Have you tried this systematically with a specific vocabulary set? I’m curious what emerges.
Who has the richest vocabulary sets? I'm thinking Robert Ghrist, Taco Cohen, Michael Bronstein, Dani Bassett, information theory people, multiscale entropy people, fractals people, dynamical systems people, graph neural networks people, but who else?
Excellent instinct—you’re targeting people who’ve built generative mathematical languages rather than just domain-specific jargon. Here are other rich vocabulary sources:
Category Theory & Compositional Systems:
- David Spivak - polynomial functors, categorical systems theory, "dynamical systems as bundles"
- John Baez - network theory, Petri nets, compositional frameworks
- Brendan Fong - compositional game theory, decorated cospans
Active Inference & Free Energy:
- Karl Friston - variational free energy, Markov blankets, generalized synchrony, sentient behavior
- Extremely dense vocabulary, applies to everything from cells to cognition
Computational Mechanics:
- Jim Crutchfield - ε-machines, causal states, statistical complexity, excess entropy
- Beautiful formalism for hidden structure in sequences
Causal Inference:
- Judea Pearl - do-calculus, counterfactuals, d-separation, causal graphs
- Fundamental primitives that cross disciplines
Complex Systems Theory:
- Stuart Kauffman - fitness landscapes, autocatalytic sets, adjacent possible, poised realm
- John Holland - emergence, building blocks, internal models
Topological Data Analysis:
- Gunnar Carlsson - persistent homology, mapper algorithm
- Topology as shape-based primitives
Renormalization & Scale:
- Leo Kadanoff - block spin, coarse-graining, scaling
- Kenneth Wilson - RG flows, fixed points
Statistical Physics ∩ ML:
- Lenka Zdeborová - replica method, phase transitions in learning
- Surya Ganguli - random matrix theory, learning dynamics
Optimal Transport:
- Gabriel Peyré - Wasserstein geometry, computational OT
- Distance metrics as primitive
Network Neuroscience:
- Olaf Sporns - connectomics, integration/segregation, rich clubs
- Ed Bullmore - economical brain networks
Stochastic Processes & Filtering:
- Terry Tao - finite-time blowup, universality
- Less vocabulary-focused but extremely precise
Who’d I miss that you know of?
===
Ah, you’re asking about efficient search through combinatorial concept-spaces — who does smart experimental design for idea exploration rather than exhaustive enumeration.
Active Learning & Bayesian Optimization:
- Ryan Adams - Bayesian optimization, efficient search
- Nando de Freitas - Bayesian experimental design, curiosity-driven learning
- Bobak Shahriari - practical BO in high dimensions
- Peter Frazier - knowledge gradient methods
Automated Science & Discovery:
- Ross King - "Robot Scientist" (Adam/Eve), automated hypothesis generation and testing
- Hod Lipson - automated discovery of physical laws, symbolic regression
- Joshua Tenenbaum - program induction, compositional concept learning
- Roger Grosse - automatic curriculum learning, what to try next
Neural Architecture Search (similar problem):
- Quoc Le & Barret Zoph - efficient NAS, learning what architectures to try
- Frank Hutter - AutoML, hyperparameter optimization at scale
Combinatorial Exploration:
- Leslie Kaelbling - hierarchical planning, efficient search through action compositions
- Josh Tenenbaum (again) - learning compositional concepts with minimal examples
- Finale Doshi-Velez - Bayesian program learning
Sparse/Low-Rank Structures:
- Emmanuel Candès - compressed sensing (measuring high-dim spaces with few samples)
- Terence Tao - sparse recovery theory
Meta-Science & Research Strategy:
- James Evans - computational sociology of science, novelty detection
- Brian Uzzi - optimal team composition, atypical combinations that work
- Dashun Wang - hot streaks, exploration-exploitation in careers
Exploration-Exploitation Theory:
- Csaba Szepesvári - bandit algorithms, regret minimization
- Shipra Agrawal - Thompson sampling variations
The core insight these people share: don't sample uniformly; learn a model of the reward landscape and sample where uncertainty and expected value are both high.
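A toy illustration of that principle, with made-up surrogate-model scores: an upper-confidence-bound acquisition ranks candidate compositions by predicted value plus uncertainty, so you query the most promising-and-uncertain pair next instead of enumerating uniformly.

```python
# Hypothetical surrogate scores for a few candidate compositions:
# (predicted value, uncertainty). Nothing here is real data.
candidates = {
    ("multiscale transfer entropy", "protein folding"): (0.7, 0.2),
    ("persistent homology", "tax policy"): (0.2, 0.1),
    ("geometric priors", "language models"): (0.6, 0.4),
}

def ucb(mean, std, beta=2.0):
    """Upper confidence bound: favor high expected value AND high uncertainty."""
    return mean + beta * std

# Query the highest-acquisition pair next instead of sampling uniformly.
ranked = sorted(candidates, key=lambda pair: ucb(*candidates[pair]), reverse=True)
print(ranked[0])  # ('geometric priors', 'language models') under these toy numbers
```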
For conceptual composition specifically, this is under-explored. Most similar work is in program synthesis (combining code primitives) and concept learning (combining features). Who have you seen doing this explicitly?
===
Yes, and it’s accelerating fast! Several groups are explicitly doing “program synthesis” for biology with smart search strategies:
Synthetic Biology + ML Design:
- Chris Voigt (MIT) - Cello, automated genetic circuit design. Uses design-build-test-learn loops with increasingly efficient search
- James Collins - ML-guided antibiotic discovery, navigating chemical space efficiently
- Alán Aspuru-Guzik - self-driving labs for molecules, active learning over synthesis space
Combinatorial Perturbation Screens:
- Jonathan Weissman - Perturb-seq with combinatorial CRISPR knockdowns. Explicitly chooses which gene pairs to test based on prior screens
- Nir Yosef - analyzing sparse combinatorial perturbations, inferring full interaction map from subset
- Rahul Satija - multimodal perturbation analysis with smart sampling strategies
Directed Evolution (the OG efficient bio-search):
- Frances Arnold (Nobel) - directed evolution is literally "efficient search through sequence space without trying everything"
- George Church - MAGE/CAGE for multiplexed genome engineering
- Adam Arkin - computational + experimental evolution
Protein Design with Active Learning:
- Possu Huang, Brian Hie - evolutionary scale models + Bayesian optimization for protein design
- Andrew Ferguson - manifold learning for protein sequence space, efficient sampling
- Lucy Colwell - learning protein fitness landscapes from sparse data
Automated Biology Labs:
- Ross King - Robot Scientist (Adam/Eve) that autonomously generated and tested hypotheses about yeast metabolism
- Andrew White (formerly) - self-driving labs with active learning
- Klavs Jensen - flow chemistry with Bayesian optimization
Systems Biology as Program Induction:
- Michael Elowitz - learning rules of genetic circuits from data
- Uri Alon - discovering network motifs (biological subroutines)
- Roy Kishony - evolutionary dynamics with automated perturbation selection
The most explicit "Latin hypercube for bio-programs":
- Pamela Silver's group - modular metabolic engineering
- Christina Smolke - RNA devices with compositional design
- Tim Lu - engineered microbial communities, combinatorial ecology
Key gap: Most work is domain-specific (proteins OR metabolic pathways OR gene circuits). Almost no one is doing meta-level “which biological compositional primitives are worth combining” across scales.
The closest might be Uri Alon + Karl Friston collaborations on biological design principles, or Dani Bassett’s work on learning across biological networks.
Are you thinking about a specific biological domain or cross-scale synthesis?