Cathedral

In 2008, Noah Goodman, Vikash Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua Tenenbaum published “Church: A Language for Generative Models”. The core idea was simple and powerful: if you can write a program that generates data by flipping coins, drawing from distributions, and branching on stochastic outcomes, you can condition that program on observations and recover a posterior distribution over everything you did not observe. The program is the model.

Church is a beautiful framework, and it has been central to a long line of computational cognitive science research from Tenenbaum’s group at MIT. The companion textbook, Probabilistic Models of Cognition, remains one of the best introductions to this way of thinking.

One of my favorite recent papers in that same lineage is “From Word Models to World Models” by Wong et al. The core idea is to have large language models generate probabilistic programs, then use Bayesian inference over those programs to do the actual reasoning. I find this a very compelling picture of how LLMs and symbolic world models might fit together.

If you want to reproduce that work today, though, the original software stack is a problem. Church itself is written in Scheme and is effectively inaccessible on modern systems (even finding the source is not easy), and the paper’s WebChurch path depends on deprecated, unmaintained tooling. That leaves an awkward gap between the elegance of the idea and the practical reality of running, modifying, and extending the examples.

Cathedral is an attempt to close that gap by bringing Church’s core design into Python’s scientific computing ecosystem. The goal is to make this style of probabilistic programming easy to write, easy to read, and practical in Python.

Cathedral

Cathedral’s core primitives are minimal:

  • A decorator for defining generative models
  • flip, sample, condition, observe, and factor for sampling and conditioning
  • mem for Church-style stochastic memoization
  • Multiple inference modes, from exact enumeration on small discrete models to importance sampling and MH

These primitives compose with the full power of Python: higher-order functions, data structures, and third-party libraries. That makes it possible to define models with stochastic control flow, recursion, and variable-structure traces. Here’s a simple example, close to the tug-of-war world model from the paper:

from cathedral import model, infer, mem, sample, flip, Normal

@model
def tug_of_war():
    strength = mem(lambda person: sample(Normal(0, 1)))
    lazy = lambda person: flip(0.25)

    def pulling(person):
        return strength(person) / 2 if lazy(person) else strength(person)

    def total_pulling(team):
        return sum(pulling(player) for player in team)

    def beat(team1, team2):
        return total_pulling(team1) > total_pulling(team2)

    return {
        "alice_sue_beat_bob_tom": beat(["alice", "sue"], ["bob", "tom"]),
        "alice_stronger_than_bob": strength("alice") > strength("bob"),
    }

posterior = infer(
    tug_of_war,
    condition=lambda r: r["alice_sue_beat_bob_tom"],
    num_samples=1000,
)
posterior.probability("alice_stronger_than_bob")  # > 0.5

Under the hood

Cathedral uses a trace-based execution model. Every call to sample() or flip() records a Choice object that holds the distribution, sampled value, and log-probability. All choices from a single execution form a Trace. Trace state is threaded implicitly using Python’s contextvars, and inference engines run the model repeatedly, collecting and weighting traces.
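
As a rough illustration of that design (a simplified sketch, not Cathedral’s actual internals), a trace recorder built on contextvars might look like this, with hypothetical Choice and Trace classes and a toy flip:

```python
import contextvars
import math
import random
from dataclasses import dataclass, field

@dataclass
class Choice:
    # One recorded random choice: its name, sampled value, and log-probability.
    name: str
    value: bool
    logprob: float

@dataclass
class Trace:
    choices: list = field(default_factory=list)

    @property
    def logprob(self):
        # Joint log-probability of every choice made on this execution.
        return sum(c.logprob for c in self.choices)

# Trace state is threaded implicitly through a context variable,
# so model code never passes the trace around explicitly.
_current_trace = contextvars.ContextVar("trace")

def flip(p, name="flip"):
    # Sample a weighted coin and record it in the ambient trace.
    value = random.random() < p
    logprob = math.log(p if value else 1 - p)
    _current_trace.get().choices.append(Choice(name, value, logprob))
    return value

def run(model):
    # Execute the model once, collecting all its choices into a fresh Trace.
    trace = Trace()
    token = _current_trace.set(trace)
    try:
        result = model()
    finally:
        _current_trace.reset(token)
    return result, trace

def two_coins():
    # Short-circuiting `and` means the second flip may never execute,
    # so the trace length itself is stochastic.
    return flip(0.5, "a") and flip(0.5, "b")

result, trace = run(two_coins)
```

An inference engine then just calls run() in a loop and weights or accepts traces by their log-probabilities.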

The Posterior object that comes back from infer() supports the analysis you’d expect: means, credible intervals, probability queries, and histograms, plus diagnostics like effective sample size, acceptance rates, and log marginal likelihood, as well as a bridge to ArviZ for certain model classes.
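
To make the weighting concrete, here is a self-contained sketch of how probability and mean queries can be computed from weighted samples; the class and method names are illustrative, not Cathedral’s actual API:

```python
import math

class Posterior:
    """A hypothetical posterior over sample dicts with importance weights."""

    def __init__(self, samples, log_weights):
        self.samples = samples
        # Normalize the weights in a numerically stable way (log-sum-exp trick).
        m = max(log_weights)
        unnorm = [math.exp(lw - m) for lw in log_weights]
        total = sum(unnorm)
        self.weights = [w / total for w in unnorm]

    def probability(self, key):
        # Weighted fraction of samples where the boolean query is true.
        return sum(w for s, w in zip(self.samples, self.weights) if s[key])

    def mean(self, key):
        # Weighted average of a numeric quantity.
        return sum(w * s[key] for s, w in zip(self.samples, self.weights))

post = Posterior(
    [{"x": True, "v": 1.0}, {"x": False, "v": 3.0}],
    [math.log(0.5), math.log(0.5)],
)
```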

There’s also a scope-capture system that uses stack introspection to automatically build hierarchical trace visualizations, along with a trace_to_dot() function for Graphviz output. These are useful for debugging models.
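
For a sense of what the Graphviz export involves, here is a hypothetical sketch that renders a flat list of named choices as a DOT chain; Cathedral’s real trace_to_dot operates on full trace objects:

```python
def choices_to_dot(choices):
    """Render (name, value) pairs as a linear Graphviz digraph.

    Illustrative only: a real trace visualizer would also encode the
    hierarchical scope structure, not just execution order.
    """
    lines = ["digraph trace {"]
    for i, (name, value) in enumerate(choices):
        lines.append(f'  n{i} [label="{name} = {value}"];')
        if i > 0:
            # Edges follow execution order.
            lines.append(f"  n{i - 1} -> n{i};")
    lines.append("}")
    return "\n".join(lines)

dot = choices_to_dot([("a", True), ("b", False)])
```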

What’s next

Cathedral currently supports four relatively simple inference methods: rejection sampling, importance sampling, single-site Metropolis-Hastings following Wingate, Stuhlmüller, and Goodman (2011), and exact enumeration for discrete models. The MH implementation handles structural changes correctly, but it still uses prior proposals, which mix poorly on continuous models with narrow posteriors. There is no gradient-based inference yet, and there are no variational or particle methods.
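
To illustrate why prior proposals are simple but limited, here is a toy single-site MH sketch on a fixed-structure discrete trace; the names and simplifications are mine, not Cathedral’s:

```python
import random

def single_site_mh(init, condition, steps, seed=0):
    """Toy single-site Metropolis-Hastings with prior proposals.

    Each step picks one choice uniformly and resamples it from its
    Bernoulli(0.5) prior. Because the proposal *is* the prior and the
    trace structure is fixed, the Hastings correction cancels against
    the prior ratio, and acceptance reduces to the hard condition.
    (The general algorithm must also handle likelihoods and traces
    that change structure between steps.)
    """
    rng = random.Random(seed)
    state = dict(init)
    samples = []
    for _ in range(steps):
        name = rng.choice(sorted(state))       # pick a single site
        proposal = dict(state)
        proposal[name] = rng.random() < 0.5    # resample it from its prior
        if condition(proposal):                # accept iff the condition holds
            state = proposal
        samples.append(dict(state))
    return samples

# Two fair coins, conditioned on at least one coming up heads.
cond = lambda s: s["a"] or s["b"]
samples = single_site_mh({"a": True, "b": True}, cond, steps=20000)
p_a = sum(s["a"] for s in samples) / len(samples)  # should approach 2/3
```

On this discrete model prior proposals work fine; the trouble comes on continuous models, where resampling a site from a broad prior almost never lands in a narrow posterior mode.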

On the expressiveness side, Cathedral already implements Church’s core primitives and stochastic memoization, but it does not yet include some of the more advanced machinery from the original system, like constraint propagation, conservative trace updates, and exchangeable random primitives (XRPs).

The roadmap is to close that gap incrementally while keeping the model-authoring surface simple.

  1. Better MCMC: The highest-ROI next step is improving Metropolis-Hastings with custom proposals: Gaussian drift for continuous variables, swap-style proposals for discrete ones, and adaptive step-size tuning during burn-in.

  2. Gradient-based inference: The biggest remaining gap relative to systems like PyMC and Stan is Hamiltonian Monte Carlo. I would like to explore implementing gradient-based inference, likely starting with lightweight autodiff in pure Python before considering heavier backends.

  3. More efficient trace machinery: Beyond better proposals, Cathedral needs the classic performance ideas that made Church-like systems practical: exchangeable random primitives, constraint propagation for tightly conditioned models, and conservative trace updates so inference can avoid recomputing the entire program on every move. Basic execution could also take advantage of parallelism very easily.
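
As a sketch of the first roadmap item, a Gaussian drift proposal with step-size adaptation during burn-in might look like this (illustrative, with hypothetical names; the 0.44 target is the classic acceptance-rate heuristic for one-dimensional random-walk MH):

```python
import math
import random

def drift_mh(logpost, x0, steps, burn_in=1000, step=1.0, target=0.44, seed=0):
    """Random-walk MH with a Gaussian drift proposal and simple
    step-size adaptation during burn-in (a sketch, not Cathedral's code)."""
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    samples = []
    accepts = 0
    for i in range(steps):
        prop = x + rng.gauss(0.0, step)            # symmetric Gaussian drift
        lp_prop = logpost(prop)
        if math.log(rng.random()) < lp_prop - lp:  # symmetric proposal: plain ratio
            x, lp = prop, lp_prop
            accepts += 1
        if i < burn_in:
            if (i + 1) % 100 == 0:
                # Grow or shrink the step toward the target acceptance rate.
                # Adapting only during burn-in keeps the kept samples valid.
                rate = accepts / (i + 1)
                step *= 1.1 if rate > target else 0.9
        else:
            samples.append(x)
    return samples

# Target a standard normal (log-density up to an additive constant).
samples = drift_mh(lambda x: -0.5 * x * x, x0=5.0, steps=20000)
```

Unlike a prior proposal, this kernel moves locally from the current state, so it can track a narrow posterior once the step size has adapted to it.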

Cathedral has been a lot of fun to build, and I’ve learned a ton. I’m only just starting to use it seriously, and I’m excited to push beyond the existing work with LLMs. That paper came out in 2023, and models have advanced so quickly since then that it’ll be fascinating to see what they can do with tools like this now.

Thanks to Eric Ma and Max Wall for many helpful and inspiring conversations on these topics. Thanks also, of course, to everyone who has developed software in this space and made it available. Cathedral is on GitHub and is MIT licensed. Please try it out, file issues, and contribute if you’re interested. I’d love to see what a community working on this could do.