Alignment compilers
The alignment compiler, which we discuss toward the end of this paper, is a class of technologies that translate system-level goals into signals that align local behavior toward those goals. An example of an alignment compiler is an NGDP futures market, which translates macroeconomic goals into signals that reliably induce participants in the market to align their behavior with those goals, thereby constructing those goals in the aggregate.
The term “alignment compiler” is inspired by the idea of the anatomical compiler, a hypothetical technology for converting morphological goals into signals that induce cells to build the specified shape. Anatomical compilers are a kind of alignment compiler: they align the behaviors of cells with morphological goals.
From a dynamical systems perspective, an alignment compiler is a mechanism that shapes a system so that the desired macrostate becomes an attractor of its dynamics. More specifically, it maps a target macrostate into a structured environment of local interactions such that the target macrostate emerges from those interactions as an attractor.
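The attractor framing can be made concrete with a toy model. The following is a minimal sketch, not a model of any real biological or economic system: the names `compile_target`, `local_rule`, `coupling`, and `pull` are hypothetical, the units sit on a ring, and the macrostate is assumed to be the mean of the units' states. The "compiler" turns a target macrostate into a purely local update rule, and the target then emerges as the attractor of the collective dynamics from any starting configuration.

```python
import random

def compile_target(target, coupling=0.3, pull=0.1):
    """Hypothetical 'compiler': turn a target macrostate into a local rule."""
    def local_rule(own, left, right):
        # Local interaction: drift toward the average of the two neighbours.
        neighbour_avg = (left + right) / 2
        # Compiled signal: a weak environmental bias toward the target.
        return own + coupling * (neighbour_avg - own) + pull * (target - own)
    return local_rule

def step(states, rule):
    """Apply the local rule to every unit on a ring, synchronously."""
    n = len(states)
    return [rule(states[i], states[i - 1], states[(i + 1) % n]) for i in range(n)]

def macrostate(states):
    """Assumed macrostate: the mean of all local states."""
    return sum(states) / len(states)

def run(target, n=20, steps=200, seed=0):
    """Start from random local states and iterate the compiled rule."""
    rng = random.Random(seed)
    states = [rng.uniform(-10, 10) for _ in range(n)]
    rule = compile_target(target)
    for _ in range(steps):
        states = step(states, rule)
    return states

if __name__ == "__main__":
    for seed in (0, 1, 2):
        print(seed, macrostate(run(3.0, seed=seed)))
```

In this sketch no unit is told where to end up relative to the others, and different random starting points settle on the same macrostate. The design choice to illustrate is that the target is encoded in the environment of local interactions rather than imposed state by state.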
The appeal of an alignment compiler is that it offers an efficient means of controlling a system. Assembling a body shape and managing the macroeconomy are extremely hard problems. To solve them directly, we would have to build very complex models at great cost, and we would still fail much of the time. Alignment compilers hand the hard work over to the assembling system itself, drawing upon its own knowledge and competencies and effectively automating the process of achieving our goals.
How do alignment compilers work? One possibility is to make a giant lookup table associating every goal with a pattern of signals that align the parts of the system toward that goal. There are a few big problems with this method. First, the lookup table would be enormous, and filling it in would be a massive empirical undertaking. Second, no such table can exist even in principle: a system has many different ways of achieving an outcome, and many different patterns of signals will align the system with that outcome (degeneracy, multifunctionality, etc.), so there is no single correct entry per goal. Third, agents, problems, and environmental conditions are constantly changing, which calls for an anticipatory, allostatic mechanism rather than a fixed list of responses.
Alignment compilers will have to find out from the system being aligned, in real time, which signals achieve alignment. I find it useful to think of alignment compilers as asking a system the following question: What do you need us to do to get you to produce the outcomes that we want? The goal is to get the system to give not just an answer but a true answer.
This is challenging for a couple of reasons. First, there are many answers a system can give, and only a tiny subset are true. Second, the system doesn’t know the true answer in advance. It has to figure it out: the true answer is assembled, not referenced.
The trick is to arrange constraints that reduce degrees of freedom and thereby enable the system to produce the true answer. Markets, for example, do this by taking signals from the economic agents about the state of the system and feeding those signals back as constraints, forcing or enabling agents to form plans consistent with a larger body (e.g., a body of 8 billion people that spans the Earth). Prices aggregate dispersed information and expose inconsistencies between agents’ plans; these inconsistencies create opportunities for profit and loss, which in turn pressure agents to revise their behavior. Agents don’t just get to pick whatever plans they want; only plans that are mutually consistent with the larger system can persist. The market thus reduces the system’s degrees of freedom by eliminating incoherent configurations, enabling the collective construction of outcomes that reflect the knowledge of the entire system.
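The feedback loop described above, in which prices expose inconsistencies between plans and pressure agents to revise them, can be illustrated with a toy price-adjustment process. This is a minimal sketch using a textbook tâtonnement-style update, which is an assumption of this example rather than a mechanism specified in the text. The function names, the linear demand and supply rules, and the buyer valuations and seller costs are all hypothetical.

```python
def excess_demand(price, buyer_values, seller_costs):
    """Total planned purchases minus total planned sales at a given price.

    Assumed plans: a buyer wants (value - price) units when that is positive,
    and a seller offers (price - cost) units when that is positive.
    """
    demand = sum(max(0.0, v - price) for v in buyer_values)
    supply = sum(max(0.0, price - c) for c in seller_costs)
    return demand - supply

def find_clearing_price(buyer_values, seller_costs, price=1.0, rate=0.05, steps=200):
    """Feed the inconsistency between plans back to agents as a price change."""
    for _ in range(steps):
        gap = excess_demand(price, buyer_values, seller_costs)
        # Unmet demand pushes the price up; unsold supply pushes it down.
        price += rate * gap
    return price

if __name__ == "__main__":
    buyers = [10, 9, 8, 7, 6]   # hypothetical private valuations
    sellers = [1, 2, 3, 4, 5]   # hypothetical private costs
    p = find_clearing_price(buyers, sellers)
    print(p, excess_demand(p, buyers, sellers))
```

No agent in this sketch reports its valuation or cost to a central planner. The only shared signal is the price, and the only configurations that persist are those in which planned purchases and planned sales are mutually consistent. With the numbers above, the price settles near 5.5, where excess demand is zero.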
When the system is a market, the alignment compiler only needs to impose a few high-level task constraints, such as a commitment to a futures contract. The system then assembles behavior consistent with those constraints: the economic agents supply information that, in the aggregate, is the system-level answer to the question.
Alignment compilers work by encoding task constraints that realign the system toward a goal. The system in turn works out the pattern of signals that brings its components’ behavior into line with that goal, with dishonesty punished by the constraints on inconsistency. Then we simply supply whatever the system tells us it needs in order to produce the desired outcome.


At first glance, one challenge in extending the concept of an anatomical compiler to an AI alignment compiler is the heterogeneity of AI models. A specific instance of an anatomical compiler, I presume, would be purpose-built for a specific type of cellular network in a specific type of organism, consisting of a specific set of cell types and signaling pathways exchanging specific types of messages. By contrast, a whole world of AI models beyond the headlining LLMs is coming into existence, with an endlessly wide variety of meanings symbolically encoded in the information each model exchanges with humans and other AIs in its own environment.
(To be complete, I could also mention that the electricity supply and other elements of keeping the hardware running, and/or the economic tokens some models are given to subcontract other digital services, could be exposed to AI models as meaningful signals, and this realm would be more homogeneous across all of them.)
Are there any thoughts on this?
Has there been any thought given to whether an AI alignment compiler would be put into use while a model is being trained, or post-training (i.e., when it’s deployed), or both?