Chapter 8: The World Is Not Its Own Best Generative Model

“Cognition is for action!” “Cognition is not just for action; cognition is action!” “The world is its own best model!” “Organisms don’t represent the world. They enact a world inseparable from their own structure!” And so on.

These are just some of the claims you’ve likely come across if you’ve spent any time reading the literature on embodied cognition and enactivism. In this and the next post I address such claims from the perspective of the theory of mental representation that I’ve outlined in previous chapters. At the core of this theory, remember, is the idea of generative models in the brain that facilitate a kind of flexible predictive capacity across different domains.

In this post I address what Lawrence Shapiro calls the “replacement hypothesis,” the hypothesis that embodied interactions with the environment replace the need for mental representations in cognitive processes. First, though, some clarifications.


First, the replacement hypothesis is associated with what is sometimes called “radical” embodied cognition. That is, there are many proponents of good old-fashioned (conservative?) embodied cognition who aren’t allergic to mental representations. Importantly, the theory of mental representation that I have outlined and defended is consistent with many themes from this less radical tradition: for example, the importance of features of the body and environment (including social environment) in problem solving, the importance of temporal constraints in cognitive processes, the way in which we experience the world from the perspective of specific kinds of creatures with idiosyncratic interests, response profiles, and so on (see next post).

The account of mental representation that I’ve defended is obviously not consistent with the replacement hypothesis, however, which is why I devote a chapter of the thesis to it.

Second, the replacement hypothesis comes in stronger and weaker forms. In its strongest form, it claims that embodied interactions with the environment replace the need for mental representations in all cognitive processes. As I mentioned in the second chapter, I don’t have any time for this view.

(Imagine the layout of the house you grew up in. Close your eyes and picture the room you’re currently in. Recall your first day of school. Think about where you’d like to be in ten years. In all these cases, you can’t interact “directly” with these things. Instead, you mentally represent them—in memory, in imagery, in thought, and so on. Attempts to avoid this obvious fact through elaborate attempts at re-description—“but cognition is really a radical affordance-laden action-oriented embodied and dynamic entanglement with a landscape of…”—are in my humble opinion annoying.)

As such, my focus here is on a weaker version of the replacement hypothesis that targets things like perception, sensorimotor control, and so on—that is, our everyday interactions with our immediate environments. Hubert Dreyfus calls this sort of thing “everyday coping.” I will use the term “sensorimotor processing,” which I use broadly (more broadly than is common) to refer to our capacity to perceive and act on our immediate environments.

When the replacement hypothesis focuses on sensorimotor processing, it is—superficially, at least—more plausible. If the world is right there—right in front of our eyes, right underneath our hands—why would we bother interacting with an internal model or substitute for it?

My answer is predictable: radical embodied theories cannot account for the flexible predictive capacities that underlie sensorimotor processing. Insofar as an organism needs to flexibly anticipate—not just respond to—the world, the world itself is insufficient: an internal surrogate for (i.e. model of) the world that captures salient aspects of its causal and statistical structure is necessary.

(Actually, I think that there are probably all sorts of problems for the replacement hypothesis, but this is one that is highly salient in the context of my thesis.)

I proceed like this.

First, I’ll outline what I call the “classical” model of sensorimotor processing.

It is the alleged deficiencies of the classical model that have motivated the replacement hypothesis, which I outline next.

Then I’ll explain how the replacement hypothesis neglects the predictive underpinnings of sensorimotor processing.

Finally, I’ll briefly—very briefly—explain why I think attempts within radical embodied cognitive science to accommodate the predictive character of sensorimotor processing without drawing on the concept of mental representation are misguided.

The Classical Model

The “classical” model of sensorimotor processing has three parts that concern the function of perception, the problem that it confronts, and the kind of mechanism by which it solves this problem.

First, then, it claims that the function of perception is to deliver a rich internal representation of the distal environment in a form that can be passed on to downstream mechanisms of belief fixation and action planning.

Second, it points to a problem that this task confronts: proximal sensory inputs are ambiguous, i.e. they radically underdetermine their environmental causes. For example, the images on the retina are consistent with multiple (in fact, an infinite number of) three-dimensional environments. As such, the perceptual system must draw on stored knowledge (“implicit assumptions”) to narrow down the range of possible perceptual representations.
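The underdetermination point can be made concrete with a minimal sketch, assuming an idealised pinhole camera (the function name `project`, the focal length, and the sample points are illustrative assumptions, not anything drawn from the classical literature itself). Every point along a single ray through the pinhole lands on the same image location, so the image alone does not fix depth.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) onto a 2D image plane (pinhole model)."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# Three different 3D points that all lie on one ray through the pinhole.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
farther = (5.0, 10.0, 20.0)

# All three project to the same image coordinates.
images = [project(p) for p in (near, far, farther)]
```

Recovering which of these points caused the image requires something beyond the image, which is the role the classical model assigns to stored knowledge.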

Finally, it points to a kind of mechanism that can solve this problem: namely, a computational mechanism in which algorithms operate over internal states that represent salient features of the perceived environment.

Problems for the Classical Model

Proponents of the replacement hypothesis challenge all three elements of the classical view.

Perception is for Action!

First, they deny that the function of perception is to represent the distal environment in a form that can be passed on to distinct cognitive mechanisms of belief formation and decision-making. Instead, they claim (and claim, and then claim again) that the function of perception is to guide action.

Unfortunately, despite the ubiquity of this claim, it isn’t at all clear what it means. For example, proponents of the classical view need not deny that the ultimate function of perception is to guide action. They just think that veridical perceptual representation is useful for effective action-guidance.

In fact, I think that there are different readings of this claim. On one, it advances a substantial architectural claim: namely, that there are fairly direct links from sensory input to action (see discussion of Rodney Brooks below). On another, it claims that perceptual reconstruction of the distal environment is an implausible instrumental function of perception insofar as its function is to guide action. This second reading comes in different variants. For example:

  • Insofar as the function of perception is to guide action, we should expect that it would be more selective, more sensitive to the organism’s time-pressured practical engagements with the environment, and employ different strategies.

In any case, a claim so vague as “perception is for action” is probably too schematic on its own to be of any use.

Down with Helmholtz!

Second, proponents of the replacement hypothesis typically deny that the sensory input an organism receives is impoverished or ambiguous. J.J. Gibson, for example, famously argued that once you take into consideration temporally evolving patterns in the proximal stimulus (e.g. structured light) in conjunction with the organism’s exploratory behaviour, then the sensory input is sufficient to specify the state of the distal environment without requiring perceptual systems to draw on “implicit assumptions” about the structure of the world. For example, “optic flow”—the way in which retinal images change as a function of movements of the perceiver and objects—carries rich information about depth, spatial location, and so on.

Nonrepresentational Sensorimotor Processing

Finally, with these considerations in hand, proponents of the replacement hypothesis argue that the door is open to denying that perception requires any parts of perceptual systems to represent anything else.

Again, there are different components to this in the broader literature.

Sensorimotor Cycles

One we have just seen: proponents of the replacement hypothesis deny that perception is a passive process that delivers a rich representation of the distal environment before systems of action planning get to work. Instead, they argue that perception and action are involved in a complex cyclical process (that is, not a sequential sense-think-act process of the sort described by the classical model) in which perception and action co-conspire to put organisms into contact with just those features of the world relevant to their actions.

Rather than drawing on stored knowledge in the brain to aid perception, then, organisms use their bodies and actions to retrieve information about task-relevant environmental features. An influential example of this in the psychological literature is the use of frequent visual saccades to extract small pieces of information from the environment as the task demands rather than—it is alleged—building a rich visual model of the environment.


Another related argument for a nonrepresentational treatment of perception points to temporal considerations.

One classic argument of this kind points to the complex cyclical interactions between brain, body, and environment in sensorimotor processing to argue that the concept of internal representation is unable to illuminate such interactions. An important idea here is the concept of coupling.

Two parts of a system or two different systems are coupled if in order to explain the dynamics of one, one is forced to also include the dynamics of the other. (Two swinging pendulums placed side by side on a wall are thus coupled in this sense, insofar as the differential equation describing the dynamics of each pendulum needs a term including the dynamics of the other). Under these conditions, the argument goes, it doesn’t make sense to think of one thing modelling the other thing: their behaviour is so deeply intertwined that we need a different vocabulary to describe them. Given that our brains are coupled to our bodies and environments during online sensorimotor processing, then (the argument continues), the concept of internal representation cannot illuminate such dynamics.
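To make the pendulum example concrete, here is one standard textbook form of the equations. It rests on assumptions that are not in the argument itself: two identical pendulums of length $l$, small swing angles $\theta_1$ and $\theta_2$, and a weak linear coupling of strength $k$ (per unit mass) through their shared mount.

```latex
\begin{aligned}
\ddot{\theta}_1 &= -\frac{g}{l}\,\theta_1 - k\,(\theta_1 - \theta_2) \\
\ddot{\theta}_2 &= -\frac{g}{l}\,\theta_2 - k\,(\theta_2 - \theta_1)
\end{aligned}
```

The appearance of $\theta_2$ in the first equation, and of $\theta_1$ in the second, is what “coupled” means here. Set $k = 0$ and each pendulum can be explained entirely on its own.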

Timing as a Constraint

Another classic and related argument points to the importance of timing as a constraint in cognitive processes. Specifically, it argues that the sequential sense-model-plan-act account of sensorimotor processing in the classical model would render animals insufficiently sensitive to change in their environments. That is, once an animal has built up its rich internal model, the world might have changed in task-relevant ways. As such, it is better just to continually engage the world without relying on internal models of it.

This is the view that motivated the roboticist Rodney Brooks to design robots in the late 1980s and 1990s that embodied a very different architecture to the one described by the classical model. Rather than routing all information through a central planner operating on an internal model, the robots he designed embodied fairly direct mappings from sensory input to behaviours. In other words, they didn’t respond to an internal model of the world; they just responded directly to the world itself.
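As a rough illustration of what a “fairly direct mapping” from sensing to behaviour looks like, here is a toy reactive controller. It is a sketch only: the function `reactive_step`, the two proximity sensors, and the wheel-speed values are hypothetical, and it is not a reconstruction of Brooks’s actual architecture, which layered many such behaviours on top of one another.

```python
def reactive_step(left_sensor, right_sensor, threshold=0.5):
    """Map two proximity readings (0 = clear, 1 = touching) straight to wheel speeds.

    Returns (left_wheel, right_wheel). There is no map and no stored state:
    the output depends only on the current sensor readings.
    """
    if left_sensor > threshold and right_sensor > threshold:
        return (-1.0, 1.0)   # blocked ahead: spin in place
    if left_sensor > threshold:
        return (1.0, 0.2)    # obstacle on the left: veer right
    if right_sensor > threshold:
        return (0.2, 1.0)    # obstacle on the right: veer left
    return (1.0, 1.0)        # path clear: drive forward
```

Called in a loop, a rule like this produces obstacle avoidance without the system ever representing where the obstacles are. Notice also that nothing in it can anticipate anything.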

Although such robots are highly dated now, they inspired a generation of research in robotics focused on showing how complex adaptive behaviours can result from simple sensorimotor strategies very different from the sense-model-plan-act strategy described by the classical model.

To summarise the lessons from this research, Brooks famously declared that “the world is its own best model.” What he meant is this: rather than generating adaptive behaviour via consultation with an internal surrogate for the world, it is better for organisms to just continually engage the world itself.

The World is Not its Own Best Generative Model

That—in a reaaally brief outline—summarises some of the reasons why proponents of radical embodied cognition have sought to eliminate the concept of mental representation from our understanding of sensorimotor processing.

As noted, I think that there are likely many problems with such nonrepresentational views: they underestimate the problem of mapping proximal sensory inputs onto estimates of the state of the world, they offer no plausible account of perceptual illusions, they don’t account for the invariance in our perceptual response in the face of radical changes in proximal sensory input, and more.

In any case, though, I think that the big problem from my perspective is that radical embodied views neglect the importance of prediction in sensorimotor processing. In this sense they are in the same camp as the behaviourist tradition, the deficiencies of which led Craik to posit mental models in the first place.

Prediction occurs at multiple levels of sensorimotor processing, I think.

Prediction in Learning

First, there is the crucial issue of learning. Generative model-based predictive processing offers at least a schematic explanation of how perceptual systems learn the structure of the distal environment. (In this sense there is a big difference between the account that I have defended and the strong nativist assumptions central to, e.g., Marr’s theory of vision.)

Weirdly, in recent work Shaun Gallagher appeals to learning (“neural plasticity”) as an argument against representational views:

“On the enactivist view, neural plasticity mitigates to some degree the need to think that subpersonal processes are inferential [or representational]. The neural networks of perception are set up by previous experience… Whatever plastic changes have been effected in the visual cortex, or in the perceptual network constituted by early sensory and association areas, such changes constrain and shape the current response to visual stimuli” (Gallagher 2017, p.115).

Unfortunately, this appeal to “neural plasticity” is completely empty without some account of what underlies it. I think that machine learning gives us good reason to believe that such unsupervised learning requires generative modelling.

Prediction in Perception

Second, I also think that prediction is central to online perception. This occurs at multiple levels.

  • As theories such as predictive coding show, predicting current sensory inputs can be highly informationally efficient, allowing perceptual systems to attend to only those features of the incoming sensory input not already accounted for in systemic response. In this sense modelling is not posterior to sensing within generative model-based theories of perceptual processing but prior to it.
  • Flexible forward-looking predictions are also central to perception. In everyday tasks, for example, we perceptually attend to features of the environment before task-relevant features even materialise, and predictions guide our exploration—for example, in saccades—of the environment.
  • Generative models also offer an account of multimodal imagery, i.e. perceptual processing in modalities during online perception without any relevant corresponding sensory input in that modality.
  • In theories such as predictive coding, perceptual systems can flexibly modulate their reliance on top-down predictions and the bottom-up sensory input according to context. For example, when navigating your room in pitch darkness at night it is important to rely on visual expectations, whereas in other conditions one can rely more on the driving sensory input. Nonrepresentational views that eliminate top-down generative modelling cannot accommodate this flexibility.
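The last point, about modulating reliance on predictions versus input, can be sketched with a one-step Gaussian update. This is a deliberate simplification and an assumption on my part: real predictive coding schemes are hierarchical and iterative, and the function `combine` and all the numbers below are illustrative only. The idea it captures is that the prediction error is weighted by the relative precision (inverse variance) of the sensory signal.

```python
def combine(prior_mean, prior_var, sensed_value, sensory_var):
    """Fuse a Gaussian top-down prediction with a Gaussian sensory sample.

    Returns the posterior mean and variance. The prediction error is scaled
    by a gain that grows with the precision of the sensory signal.
    """
    prior_precision = 1.0 / prior_var
    sensory_precision = 1.0 / sensory_var
    gain = sensory_precision / (prior_precision + sensory_precision)
    prediction_error = sensed_value - prior_mean
    posterior_mean = prior_mean + gain * prediction_error
    posterior_var = 1.0 / (prior_precision + sensory_precision)
    return posterior_mean, posterior_var

# Expected distance to the door (metres) vs. what vision currently reports.
expected, seen = 2.0, 3.0
daylight_mean, _ = combine(expected, prior_var=1.0, sensed_value=seen, sensory_var=0.01)
darkness_mean, _ = combine(expected, prior_var=1.0, sensed_value=seen, sensory_var=100.0)
```

With reliable input (daylight) the estimate ends up close to what is seen; with unreliable input (darkness) it stays close to what was expected. The same mechanism yields both behaviours, and it only works if there is a prediction to weigh against the input in the first place.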

Note that all of these points are orthogonal to questions of whether the sensory input is ambiguous. Indeed, generative model-based theories of perception are founded on the idea that proximal inputs are not fundamentally ambiguous over long time scales, insofar as it is from such inputs alone that generative models are supposed to be learned.

Prediction in Action

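Third, as people like Andy Clark and Rick Grush pointed out long ago in philosophy, generative modelling is also crucial to sensorimotor control insofar as it allows us to overcome various signalling delays: by predicting the sensory consequences of our movements, the brain need not wait for slow sensory feedback to arrive.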
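A minimal sketch of why delays matter, under made-up assumptions (a target moving at constant velocity, a fixed 100 ms sensory delay, and hypothetical helper names). A system that acts on the latest sensory reading is always acting on where things were; a system with even a crude model of the target’s dynamics can extrapolate to where things are now.

```python
def delayed_reading(positions, t, delay):
    """What the sensor reports at step t: the true position `delay` steps ago."""
    return positions[max(t - delay, 0)]

def forward_model_estimate(positions, t, delay, velocity, dt):
    """Extrapolate the stale reading forward using an assumed constant velocity."""
    elapsed = min(delay, t) * dt
    return delayed_reading(positions, t, delay) + velocity * elapsed

# 10 ms time steps, a target moving at 2 m/s, and a 100 ms sensory delay.
dt, velocity, delay = 0.01, 2.0, 10
positions = [velocity * dt * step for step in range(100)]

t = 50
true_now = positions[t]
lag_error = abs(true_now - delayed_reading(positions, t, delay))
model_error = abs(true_now - forward_model_estimate(positions, t, delay, velocity, dt))
```

Here the purely reactive estimate is off by about 20 cm, while the model-based estimate is essentially exact because the assumed dynamics happen to match the real ones. With an imperfect model the gain would be smaller, but the structure of the point is the same.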

Importantly, generative models also offer an explanation of the importance of sensorimotor contingencies in perceptual experience, which are central to enactivist theories of perception. (Sensorimotor contingencies are the relations between the agent’s actions and patterns of proximal sensory input). Again, the widespread appeal by enactivists to our “implicit mastery” of sensorimotor contingencies is empty. A generative model-based theory of perceptual processing offers at least a schematic explanation—with some impressive computational demonstrations of its feasibility—of how we acquire knowledge of such sensorimotor contingencies.

Finally, sensorimotor processing—that is, “everyday coping”—involves a flexible understanding of the physics of our environments:

“Human perception cares not only about “what is where” but also about “where to,” “how,” and “why.” We implicitly but continually reason about the stability, strength, friction, and weight of objects around us, to predict how things might move, sag, push, and tumble as we act on them.”

Once more, this capacity for flexibly predicting the behaviour of physical objects requires an explanation. I think (after reading this kind of work) that it is grounded in the same kind of computational architecture that underlies the design of interactive video games—namely, an idealised reconstruction of the physics of our environments that can be run for quick flexible predictive simulations about the outcomes of an open-ended range of possible events. In other words: a generative model.
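To give a flavour of what “quick flexible predictive simulations” might amount to, here is a toy sketch. Everything in it is an assumption for illustration: the one-line “physics” (a block falls if its centre of mass is past the table edge), the perceptual noise level, and the function names. It is not the model from any particular paper. The idea is simply to run an idealised physical rule many times from noisy guesses about the scene and read off a graded prediction.

```python
import random

def will_fall(block_centre, table_edge):
    """Idealised physics: a block topples if its centre of mass is past the edge."""
    return block_centre > table_edge

def predicted_fall_probability(perceived_centre, table_edge, noise_sd=0.02,
                               n_samples=2000, seed=0):
    """Run many quick simulations, each from a noisy guess at the true position."""
    rng = random.Random(seed)
    falls = sum(
        will_fall(rng.gauss(perceived_centre, noise_sd), table_edge)
        for _ in range(n_samples)
    )
    return falls / n_samples

# Positions in metres along the table; the edge is at 0.50.
safe = predicted_fall_probability(perceived_centre=0.40, table_edge=0.50)
risky = predicted_fall_probability(perceived_centre=0.49, table_edge=0.50)
gone = predicted_fall_probability(perceived_centre=0.60, table_edge=0.50)
```

The same simulator answers questions it was never specifically set up for (what if the block were nudged, what if the table were shorter) just by changing its inputs, which is the kind of open-ended flexibility at issue.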


Insofar as the function of perception is merely to track or respond to what is out there, the idea of mental models can seem strange, and attempts to minimise this strangeness by pointing to the complexity of the recognition task obviously don’t convince many. Nevertheless, to the extent that sensorimotor processing is dependent on highly flexible predictions—of what sensory information we are likely to receive, of what is going to happen next, of the outcomes of our actions, of the outcomes of interactions among objects in the physical world—then the world is not its own best model. The world won’t generate its own predictions. For that, you need a model of the world.

I conclude this chapter in the thesis by considering arguments put forward by proponents of radical embodied cognitive science that one can embrace the predictive character of sensorimotor processing without thinking that the neural mechanisms that underlie this predictive capacity involve internal models of any kind.

Because—again—this blog post is too long, I won’t outline my treatment of this issue here. In any case, my view should be pretty clear from what I’ve said already: the generative models that underlie our flexible predictive capacities function as idealised structural surrogates for target domains—and that just is to function as a representation (see post 4).



