Chapter 2: The Mathematics of Limited Capacity
Bounded Rationality
Herbert Simon did not arrive at his 1971 observation about attention scarcity through a single epiphany. The insight grew from decades of work in organizational behavior, decision theory, and cognitive psychology that began well before the digital age made information abundance a daily experience for most people. Simon's central contribution was reframing the rational actor model that dominated economics. The standard model assumed that decision-makers could process all available information, evaluate all possible alternatives, and select the optimal choice. Simon argued this was a useful fiction at best. Real people operate under what he called bounded rationality: their cognitive capacity is limited, their information is incomplete, and their time is finite.
The implications were immediate and wide-ranging. If decision-makers cannot optimize, they must use other strategies. Simon identified satisficing as the primary alternative. A satisficer does not search until finding the best possible option. Instead, they establish an aspiration level—a threshold of acceptability—and select the first alternative that meets it. The strategy is not lazy. It is rational under constraints. Searching exhaustively costs time and cognitive effort, and those costs must be weighed against the marginal benefit of finding a marginally better option.
The aspiration level itself is dynamic. When satisficing repeatedly produces satisfactory results, the threshold rises. When it fails, the threshold drops. This adaptive mechanism means that satisficing is not a static heuristic but a learning process that calibrates itself against environmental feedback. The system converges toward a balance between search cost and outcome quality that reflects the actual conditions the decision-maker faces.
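The adaptive loop described above can be sketched in a few lines of code. This is a minimal illustration, not Simon's own formalization: the uniformly distributed option values, the pool size of 20, and the step size are all arbitrary assumptions.

```python
import random

def satisfice(options, aspiration):
    """Return the first option meeting the aspiration level, plus search cost."""
    best = None
    for examined, value in enumerate(options, start=1):
        if value >= aspiration:
            return value, examined          # good enough: stop searching
        best = value if best is None else max(best, value)
    return best, len(options)               # nothing met the threshold

def adaptive_aspiration(trials, aspiration=0.5, step=0.05, seed=0):
    """Raise the threshold after a satisfying search, lower it after failure."""
    rng = random.Random(seed)
    history = []
    for _ in range(trials):
        options = [rng.random() for _ in range(20)]
        chosen, cost = satisfice(options, aspiration)
        aspiration += step if chosen >= aspiration else -step
        history.append((chosen, cost, aspiration))
    return history
```

Run over many trials, the threshold drifts upward until searches start failing, then settles where search cost and outcome quality balance, which is the convergence the text describes.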
Gerd Gigerenzer's work on the adaptive toolbox extended Simon's framework into cognitive psychology. Gigerenzer documented the specific heuristics that humans employ under different conditions and argued that these heuristics are not cognitive bugs but evolved adaptations. The recognition heuristic, for instance, leads people to choose the option they recognize over one they do not, a strategy that performs remarkably well in environments where recognition correlates with quality or popularity. The take-the-best heuristic uses a single most-valid cue to make a decision, ignoring all other available information. In many real-world situations, this approach outperforms complex statistical models that incorporate more data, because the additional data introduces noise that outweighs its signal.
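Take-the-best can be sketched as a lexicographic comparison over cues ordered by validity. The city-size task, the cue names, and the truth values below are hypothetical illustrations, not Gigerenzer's actual stimuli:

```python
def take_the_best(option_a, option_b, cues):
    """Decide between two options using the single most valid discriminating cue.

    `cues` is a list of (name, validity) pairs ordered from most to least valid;
    each option maps cue name -> True/False/None (unknown). The first cue on
    which the options differ decides; all remaining cues are ignored.
    """
    for name, _validity in cues:
        a, b = option_a.get(name), option_b.get(name)
        if a is not None and b is not None and a != b:
            return option_a if a else option_b
    return None  # no cue discriminates: the agent must guess

# Hypothetical "which city is larger?" task:
cues = [("capital", 0.9), ("has_team", 0.8), ("has_airport", 0.6)]
berlin = {"capital": True, "has_team": True, "has_airport": True}
bonn = {"capital": False, "has_team": False, "has_airport": True}
print(take_the_best(berlin, bonn, cues) is berlin)  # decided by the first cue alone
```

The point of the sketch is what the function does not do: it never aggregates cues or weighs evidence, yet in environments where cue validities are skewed this frugality is exactly why it competes with regression models.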
The adaptive toolbox concept reframed the relationship between intelligence and environment. Intelligence is not an abstract capacity that exists independently of context. It is a fit between cognitive strategies and the structure of the environment in which they are deployed. A heuristic that works well in one environment may fail catastrophically in another, and the adaptive toolbox contains multiple strategies that the cognitive system can deploy depending on environmental cues. This is ecological rationality: intelligence measured by success in real environments rather than by adherence to abstract logical norms.
The less-is-more effect, a counterintuitive finding from this research, demonstrates that having more information or more computational power does not always produce better decisions. In some environments, simpler strategies with fewer inputs outperform complex ones because they avoid overfitting to noise. The effect is not limited to human cognition. Machine learning research has documented the same phenomenon, where simpler models generalize better than complex ones when training data is limited or noisy. The parallel is instructive. Both biological and artificial systems face the same tradeoff between model complexity and generalization, and both can benefit from restraint.
Simon's work also addressed organizational attention directly, extending bounded rationality from individuals to institutions. Organizations, he argued, face the same attention constraints as individuals, but at a larger scale. An organization's attention is the sum of the attention of its members, filtered through structures, procedures, and communication channels. This makes organizational attention an institutional bottleneck. Problems that require coordinated attention across multiple departments compete with each other for a finite resource, and the organizational structure determines which problems win. The agenda-setting function of management, in Simon's framework, is fundamentally an attention-allocation problem.
Computational Intractability
The bounded rationality framework connects directly to a problem that computer scientists have grappled with independently: computational intractability. Many optimization problems are NP-hard, meaning that no known algorithm finds an optimal solution in time that grows polynomially with problem size; for the hardest instances, the required search grows exponentially. For problems of realistic scale, exhaustive search is not just impractical. It is effectively impossible within any reasonable timeframe. The traveling salesman problem, which asks for the shortest route visiting a set of cities and returning to the origin, illustrates this. With 10 cities, there are 181,440 distinct routes, fixing the starting city and ignoring direction. With 20 cities, the number exceeds 60 quadrillion. The search space explodes faster than any computer can explore it.
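The explosion is easy to reproduce. A minimal sketch that counts distinct closed tours, fixing the starting city and treating a route and its reversal as the same tour:

```python
from math import factorial

def distinct_tours(n_cities):
    """Distinct closed tours over n cities: (n - 1)! / 2.

    Fixing the start city removes a factor of n; ignoring travel
    direction removes a further factor of 2.
    """
    return factorial(n_cities - 1) // 2

for n in (5, 10, 15, 20):
    print(n, distinct_tours(n))
# 10 cities -> 181,440 tours; 20 cities -> over 60 quadrillion
```

Each additional city multiplies the count by roughly the current number of cities, which is why no constant-factor hardware improvement keeps pace.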
Humans face a version of this same problem. The number of possible decisions available in any situation is combinatorially vast, and evaluating all of them exhaustively would exceed biological processing capacity. Satisficing, in this light, is not merely a psychological tendency. It is a computationally necessary strategy. The cognitive system cannot afford to explore the full decision tree, so it prunes aggressively using heuristics that approximate good-enough solutions without exhaustive search.
The frame problem in AI provides a striking parallel to bounded rationality. First identified in the 1960s, the frame problem asks how an intelligent agent can determine which facts about its environment are relevant to a given action and which can be safely ignored. A robot tasked with moving a box from one room to another needs to know that the box will move with it. It also needs to know that the color of the walls will not change. But enumerating all the things that will not change is as computationally expensive as enumerating all the things that might. The agent must find a way to infer relevance without checking every possibility.
The frame problem has resisted a general solution for decades. Different approaches have been proposed, from default logic to causal modeling, but none has produced a complete answer. The difficulty reflects the same constraint that produces bounded rationality in humans. An agent with finite computational resources cannot evaluate all possible frames. It must select a frame, and the selection itself is an act of attention. What the agent attends to becomes its frame. What it does not attend to falls outside the frame, and the agent cannot reason about what it does not attend to.
Combinatorial explosion appears throughout AI architecture. In transformer models, the O(n²) cost of self-attention is a milder version of the same scaling problem: every token must be compared to every other token, so the number of comparisons grows with the square of sequence length. Sparse attention mechanisms like Longformer address this by limiting each token's attention to a local window plus a few global tokens, effectively pruning the attention graph in the same way that satisficing prunes the decision tree. The structural similarity is not coincidental. Both strategies respond to the same fundamental problem: the space of possibilities grows faster than the resources available to explore it.
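The savings from windowed attention can be estimated by counting token pairs. This is a back-of-the-envelope sketch, not Longformer's actual implementation; the sequence length, window size, and global-token count are arbitrary choices, and edge effects are ignored:

```python
def full_attention_pairs(n):
    """Dense self-attention: every token attends to every token."""
    return n * n

def window_attention_pairs(n, window, n_global=0):
    """Windowed attention: each token sees a local window plus global tokens.

    `window` is the one-sided window size, so each token attends to
    2 * window + 1 positions; global tokens attend everywhere and are
    attended to by everyone. Cost grows linearly in n, not quadratically.
    """
    local = n * (2 * window + 1)
    global_links = 2 * n_global * n
    return local + global_links

n = 4096
print(full_attention_pairs(n))            # 16,777,216 comparisons
print(window_attention_pairs(n, 256, 2))  # 2,117,632 comparisons
```

At 4,096 tokens the windowed scheme already does roughly an eighth of the work, and the gap widens linearly as sequences grow.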
Information Theory Foundations
The mathematical language for describing these constraints comes from information theory, a field founded by Claude Shannon in the 1940s. Shannon entropy quantifies the uncertainty in a random variable, or equivalently, the amount of information required to describe it. A variable with high entropy is unpredictable and requires more bits to encode. A variable with low entropy is predictable and compresses well. Channel capacity, the maximum rate at which information can be transmitted reliably through a channel, is determined by the channel's bandwidth and its signal-to-noise ratio. The human brain and AI models are both channels with finite capacity, and both must manage the tradeoff between information throughput and noise.
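Shannon entropy is direct to compute for a discrete distribution. A minimal sketch:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero p."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.99, 0.01]))  # ~0.08 bits: nearly predictable, compresses well
print(entropy([0.25] * 4))    # 2.0 bits: uniform over four outcomes
```

The predictable source needs barely any bits per symbol on average, which is exactly the sense in which low-entropy signals "compress well."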
Rate-distortion theory extends Shannon's framework to lossy compression. It asks: what is the minimum rate at which information can be transmitted while keeping distortion below an acceptable threshold? The answer defines the fundamental limit of compression for a given tolerance of error. This is precisely the problem that attention solves. Attention is a compression mechanism that discards or downweights information whose loss would not significantly degrade the system's performance. The compression is lossy, but the loss is bounded.
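The rate-distortion tradeoff can be illustrated with a uniform quantizer: spending more bits per sample (a higher rate) shrinks the mean squared error (the distortion). The uniform test signal and MSE distortion measure are illustrative assumptions, not the only choices the theory allows:

```python
import random

def quantize(signal, bits):
    """Uniform quantization of values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits
    return [round(x * (levels - 1)) / (levels - 1) for x in signal]

def distortion(original, reconstruction):
    """Mean squared error between original and reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstruction)) / len(original)

rng = random.Random(1)
signal = [rng.random() for _ in range(1000)]
for bits in (1, 2, 4, 8):
    print(bits, round(distortion(signal, quantize(signal, bits)), 6))
```

Each added bit roughly quarters the error, so the curve of achievable (rate, distortion) pairs falls off steeply, which is the fundamental limit rate-distortion theory characterizes.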
Minimum description length (MDL) provides another perspective. MDL states that the best model of a dataset is the one that minimizes the total description length of both the model and the data encoded using that model. A model that is too simple cannot compress the data well. A model that is too complex requires too many bits to describe itself. The optimal model balances these two costs. Attention mechanisms in both biological and artificial systems implement this tradeoff. They build internal models that compress incoming information by extracting regularities and discarding noise, and the quality of the model depends on finding the right balance between simplicity and fidelity.
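A minimal two-part-code sketch makes the MDL tradeoff concrete. It compares a zero-parameter fair-coin model against a one-parameter Bernoulli model for a binary string, charging the conventional (1/2) log2 n bits for the estimated parameter; the specific coding scheme is a textbook simplification:

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def description_length(bits, use_bernoulli_model):
    """Two-part code length: bits to describe the model, plus the data under it.

    Model 0: fair coin, no parameters -> n bits for the data.
    Model 1: Bernoulli(p) with p estimated from the data -> (1/2) log2 n
    bits for the parameter, then n * H(p_hat) bits for the data.
    """
    n = len(bits)
    if not use_bernoulli_model:
        return float(n)
    p_hat = sum(bits) / n
    return 0.5 * log2(n) + n * binary_entropy(p_hat)

skewed = [1] * 90 + [0] * 10
print(description_length(skewed, False))  # 100.0 bits under the fair-coin model
print(description_length(skewed, True))   # ~50 bits: the extra parameter pays for itself
```

For a skewed string the richer model wins despite its parameter cost; for a balanced string the parameter buys nothing and the simpler model wins, which is the balance MDL formalizes.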
Kolmogorov complexity takes the compression idea to its logical extreme. It defines the complexity of an object as the length of the shortest computer program that can produce it. Objects with low Kolmogorov complexity are highly compressible. Objects with high Kolmogorov complexity are incompressible, meaning they contain no detectable patterns that a shorter program could exploit. Solomonoff induction, a theoretical framework for prediction, uses Kolmogorov complexity to assign prior probabilities to hypotheses: simpler hypotheses (shorter programs) receive higher priors. The framework is mathematically elegant but computationally intractable, because Kolmogorov complexity is itself uncomputable: no algorithm can calculate it for arbitrary inputs. The intractability reinforces the bounded rationality lesson. The optimal strategy is theoretically defined but practically unreachable, so systems must use approximations.
Fisher information measures how much information an observable random variable carries about an unknown parameter. The Cramér-Rao bound establishes a lower limit on the variance of any unbiased estimator, and Fisher information determines how close that bound can be approached. Systems that allocate attention to maximize Fisher information are, in a precise sense, gathering data most efficiently. Research in neuroscience has found evidence that the visual system allocates attention in ways that approximate Fisher information maximization, directing resources to stimuli that would most reduce uncertainty about relevant parameters. The brain, it appears, solves an information-theoretic optimization problem in real time.
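In symbols, for a model with density $f(x;\theta)$, the Fisher information and the Cramér-Rao bound for any unbiased estimator $\hat{\theta}$ can be written as:

```latex
I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\,\log f(X;\theta)\right)^{2}\right],
\qquad
\operatorname{Var}\bigl(\hat{\theta}\bigr) \;\ge\; \frac{1}{I(\theta)}
```

As a standard worked case: for $n$ independent Gaussian observations with known variance $\sigma^2$, the information about the mean is $I(\mu) = n/\sigma^2$, and the sample mean attains the bound with variance $\sigma^2/n$. Allocating attention to the observations that contribute most to $I(\theta)$ is the precise sense in which a system can gather data most efficiently.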
The Free Energy Principle
Karl Friston's free energy principle offers a unifying framework that connects information theory, predictive processing, and attention. The principle states that any self-organizing system must minimize its free energy, a quantity that bounds the system's surprise or prediction error. A system that minimizes free energy maintains itself within a set of preferred states, avoiding surprises that would indicate it is drifting into conditions incompatible with its continued existence.
Variational inference provides the computational machinery. The system maintains a probabilistic model of the world and uses it to predict incoming sensory data. When predictions match observations, free energy is low. When predictions fail, free energy rises, and the system must either update its model or take action to change the environment so that predictions come true. Active inference describes this second option: the system acts to make the world conform to its predictions, not just to make its predictions conform to the world.
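In the standard variational formulation, free energy is an upper bound on surprise (negative log evidence). With an internal recognition model $q(z)$ over hidden causes and a generative model $p(x,z)$ over data and causes, it can be written as:

```latex
F \;=\; \mathbb{E}_{q(z)}\bigl[\log q(z) - \log p(x,z)\bigr]
  \;=\; D_{\mathrm{KL}}\bigl(q(z)\,\|\,p(z \mid x)\bigr) \;-\; \log p(x)
  \;\ge\; -\log p(x)
```

Minimizing $F$ with respect to $q$ improves the model's fit to the data (perception); minimizing it with respect to action changes $x$ itself so that the data conform to the predictions (active inference).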
Markov blankets define the boundary between a system and its environment. They are the set of variables through which the system interacts with the world, receiving sensory input and producing motor output. Everything inside the Markov blanket is the system's internal state. Everything outside is the environment. The free energy principle applies to any system with a Markov blanket, from single cells to brains to organizations.
Surprise minimization is the engine of the free energy principle. A system that encounters high surprise repeatedly will eventually fail, because surprise indicates that the system's model of the world is inadequate. Attention, in this framework, functions as precision weighting. The system estimates how reliable each prediction error is and allocates processing resources accordingly. High-precision prediction errors receive more attention, because they carry more information about model inadequacy. Low-precision errors are downweighted, because they are likely noise. This precision-weighted attention mechanism is closely analogous to the relevance scoring performed by transformer attention heads, where attention weights determine how much each token influences the next prediction.
The free energy principle reframes perception as controlled hallucination. The brain does not passively receive sensory input. It actively generates predictions and compares them to input, using the comparison to refine both the predictions and the model that generates them. What we perceive is the brain's best guess about what is causing the sensory data, constrained by the data but primarily driven by internal predictions. Attention determines which prediction errors are trusted enough to update the model and which are dismissed as noise.
Relevance Realization
The theoretical frameworks described so far establish that attention is a compression and relevance-selection mechanism constrained by finite capacity. But they do not fully explain what relevance means or how systems realize it. This is the domain of relevance realization, a framework developed by John Vervaeke that bridges bounded rationality, predictive processing, and the phenomenology of meaning.
Vervaeke distinguishes several modes of pre-propositional knowing that operate beneath explicit, declarative knowledge. Participatory knowing is the direct, embodied engagement with a situation. Perspectival knowing is the situated viewpoint from which a situation is experienced. Procedural knowing is the know-how embedded in skills and habits. These modes of knowing are not reducible to facts or propositions. They are the infrastructure that makes propositional knowledge possible. When you ride a bicycle, you are not running a mental simulation of physics equations. You are deploying procedural knowing that was acquired through practice and is now embedded in your sensorimotor system.
Opponent processing describes the dynamic tension between complementary cognitive modes. Focusing and scattering are one pair: focusing narrows attention to a specific target, while scattering broadens it to detect patterns and connections. Feature and gestalt are another: feature processing analyzes individual components, while gestalt processing perceives the whole. Exploitation and exploration form a third: exploitation refines known strategies, while exploration searches for new ones. Effective cognition requires cycling between these modes rather than fixating on one. Attention pathologies often involve a breakdown in this cycling, where the system gets stuck in a single mode. Doom scrolling, for instance, can be understood as a failure to transition from exploration back to exploitation, a perpetual search for the next information patch without ever settling into deep processing.
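The exploitation-exploration pair has a standard computational toy: the multi-armed bandit. An epsilon-greedy agent mostly exploits its best-known option and occasionally explores alternatives; the arm payoffs and the value of epsilon below are arbitrary illustrative choices:

```python
import random

def epsilon_greedy(arm_means, epsilon, steps, seed=0):
    """Cycle between exploiting the best-known arm and exploring at random.

    Rewards are Gaussian with unit noise around each arm's true mean.
    Returns how often each arm was pulled.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    totals = [0.0] * len(arm_means)
    for _ in range(steps):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(len(arm_means))           # explore
        else:
            estimates = [t / c for t, c in zip(totals, counts)]
            arm = estimates.index(max(estimates))         # exploit
        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = epsilon_greedy([0.0, 1.0, 2.0], epsilon=0.1, steps=2000)
print(counts)  # most pulls concentrate on the best arm
```

Setting epsilon to zero produces the pathology the text describes in reverse: the agent fixates on exploitation and can lock onto an inferior arm, just as pure exploration (epsilon of one) never settles into deep processing of what it has found.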
The salience landscape is the structure of attentional priorities that guides opponent processing. Figure-ground separation determines what stands out from the background. Affordance detection, a concept from James Gibson's ecological psychology, identifies the action possibilities that an environment offers. A chair affords sitting. A handle affords grasping. These affordances are not properties of objects alone. They emerge from the relationship between the object and the perceiver's capabilities. Attentional capture occurs when bottom-up salience overrides top-down goals, pulling attention toward novel or intense stimuli regardless of task relevance. Digital platforms exploit this mechanism by designing interfaces that maximize attentional capture through bright colors, motion, and variable reward schedules.
The connection to the meaning crisis is where Vervaeke's framework becomes most provocative. He argues that the contemporary sense of meaninglessness is not primarily an emotional or spiritual problem. It is a relevance failure. When the systems that help us realize relevance break down, we lose the ability to distinguish what matters from what does not. The information environment has become so saturated that relevance signals are drowned in noise. Meta-relevance frameworks, the higher-order structures that help us evaluate which relevance signals to trust, have eroded under the weight of contradictory information sources. Wisdom, in Vervaeke's account, is the mastery of relevance realization: the ability to navigate complex information environments by maintaining functional relevance structures.
Connectedness to reality is the positive counterpart to the meaning crisis. It describes a state in which relevance realization is functioning well, and the individual experiences a coherent relationship between their internal models and the external world. This state is not passive. It requires active maintenance through practices that preserve the integrity of relevance structures. Meditation, deliberate practice, and information hygiene all serve this function by strengthening the cognitive systems that realize relevance.
The Bridge to Biology
The theoretical frameworks covered in this chapter establish attention as a mathematically constrained relevance-selection mechanism. Bounded rationality shows that exhaustive optimization is impossible under capacity limits, and satisficing provides the practical alternative. Information theory quantifies the compression problem and defines the fundamental limits of efficient representation. The free energy principle unifies prediction, perception, and action under a single optimization objective. Relevance realization explains what relevance means and how systems maintain functional relevance structures in saturated environments.
These frameworks describe the logic of attention without specifying the machinery that implements it. The logic is domain-general. It applies to any system that must select relevant information from an overabundant environment under finite capacity constraints. But the implementation differs radically between silicon and carbon. The next chapter will examine the biological machinery in detail: the neural circuits, metabolic processes, and evolutionary adaptations that implement these principles in the human brain. Cognitive load theory, predictive coding, and information foraging will provide the framework for understanding how the brain allocates its finite attentional resources in real time. The theoretical foundations laid here will serve as the lens through which the biological evidence is interpreted, and the biological evidence will in turn reveal implementation details that the theory alone cannot predict.