Sunday, June 2, 2019

Backward and Forward... Steps to Perception

Perception takes a database, learning, and attention.

We all know by now that perception is more than simply being a camera, getting visual input from the world. Cameras see everything, but they recognize nothing, conceptualize nothing. Perception implies categorization of that input into an ontology that makes hierarchical sense of the world, full of inter-relationships that establish context and meaning. In short, a database is needed- one that is dynamically adaptive to allow learning to slice its model of reality into ever finer and more accurate categories.

How does the brain do that? The work of Karl Friston has been revolutionary in this field, though probably not well-enough appreciated, and admittedly hard for me and others not practiced in mathematical statistics to understand. A landmark paper is "A theory of cortical responses", from 2005. It argues that the method of "Empirical Bayes" is the key that unlocks the nature of our mental learning and processing. Bayesian statistics seems like mere common sense. The basic proposition is that our belief in the likelihood of something combines a naive model (hypothesis) of that likelihood, arrived at prior to any evidence or experience, with evidence expressed in a way that can weight or alter that model. Iterate as needed, and the model should improve with time. What makes this a statistical procedure, rather than simple common sense? If the hypothesis can be expressed mathematically, and the evidence likewise, then the evaluation and the updating from evidence can be done in a mechanical way.
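This iterative updating can be made concrete with a toy sketch (my own illustration, not from the paper): estimating the bias of a coin, where a naive prior belief is reweighted by each new observation via Bayes' rule.

```python
import numpy as np

# Toy example: infer the bias of a coin from a handful of flips.
# "prior" is the naive model held before any evidence; each flip is
# evidence that reweights it, iterated exactly as described above.

thetas = np.linspace(0.01, 0.99, 99)        # candidate biases (hypotheses)
prior = np.ones_like(thetas) / len(thetas)  # flat prior: no experience yet

def update(belief, heads):
    """One Bayesian iteration: posterior is proportional to likelihood x prior."""
    likelihood = thetas if heads else (1 - thetas)
    posterior = likelihood * belief
    return posterior / posterior.sum()

belief = prior
for flip in [True, True, False, True, True]:  # the observed evidence
    belief = update(belief, flip)

# The model's best guess sharpens toward the empirical rate (4/5 heads).
print(thetas[np.argmax(belief)])
```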

Friston postulates that the brain is such a machine, which studiously models the world, engaging in what statisticians call "expectation maximization", which is to say, progressive improvement in the detail and accuracy of its model, driven by sensory and other inputs. An interesting point is that sensory input really functions as feedback to the model, rather than the model functioning as an evaluator of the inputs. We live in the model, not in our senses. The overall mechanism works assiduously to reduce surprise, which is a measure of how much inputs differ from the model. Surprise drives both attention and learning.
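As a rough illustration of surprise-reduction (my own sketch, assuming a simple Gaussian model, not Friston's actual formulation), surprise can be written as the negative log-probability of an input under the model's prediction; gradient descent on surprise then updates the model by its prediction error, which is exactly the sense in which input acts as feedback:

```python
import math

mu, sigma, rate = 0.0, 1.0, 0.1   # model prediction, noise scale, learning rate

def surprise(x):
    # -log p(x | model): large when the input deviates from the prediction
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

for x in [5.0] * 50:              # a persistently surprising input
    error = (x - mu) / sigma**2   # prediction error: the feedback signal
    mu += rate * error            # the model moves toward the input

# After repeated exposure the input is no longer surprising:
# mu has converged near 5, and surprise(5.0) sits near its minimum.
```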

Another interesting point is the relationship between inference and learning. The model exists to perform inference- that is, the bottom-up process of judging the reality and likely causes of some event based on the internal model, activated by input-driven attention. We see a ball fall down, and are not surprised because our model is richly outfitted with calculations of gravitation, weight, etc. We infer that it has weight, and no learning is required. But suppose it is a balloon that floats up instead of falling- a novel experience? The cognitive dissonance represents surprise, which prompts higher-level processing and downward, top-down alterations to the model to allow for lighter-than-air objects. Our inferences about the causes may be incorrect- we may resort to superstition rather than physics for the higher-level inference or explanation. But in any case, the possibility of rising balloons would be added to our model of reality, making us less surprised in the future.
The brain as a surprise-minimizing machine. Heading into old age, we are surprised by nothing, whether by great accumulated experience or by a closure against new experiences, and thus reach a stable / dead state. 

This brings up the physiology of what is going on in the brain, featuring specialization, integration, and recurrent networks with distinct mechanisms of bottom-up and top-down connection. Each sensory mode has its specialized processing system, with sub-modules, etc. But these only work by working together, both in parallel, cross-checking forms of integration, and by feeding into higher levels that integrate their mini-models (say for visual motion, or color assignment) into more general, global models.
"The cortical infrastructure supporting a single function may involve many specialized areas whose union is mediated by functional integration. Functional specialization and integration are not exclusive; they are complementary. Functional specialization is only meaningful in the context of functional integration and vice versa."

But the real magic happens thanks to the backward connections. Friston highlights a variety of distinctions between the forward and backward (recurrent) connections:

Forward connections serve inference, which is the primary job of the brain most of the time. They are regimented, sparsely connected, and topographically organized (as in the regular striations of the visual system). They are naturally fast, since speed counts most in making inferences. On the molecular level, forward connections use fast ionotropic receptors: AMPA and GABA-A.

Backward connections, in contrast, serve learning and top-down modulation/attention. They are slow, since learning does not have to keep pace with the rapid processing of forward signals. They tend to occupy and extend to complementary layers of the cortex relative to the forward-connecting cells. They use NMDA receptors, which are roughly 1/10 as fast in response as the receptors used in forward synapses. They are diffuse and highly elaborated in their projections. And they extend widely, not as regimented as the forward connections. This allows many different later effects (e.g., error detection) to modulate the inference mechanism. And surprisingly, they far outnumber the forward connections:
"Furthermore, backward connections are more abundant. For example, the ratio of forward efferent connections to backward afferents in the lateral geniculate is about 1 : 10. Another distinction is that backward connections traverse a number of hierarchical levels whereas forward connections are more restricted."

Where does the backward signal come from, in principle? In the brain, error = surprise. Surprise expresses a violation of the expectation of the internal model, and is accommodated by a variety of responses. An emotional response may occur, such as motivation to investigate the problem more deeply. More simply, surprise would induce backward correction in the model that predicted wrongly, whether that is a high-level model of our social trust network, or something at a low level like reaching for a knob and missing it. Infants spend a great deal of time reaching, slowly optimizing their models of their own capabilities and the context of the surrounding world.
"Recognition is simply the process of solving an inverse problem by jointly minimizing prediction error at all levels of the cortical hierarchy. The main point of this article is that evoked cortical responses can be understood as transient expressions of prediction error, which index some recognition process. This perspective accommodates many physiological and behavioural phenomena, for example, extra classical RF [receptive field] effects and repetition suppression in unit recordings, the MMN [mismatch negativity] and P300 in ERPs [event-related potentials], priming and global precedence effects in psychophysics. Critically, many of these emerge from the same basic principles governing inference with hierarchical generative models."

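That idea of "jointly minimizing prediction error at all levels" can be sketched numerically (a toy two-level linear hierarchy of my own devising, not the paper's actual equations): each level predicts the one below, and recognition proceeds by relaxing the internal states until prediction error is small everywhere at once.

```python
# Hypothetical two-level hierarchy: level 2 holds a cause v, level 1 holds
# a feature u predicted from v, and the sensory input x is predicted from u.
# "Recognition" = settling u and v so that both prediction errors vanish.

W1, W2 = 2.0, 0.5        # assumed generative weights, one per level
x = 3.0                  # sensory input to be explained
u, v = 0.0, 0.0          # internal states (the brain's explanation of x)
rate = 0.05

for _ in range(1000):
    e0 = x - W1 * u      # bottom-level prediction error (input vs prediction)
    e1 = u - W2 * v      # higher-level prediction error
    u += rate * (W1 * e0 - e1)   # u is pushed by error below, constrained from above
    v += rate * (W2 * e1)        # v adjusts to explain u

# At equilibrium both errors are near zero: the hierarchy has "recognized" x,
# with the evoked error signals being transient, as the quote describes.
```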
This paper came up via a citation from current work investigating this model specifically with non-invasive EEG methods. It is clear that the model cited and outlined above is very influential, if not the leading model of general cognition and brain organization. It also has clear applications to AI, as we develop more sophisticated neural network programs that can categorize and learn, or, more adventurously, neuromorphic chips that model neurons on a physical rather than an abstract basis and show impressive computational and efficiency characteristics.
