Saturday, October 17, 2020

Vision Bubbles up into Perception

Visual stimuli cause progressive activation of processing stages in the brain, and also slower recurrent, rebounding activation, tied to decision-making.

Perception is our constant companion, but at the same time a deep mystery. How are simple, if massively parallel, sensory inputs assembled into cognition? What is doing the perceiving? Where is the theater of the mind? Modern neuroscience is steadily chipping away at these questions, whether one deems them "hard", unscientific, or theological. The brain is a very physical, if inaccessible, place, forming both the questions and the answers.

A recent paper made use of MEG (magnetoencephalography, which tracks the magnetic fields generated by electrical activity in the brain) to follow decision-making as a stepwise process and draw some conclusions about its nature, based on neural network modeling and analogies. The researchers used a simple ambiguous decision task, presenting subjects with images of numbers and letters, with ambiguous variations in between. This setup had several virtues. First, letters and numbers are recognized, at a high level, in different regions of the brain. So even before the subjects got to the button-pressing response stage, the researchers could tell that they had cognitively categorized and perceived one or the other. Second, the possibility of ambiguity put a premium on categorization, and the more ambiguous the presented image, the longer that decision would take, which would then hypothetically be reflected in what could be observed within the brain.
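Why ambiguity should lengthen decisions has a standard sequential-sampling intuition, which a toy simulation can make concrete (this is my own illustration in the drift-diffusion style, not the paper's model; the drift rate stands in for stimulus clarity):

```python
import numpy as np

rng = np.random.default_rng(2)

def decision_time(drift, bound=1.0, dt=0.001, noise=1.0):
    """Accumulate noisy evidence until it first crosses a decision bound."""
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t

# Clearer stimuli (higher drift) reach the bound sooner; ambiguous ones
# (low drift) wander and take longer, mirroring slower categorization.
for drift in (2.0, 1.0, 0.2):
    times = [decision_time(drift) for _ in range(200)]
    print(f"drift {drift:.1f}: mean decision time {np.mean(times):.3f} s")
```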

The test as presented. An image was shown, and the subject had to judge whether it was a number or a letter, and which number/letter it was.

The researchers used MRI solely to obtain the structures of the subjects' brains, on which they could map the dynamic MEG data. MEG is intrinsically much faster in time scale than, say, fMRI, allowing this work to see millisecond-scale events. Using a time-resolved decoding analysis, they segmented the observed stages of processing into 1- position and visibility of the stimulus; 2- what number or letter it is; 3- whether it is a number or a letter; 4- how uncertain these decisions are; 5- the motor action to press the button responding with all these subjective reports. In the authors' words, "We estimated, at each time sample separately, the ability of an l2-regularized regression to predict, from all MEG sensors, the five stimulus features of interest." These steps / features naturally happen at different times, perception being necessarily a stepwise process, with the scene being processed at first without bias in a highly parallel way, before features are picked out, categorized, and brought to consciousness, in a progressively less parallel and more diffuse process.
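As a rough sketch of what such time-resolved decoding looks like in practice (with made-up data and parameter choices, not the authors' code), one can train an l2-regularized classifier at each time sample, using all sensors as features:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy MEG-style data: trials x sensors x timepoints, with a decodable
# "letter vs number" signal injected into a brief mid-trial window.
rng = np.random.default_rng(0)
n_trials, n_sensors, n_times = 200, 102, 60
X = rng.standard_normal((n_trials, n_sensors, n_times))
y = rng.integers(0, 2, n_trials)            # 0 = letter, 1 = number
X[:, :10, 25:40] += y[:, None, None] * 0.8  # signal present ~samples 25-40

# Decode the feature at each time sample separately, in the spirit of the
# paper's l2-regularized regression across all MEG sensors.
scores = np.empty(n_times)
for t in range(n_times):
    clf = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0))
    scores[t] = cross_val_score(clf, X[:, :, t], y, cv=5).mean()

# The decoding time course rises only where the data carry linearly
# readable information about the feature.
print("peak accuracy:", scores.max().round(2), "at sample", int(scores.argmax()))
```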

Activation series in time. The authors categorized brain activity by the kind of processing being done, based on how it responded, and mapped these categories over time. First (A, bottom) comes the basic visual scene processing, followed by higher abstractions of visual perception. Finally (A, top) is activation in the motor area corresponding to pressing a response button. C and D show more detail on the timing of the various processes.

It is hard to tell how self-fulfilling these analyses are, since the researchers knew what they were looking for, binned the data to find it, and then obtained estimates for when & where these binned processes happened. But assuming that all that is valid, they came up with striking figures of cognitive progression, shown above. The initial visual processing signal is very strong and localized, which is understandable first because the early parts of the visual system (located in the rear of the brain) are well understood, and second because those early steps take in the whole scene and are massively parallelized, thus generating a great deal of activity and signal in these kinds of analyses. This processing is also linear with respect to the ambiguous nature of the image, rather than sigmoidal or categorical, which is how higher processing levels behave. That is because at this level, the processing components are just mindlessly dealing with their pixel or other micro-feature of the scene, regardless of its larger meaning. The authors state, for instance, that the rough location of the stimulus (left or right of the screen) is decided by this low level of the visual system at about 120 milliseconds, on average. By 500 milliseconds (ms), this activity has ceased, the basic visual analysis having been done, and processing having progressed to higher levels. The perception of what the letter is (225 ms) and whether it is a letter or number (370 ms) happens at roughly the same time, and varies substantially, presumably due to the varying ambiguities of what was presented.
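The linear-versus-sigmoidal distinction can be tested directly: fit both curve shapes to the decoded evidence as a function of morph level and compare the errors. Here is a minimal sketch with fabricated data (the generating curve and noise level are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical decoded evidence across morph levels (0 = clear letter,
# 1 = clear number). Early visual signals should track the morph
# linearly; categorical stages should follow a sigmoid.
rng = np.random.default_rng(1)
morph = np.linspace(0, 1, 11)
evidence = 1 / (1 + np.exp(-12 * (morph - 0.5))) + 0.05 * rng.standard_normal(11)

def linear(x, a, b):
    return a * x + b

def sigmoid(x, k, x0):
    return 1 / (1 + np.exp(-k * (x - x0)))

p_lin, _ = curve_fit(linear, morph, evidence)
p_sig, _ = curve_fit(sigmoid, morph, evidence, p0=[10.0, 0.5])

# A much lower sigmoid error marks the signal as categorical rather
# than a raw, linear copy of the stimulus.
sse_lin = np.sum((evidence - linear(morph, *p_lin)) ** 2)
sse_sig = np.sum((evidence - sigmoid(morph, *p_sig)) ** 2)
print(f"linear SSE: {sse_lin:.3f}   sigmoid SSE: {sse_sig:.3f}")
```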

At around 600 ms, the processing of just how uncertain the image is seems to peak, a sort of meta-process evaluating what has gone on at lower levels. And finally, the processing directing the motor event of pressing the button to specify what the subject has decided about the item identity or category (just a binary choice of left or right index finger) also comes in around 600 ms on average. Obviously, there is a great deal going on that these categorizations and bins do not capture- the MEG data, though fast, are very crude with respect to location, so only the broadest distinctions can be attempted.
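One plausible way to operationalize such an uncertainty signal (my assumption for illustration, not necessarily the paper's exact definition) is the entropy of a decoder's class probabilities, which peaks when a trial could go either way:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy trials: one informative feature plus noise, so some trials are
# clear-cut and others ambiguous.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 30))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
p = np.clip(clf.predict_proba(X)[:, 1], 1e-9, 1 - 1e-9)

# Entropy in bits: 0 for confident trials, 1 for a 50/50 call.
uncertainty = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print("mean uncertainty (bits):", uncertainty.mean().round(3))
```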

The authors go on to press a further observation, which is that the higher level processing is relatively slow, and the processing devoted to these separate aspects of the perceptual event lasts longer and trails off more slowly than one might expect. Comparing all this to what is known from artificial neural networks, they conclude that these dynamics could not possibly be consistent with strictly feed-forward processing, where one level simply does its thing and communicates results up to the next level. Rather, the conclusion, in line with a great deal of other work and theorization, is that recurrent processing, maintaining representations that are stable for some time at some of these levels, is required to explain what is going on.
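The contrast can be caricatured in a few lines (a toy illustration, not the authors' network models): a strictly feed-forward stage only echoes its input for a moment, while a recurrent self-connection lets a stage hold its representation well after the input has passed, producing the kind of sustained, slowly decaying decodability seen in the MEG:

```python
import numpy as np

T = 20
pulse = np.zeros(T)
pulse[0] = 1.0                   # a single transient input

ff = np.zeros(T)                 # feed-forward stage: relays and forgets
rec = np.zeros(T)                # recurrent stage: sustains via self-connection
w_self = 0.9                     # recurrent gain < 1 gives slow decay

for t in range(1, T):
    ff[t] = pulse[t - 1]                          # echoes the input once
    rec[t] = w_self * rec[t - 1] + pulse[t - 1]   # holds it over time

print("feed-forward:", np.round(ff, 2))
print("recurrent:   ", np.round(rec, 2))
```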

This is hardly pathbreaking, but this paper is notable for the clarity with which the processing sequence, from visual stimulus to motor response, is detected and presented. While working in very broad brushstrokes regarding details of the scene and of its perception, it lays out a clear program for what comes next- filling in the details to track the byways of our thoughts as we attend consciously to a red flower. This tracking extends not only to the motor event of a response, but also to whatever constitutes consciousness. This paper did not breathe the word "conscious" or "consciousness" at all, yet the video provided of the various activations in sequence shows substantial prefrontal activity in the 450 ms range after image presentation, constituting a sixth category in their scheme that deserves a bit more attention, so to speak.

  • A high-level talk about how blood supply in the brain responds to neural activity.
  • The Pope weighs in on climate change. But carries none of the weight himself.
  • On the appalling ineffectiveness of the flu vaccine. It makes one wonder why we are waiting for the phase III trials of the coronavirus vaccines.
  • US to Taliban: "Please?"
  • A day at the Full Circle Ranch.
