Thursday, November 15, 2012

stack puzzle

Okay, I’ve been wondering for a while whether or not something is a valid question – a good question or a bad question. It is related to a few entries I’ve written here in the past year (esp. this and this), and to a paper that I’m about to get ready for submission.

The question: are the percepts contributed by different layers or modules of visual processing perceived as embedded within one another, or as layered in front of or behind one another?

Such percepts could include brightness, location and sharpness of an edge, its color, its boundary association; color and shape and texture of a face, its identity, its emotional valence, its association with concurrent speech sounds; scale of a texture, its orientation, its angle relative to the frontal plane, its stereoscopic properties.

All of these, and more, are separately computed properties of images as they are perceived, separate in that they are computed by different bits of neural machinery at different levels of the visual system hierarchy. Yet they are all seen together, simultaneously, and the presence of one implies another. To see an edge implies that it must have some contrast, some color, some orientation, some blur; but this implication is not trivial. A mechanism that senses an edge does not need to signal contrast or color or orientation or scale; the decoder could simply interpret the responses of the mechanism as saying ‘there is an edge here’. To decode the orientation of an edge requires that many such mechanisms exist, each preferring a different orientation, and that some subsequent mechanism exists which can discriminate the responses of one from another. In other words, the fact that the two properties are both discriminable (edge or no edge; which orientation) means that there must be a hierarchy, or that there must be different mechanisms.
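
The point about decoding can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the Gaussian tuning curve, the 20 degree bandwidth, the bank of 12 preferred orientations, and the population-vector readout are hypothetical stand-ins, not a claim about real cortical machinery. It shows that one unit's response confounds orientation with contrast, while a bank of differently tuned units plus a second-stage readout recovers orientation.

```python
import math

# Hypothetical bank of orientation-tuned units: 0, 15, ..., 165 degrees.
PREFERRED = [i * 15.0 for i in range(12)]

def unit_response(edge_deg, contrast, preferred_deg, bandwidth_deg=20.0):
    """Response of one toy unit: contrast times a Gaussian of the
    orientation difference (orientation is circular over 180 degrees)."""
    d = (edge_deg - preferred_deg + 90.0) % 180.0 - 90.0
    return contrast * math.exp(-0.5 * (d / bandwidth_deg) ** 2)

def population(edge_deg, contrast):
    """Responses of the whole bank to a single edge."""
    return [unit_response(edge_deg, contrast, p) for p in PREFERRED]

def decode_orientation(responses, preferred_degs=PREFERRED):
    """Second-stage readout: population vector on the doubled angle."""
    x = sum(r * math.cos(math.radians(2.0 * p))
            for r, p in zip(responses, preferred_degs))
    y = sum(r * math.sin(math.radians(2.0 * p))
            for r, p in zip(responses, preferred_degs))
    return (math.degrees(math.atan2(y, x)) / 2.0) % 180.0

if __name__ == "__main__":
    # One unit alone: a weak edge at its preferred orientation and a
    # stronger edge 20 degrees away produce the same response.
    print(unit_response(45.0, 0.5, 45.0))
    print(unit_response(65.0, 0.5 * math.exp(0.5), 45.0))
    # The bank plus a readout separates orientation from contrast.
    print(decode_orientation(population(40.0, 0.2)))
    print(decode_orientation(population(40.0, 0.9)))
```

The readout is itself a further mechanism sitting on top of the bank, which is the hierarchy the argument requires.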

So, whenever something is seen, the seeing of the thing is the encoding of the thing by many, many different mechanisms, each of which has a special place in the visual system, a devoted job – discriminate orientation, discriminate luminance gradients, discriminate direction of motion, or color, etc.

So, although we know empirically and logically that there must be different mechanisms encoding these different properties, there is no direct perceptual evidence for such differences: the experience is simultaneous and whole. In other words, the different properties are bound together; this is the famous binding problem, and it is the fundamental problem of the study of perception, and of all study of subjective psychology or conscious experience.

This brings us to the question, reworded: how is the simultaneity arranged? From here, it is necessary to adopt a frame of reference to continue discussion, so I will adopt a spatial frame of reference, which I am sure is a severe error, and which is at the root of my attempts so far to understand this problem; it will be necessary to rework what comes below from different points of view, using different framing metaphors.

Say that the arrangement of the simultaneous elements of visual experience is analogous to a spatial arrangement. This is natural if we think of the visual system as a branching series of layers. As far as subjective experience goes, are ‘higher’ layers in front of or behind the ‘lower’ layers? Are they above or below? Do they interlock like... it is hard to think of a metaphor here. When do layers, as such, interlock so that they form a single variegated layer? D* suggested color printing as something similar, though this doesn’t quite satisfy me. I imagine a jigsaw puzzle where the solution is a solid block, and where every layer has the same extent as the solution but is mostly empty space. D* also mentioned layers of transparencies where on each layer a portion of the final image – which perhaps occludes lower parts – is printed; like the pages in the encyclopedia entry on the human body, where the skin, muscles, organs, bones, were printed on separate sheets.

But after some thought, I don't think these can work. An image as a metaphor for the perceptual image? A useful metaphor would have some explanatory degrees of freedom; one set of things that can be understood in one way, used to understand something different in a similar way. Where do we get by trying to understand one type of image as another type of image? Not very far, I think. The visual field is a sort of tensor: at every point in the field, multiple things are true at the same time, they are combined according to deterministic rules, and a unitary percept results. Trying to understand this problem in terms of a simpler type of image seems doomed to fail.

So, whether or not there is a convenient metaphor, I think that the idea of the question should be clear: how are the different components of the percept simultaneously present? A prominent part of psychophysics studies how different components interact: color and luminance contrast, or motion and orientation, but my understanding is that for the most part different components are independently encoded; i.e. nothing really affects the perceived orientation of an edge, except perhaps the orientations of other proximal (in space or time) edges.

Masking, i.e. making one thing harder to see by laying another thing in proximity to it, is also usually within-layer, i.e. motion-to-motion, or contrast-to-contrast. Here, I am revealing that my thinking is still stuck in the lowest levels: color, motion, contrast, orientation, are all encoded together, in overlapping ensembles. So, it may well be that a single mechanism can encode a feature with multiple perceptual elements.

Anyway, the reason I wonder about these things lately is this study where I had subjects judge the contrast of photographic images, and related these judgments to the contrasts of individual scales within the images. This is related to the bigger question because there is no obvious reason why the perceived contrast of a complex, broadband image should correspond to the perceived contrast of a simple spatial pattern like a narrowband wavelet of one type or another. This is where we converge with what I wrote a few months ago: the idea of doing psychophysics with simple stimuli is that a subject’s judgments can be correlated with the physical properties of the stimuli, which can be completely described because they are simple. When the stimuli are complex and natural, there is a hierarchy of physical properties which the visual system, with its own hierarchy, is specifically designed to analyze. Simple stimuli target components of this system; complex stimuli activate the entire thing.

It is possible that when I ask you to identify the contrast – the luminance amplitude – of a Gabor patch, you are able to do so by looking, from your behavioral perch, at the response amplitude of a small number of neural mechanisms which are themselves stimulated directly by luminance gradients, which are exactly what I am controlling by controlling the contrast of the Gabor. It is not only possible, but this is the standard assumption in most contrast psychophysics (though I am suspicious that the Perceptual Template people have fuzzier ideas than this, I am not yet clear on their thinking – is the noisiness of a response also part of apparent magnitude?).
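
To make "controlling the contrast of the Gabor" concrete, here is a minimal sketch of a one-dimensional Gabor luminance profile. The function names, the 257-sample size, the 4 carrier cycles, and the envelope width are all assumptions chosen for illustration, not the stimulus parameters of any actual experiment. The only point is that the nominal contrast is a single number that linearly scales the luminance modulation around the mean.

```python
import math

def gabor_1d(contrast, n=257, cycles=4.0, sigma_frac=0.15, mean_lum=0.5):
    """Toy 1-D Gabor: mean luminance modulated by a Gaussian-windowed
    cosine whose amplitude is set by `contrast`."""
    center = (n - 1) / 2.0
    sigma = sigma_frac * n
    profile = []
    for i in range(n):
        x = i - center
        envelope = math.exp(-0.5 * (x / sigma) ** 2)
        carrier = math.cos(2.0 * math.pi * cycles * x / n)
        profile.append(mean_lum * (1.0 + contrast * envelope * carrier))
    return profile

def peak_contrast(profile, mean_lum=0.5):
    """Largest luminance excursion from the mean, relative to the mean."""
    return max(abs(v - mean_lum) for v in profile) / mean_lum

def rms_contrast(profile, mean_lum=0.5):
    """Root-mean-square excursion from the nominal mean, relative to it."""
    mean_sq = sum((v - mean_lum) ** 2 for v in profile) / len(profile)
    return math.sqrt(mean_sq) / mean_lum

if __name__ == "__main__":
    for c in (0.1, 0.2, 0.4):
        g = gabor_1d(c)
        print(c, peak_contrast(g), rms_contrast(g))
```

Both measures scale in proportion to the nominal contrast here, which is what makes a simple stimulus "completely described" by its parameters.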

It is also possible that when I ask you to identify the contrast of a complex image, like a typical sort of image you look at every day (outside of spatial vision experiments), you are able to respond by doing the same thing: you pool together the responses of lots of neural mechanisms whose responses are determined by the amplitude of luminance gradients of matched shape. This is the assumption I set out to test in my experiment, that contrast is more or less the same, perceptually, whatever the stimulus is.
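
One way to write down the pooling assumption is a single rule that combines per-scale contrasts into one number. The sketch below uses a weighted power-mean (Minkowski-style) rule; the rule, the `exponent` parameter, and the example weights are hypothetical choices for illustration, not the model fitted in the study.

```python
def pooled_contrast(band_contrasts, weights, exponent=2.0):
    """Weighted power-mean of per-scale contrasts (hypothetical rule).

    band_contrasts: contrast measured in each spatial-frequency band
    weights:        non-negative weight given to each band
    exponent:       pooling exponent (2.0 gives a weighted RMS)
    """
    if len(band_contrasts) != len(weights):
        raise ValueError("need one weight per band")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    pooled = sum(w * c ** exponent for c, w in zip(band_contrasts, weights))
    return (pooled / total_weight) ** (1.0 / exponent)

if __name__ == "__main__":
    weights = [1.0, 2.0, 1.0]            # made-up emphasis on the middle band
    broadband = [0.10, 0.30, 0.20]       # made-up per-scale contrasts
    print(pooled_contrast(broadband, weights))
    # Doubling every band's contrast doubles the pooled value.
    print(pooled_contrast([2 * c for c in broadband], weights))
```

Under a rule like this, the judgment depends only on the band contrasts and weights; nothing about edges, objects, or scene layout enters, which is exactly the assumption in question.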

But, this does not need to be so. This assumption means that in judging the contrast of the complex image, you are able to ignore the responses of all the other mechanisms that are being stimulated by the image: mechanisms that respond to edges, texture gradients, trees, buildings, depth, occlusions, etc. Why should you be able to do this? Do these other responses not get in the way of ‘seeing’ those more basic responses? We know that responses later in the visual hierarchy are not so sensitive to the strength of a stimulus, rather they are sensitive to the spatial configuration of the stimulus; if you vary how much the configuration fits, you will vary the response of the neuron, but if you vary its contrast you will, across some threshold, turn the neuron on and off.

I don’t have a solution; the question is not answered by my experiment. I don’t doubt that you can see the luminance contrast of the elements in a complex scene, but I am not convinced that what you think is the contrast is entirely the contrast. In fact, we know for certain that it is not, because we have a plethora of lightness/brightness illusions.

No progress here, and I'm still not sure of the quality of the question. But, maybe this way of thinking can make for an interesting pitch at the outset of the introduction of the paper.
