I need to procrastinate slightly more productively, so here is a short essay relating some of my thoughts on
visual consciousness.
For years now, I've understood visual experience or consciousness (experience is easier to say and write, and has less near-meaning baggage, so let's continue with that term) as having two components:
1. The image. A part of vision is
direct, which means that when you see an object, it is true to say that what you see is
the thing itself, or at least the light reflected/emitted by that object (this is similar to the idea of the 'optic array'). This is a difficult position to hold, but I think it is a necessary default. The alternative, which is definitely more popular these days, is to say that what you see is entirely a
representation of the thing itself, instantiated in the brain. This sort of idealism is attractive because the brain is obviously a self-contained system, and because experience also seems to be self-contained, and because every aspect of experience seems to have a neural correlate. If I say that vision involves processes or structure outside the brain, I have to explain why we
don't see what we don't see; why don't I see what
you see, for example?
It seems to me that in placing the contents of consciousness somewhere in the physical world, there are two possible null hypotheses: either everything is absolutely centralized, completely contained within the brain, or everything is absolutely external, completely outside the brain. The second account is rare these days (see Gibson), as the only job it leaves for the brain is the sorting out of responses to visual experiences. It seems clear that much of vision actually does occur within the brain, and I'll get to that in part 2, below. Now, these null hypotheses: that everything is internal is an
objective hypothesis, based on e.g. a scientist's observations that the brain is correlated with experience; that everything is external is a
subjective hypothesis, based on e.g. my observations that what seems to be in the world is actually there, i.e. that my sensations are always accurate.
Since visual experience is a subjective process which cannot be observed, I like to stick to the subjective null hypothesis: everything is external unless shown otherwise. Immediately on stating this hypothesis, we can start to make a list of the components of visual experience which are surely
neural.
2. The brain. Let's start with the subjective null hypothesis: everything you see is there, in the world. Just a little thought proves that this can't be true: faces are a great example. Look at two faces, one of a person you know well - your sister or brother, maybe - and one of a stranger that you've never seen before. There, in the faces, you see a difference that you can't deny, because one seems to have an identity and the other does not. This difference isn't purely cognitive or emotional, either, because one will easily make the admission that the face of his sister
is his sister. Seeing her face, he will say, "That is her!" Clearly, however, the identity is not in the face - it is in the observer.
If this isn't a satisfying example, color perception must be. Color is not a property of images; it is a construct of the brain. This is not difficult to show, either with the proof that identical wavelength distributions can yield different color percepts in different conditions (as in demonstrations of 'color constancy'), or with the inverse proof that different wavelength distributions can yield identical color percepts ('metamers'). We understand color as a brain's capacity to discriminate consistently between different (simultaneous or asynchronous) distributions of visible radiation. It is something that exists only in the observer.
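To make the metamer point concrete, here is a toy sketch in Python. The three sensor sensitivity curves and the four wavelength bins are made-up numbers chosen for easy arithmetic, not real cone data, and the names (SENSITIVITIES, sensor_responses) are mine. The only point is that when an observer reduces a spectrum to a handful of sensor responses, physically different spectra can collapse onto the same response.

```python
# Toy illustration of metamers. All numbers here are hypothetical,
# chosen for easy arithmetic; they are not real cone sensitivities.

# Each row is one sensor type; each column is one wavelength bin.
SENSITIVITIES = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]


def sensor_responses(spectrum):
    """Weight the spectrum by each sensor's sensitivity and sum over bins."""
    return [
        sum(weight * power for weight, power in zip(row, spectrum))
        for row in SENSITIVITIES
    ]


# Two physically different distributions of light over the four bins.
spectrum_a = [2, 3, 2, 3]
spectrum_b = [3, 2, 3, 2]

responses_a = sensor_responses(spectrum_a)
responses_b = sensor_responses(spectrum_b)

print(responses_a)  # [5, 5, 5]
print(responses_b)  # [5, 5, 5]
```

The two spectra differ in every bin, yet the three sensors cannot tell them apart. Whatever "color" is in this toy, it is a property of the sensor responses, not of the light.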
These are easy, but it does get harder. Consider depth perception. In a scene, some things are nearer or further from you, but there is nothing in the images you sense that labels a given point in the scene as being at a particular depth. There is information in the scene that can be
used by the observer to
infer depth. So, depth is another part of the brain's capacity to interpret the image, but it is not a part of the scene. This is a more difficult step than with faces or colors, and here's why: whereas a face's identity, or a light's color, is plainly not a property of the world itself, we
know that the world is three dimensional, and that objects have spatial relationships; and we know that what we see as depth in a scene informs us as to these spatial relationships. However, we then make the mistake of believing that visual
depth is the same as
space; on reflection, however, we can begin to understand that they are not the same.
Depth is a neural estimate of
space based on image information.
Let's keep going. Spatial orientation is another good one: 'up' and 'down' and 'left' and 'right' are, in fact, not part of space. I've already
made my complaint about this one: spatial orientation is created by the brain.
If we keep going like this, what do we have left? What is there about visual experience that is not in some way created by the brain? How can I state that there is an 'external' component to vision?
The only feature of vision, it seems, that is not generated by the brain is the internal spatial organization of the image, the positional relationships between points in the image - what in visual neuroscience is recognized as
retinotopy. Spatial relationships between points in the visual field do not need to be recovered, only preserved. A person's ability to
use this information can be lost, certainly, through damage to the dorsal stream (simultanagnosia, optic ataxia, neglect, etc). This does not mean that the visual experience of these relationships is lost, only that it is unable to contribute to behavioral outputs. I think it is a mistake - commonly made - to assume that a patient with one of these disorders is unable to
see the spatial relationships that they are unable to respond to. Assigning to the brain the
generation of positional relationships needs evidence, and I know of none. A digital, raster-image-based system would be different, of course: a video camera detects images by reading them into a long, one-dimensional string of symbols. Positional relationships are lost, and can only be recovered by using internal information about how the image was encoded to
recreate those positions. The visual system never needs to do this: it's all there, in the very structure of the system, starting at the pupil of the eye.
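Here is a minimal sketch of what I mean by the raster case, in Python. It assumes a simple row-by-row readout, which is my simplification rather than a description of any particular camera format, and the function names flatten and position are made up for the illustration. Once the image is a one-dimensional stream, where a pixel "is" depends entirely on side information about the encoding, in this case the row width.

```python
# Sketch of a raster readout, assuming a simple row-by-row encoding.
# The helper names are hypothetical, used only for this illustration.


def flatten(image):
    """Read a 2D image out row by row into a one-dimensional sequence."""
    return [pixel for row in image for pixel in row]


def position(index, width):
    """Recover (row, column) from a stream index, given the row width."""
    return divmod(index, width)


image = [
    ["a", "b", "c"],
    ["d", "e", "f"],
]

stream = flatten(image)   # ['a', 'b', 'c', 'd', 'e', 'f']
width = len(image[0])     # side information the decoder must be told

# 'b' sits directly above 'e' in the image, but in the stream they are
# separated by a full row of other symbols.
index_b = stream.index("b")
index_e = stream.index("e")
print(index_e - index_b)          # 3

# With the correct width, the position of 'e' is recovered.
print(position(index_e, width))   # (1, 1)

# With the wrong width, the same stream yields a different layout.
print(position(index_e, 2))       # (2, 0)
```

The stream by itself does not say that 'b' and 'e' are neighbors; that relationship exists only relative to the decoding rule.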
So, here is my understanding of vision: it is a stack of transformations, simultaneously experienced. The bottom of the stack is, at the very least, the retinal image (and if the image, why not the logically prior optic array?). Successive levels of the stack analyze the structure of the lower levels, discriminating colors, brightnesses, depths, and identities; this entire stack is
experienced simultaneously, and is identical with visual consciousness. But, the entire thing is anchored in the reality of that bottom layer; take it away, and everything above disappears. Activity in the upper levels can be experienced independently - we can use visual imagination, or have
visual dreams, but these are never
substantial, and I mean this not in a figurative sense - the substance of vision is the retinal image.
This view has consequences. It means that it is impossible to completely reproduce visual experience by any brain-only simulation, i.e. a 'brain in a vat' could never have
complete visual experience. Hallucinations must be mistakes in the upper levels of the stack, and cannot involve substantial features of visual experience - a hallucination is a
mistaking of the spatial organization in the lowest levels for something that it is not. Having had very few hallucinations in my life, I find that this does not conflict with my experiences. I can imagine that a hallucination of a pink elephant could actually involve
seeing a pink elephant in exactly the same experiential terms as if one were there, in physical space, to be seen, but I don't believe it, and I don't think there's any evidence for vision working that way. Similarly, dreams are insubstantial, I claim, because there is nothing in that bottom layer to pin the stack to a particular state; memory, or even immediate experience, of a dream may
seem like visual experience, but this is a mistake of association: we are so accustomed to experiencing activity in the upper levels of the stack as immediately consequent to the image, that when there is activity with no image, we fail to notice that it isn't there! I think, though, that on careful inspection (which is difficult in dreams!), we find that dream vision has indeterminate spatial organization.
Anyways, that's my thinking. This has gone on long enough, I need to work on this proposal...