VEMI's Underlying Theory

Many of the tasks and interface technologies traditionally supported by vision can also be supported by other senses. Indeed, we argue that much of what most people consider “visual” is really “spatial”. Take a look around you: how much of what you see would you consider purely visual? Color certainly falls in this camp, but we believe that far more of the information we perceive about the world is shared across the senses than is sensory-specific. Spatial information is one example of a common thread that is specified, often redundantly, by multiple inputs to our nervous system. Temporal coding and emotional valence are other examples.

The vast majority of research on spatial cognition and interface design considers only the role of vision. By viewing the world through the lens of one modality, we miss much about how we actually perceive it: we do not perceive the world through just one sense, but as an integrated whole built up from many different sensory inputs. Studying the information-processing characteristics and representational structure of multiple channels of sensory information is inherently more challenging and “noisy” than devoting all of one’s attention to a single channel. To do this effectively, it is critical to have a multi-level appreciation of the inputs of interest and a good understanding of the translation rules between them. The more you know about the sensory channels involved (their physiology, channel dynamics, psychophysics, perceptual biases, cortical projections, memory capacity, and so on), the better you can make valid comparisons and translations between inputs. Thus, given VEMI’s core interest in multimodal spatial information research, we adopt a holistic perspective on spatiality, one that focuses on common spatial information content, representation, mental computation, and behavior rather than on the specific sensory conduit of that spatial information.

Geometric properties such as 3-D structure and the relations between surfaces, lines, and edges are not tied to any one modality; indeed, such information can be equivalently represented, transformed, and acted upon from multiple input channels. Likewise, distance and direction cues, as well as relational structure (e.g., foreground-background, subject-object, and object-object relations and their associated transformations), can be conveyed quite accurately through multiple input sources. It is one’s knowledge and computation of spatial structure that makes interpreting and processing these relations possible, not sensory-specific input such as vision, as is commonly assumed. For instance, the relations between a perceiver and the surrounding environment (the egocentric perspective), the relations between elements of the environment (the allocentric perspective), and the ability to compute how these relations change as the perceiver moves (spatial updating) can all be specified from information derived from multiple modalities, as the sketch below illustrates.
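To make the modality-neutral character of these computations concrete, here is a minimal sketch in Python (an illustrative example with hypothetical names, not VEMI software): given allocentric landmark positions that could equally have come from vision, touch, or hearing, it re-expresses them egocentrically as distances and bearings, before and after the perceiver moves. Nothing in the computation depends on which sense supplied the coordinates.

```python
import math

def egocentric_view(landmarks, position, heading_deg):
    """Compute egocentric distance and bearing to each landmark.

    landmarks   -- dict mapping a label to an allocentric (x, y) position,
                   regardless of whether it was sensed visually, haptically,
                   or acoustically
    position    -- the perceiver's allocentric (x, y) position
    heading_deg -- the perceiver's facing direction in degrees (0 = +y axis,
                   positive clockwise)
    """
    px, py = position
    view = {}
    for label, (lx, ly) in landmarks.items():
        dx, dy = lx - px, ly - py
        distance = math.hypot(dx, dy)
        # Direction relative to the perceiver's heading (egocentric bearing),
        # wrapped to the range [-180, 180).
        bearing = (math.degrees(math.atan2(dx, dy)) - heading_deg + 180) % 360 - 180
        view[label] = (distance, bearing)
    return view

# Spatial updating: the same allocentric map, re-expressed egocentrically
# before and after the perceiver walks forward and turns.
landmarks = {"desk edge": (2.0, 3.0), "doorway": (-1.0, 5.0)}
before = egocentric_view(landmarks, position=(0.0, 0.0), heading_deg=0.0)
after = egocentric_view(landmarks, position=(0.0, 2.0), heading_deg=45.0)
print(before)
print(after)
```

The point of the sketch is simply that the updating step operates on spatial coordinates, not on pixels, contact forces, or sound waves; the modality matters only for how those coordinates were obtained.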

All of the information just discussed can be similarly specified through any number of spatial inputs and thus, we argue, is best considered in terms of its underlying spatial properties rather than its sensory-specific medium of conveyance. Feeling the edge of your desk and seeing it provide the same information content and support the same spatial computation of “edgeness” for touch and vision alike. This is not to say that there are no differences between the sensory-specific conduits of this information; there obviously are. But treating spatial information as sensory-specific (e.g., calling it visual) is equally fallacious. For more on this notion, see the Functional Equivalence page.