such that the sensory systems of an active perceiver will experience certain consistencies and inconsistencies. The detection of these consistencies is related to the ability of an object or event to produce sensory stimuli in more than one sensory system in a spatiotemporally consistent manner. The detection of inconsistencies is related to the inability of an object or event to produce such sensory stimulation. Whether or not an object or event is capable of producing spatiotemporally consistent multisensory stimulation is determined by whether or not the particular stimulus has properties that can be detected by more than one sensory system. Properties of sensory stimuli that may be detected in more than one sensory system are referred to as amodal stimulus properties. For instance, shape and texture are perceived in both visual and haptic modalities, just as intensity and spatiotemporal information may be perceived in both visual and auditory modalities. Properties of sensory stimuli that may only be detected in one sensory system are referred to as modality-specific properties. For example, color can only be perceived in the visual modality, whereas tone can only be perceived in the auditory modality. Throughout ontogeny, the interaction between the neural structure of the organism and its environment results in a neural system that processes certain properties best in multisensory contexts and others best in unisensory contexts.
Figure 2.1 A. Diagram of the superior colliculus of the cat with associated cortical connections. B. The three sensory representations in the superior colliculus (visual, auditory, and somatosensory) are organized in an overlapping topographical map. Space is coded along the anterior-posterior axis, and stimuli in space are coded by multiple modalities. In adults, this leads to enhancements of neuronal activity and an increase in behavioral responses. C. In early development, the unisensory inputs converge to produce multisensory neurons, but these neurons cannot yet integrate their cross-modal inputs. This latter function only develops around 4 weeks of age. (From Stein et al. [2014])
The importance of the distinction between multisensory and unisensory perceptual input is most evident in consideration of the ontogeny of multisensory neurons in the evolutionarily early subcortical structure of the superior colliculus (SC). Although multisensory neurons have been found in several brain regions and in many species, the anatomical nature of this region has been most extensively studied in cats [Stein et al. 2014]. The SC is unique in that decidedly unisensory neurons from the retina, ear, and/or skin may all converge onto a single SC neuron [Wallace et al. 1992, 1993, Stein and Meredith 1993, Fuentes-Santamaria et al. 2009]. Note that convergence is not the same as integration, which has recently been shown to be experience-dependent, at least in the sequentially early sensory convergence region of the SC [Wallace et al. 1992, Stein and Rowland 2011, Stein et al. 2014]. In other words, neurons from several sensory systems may converge at one point, but whether or not these signals are integrated into a single output depends upon the history that a given neuron has had with receiving sensory signals. SC neurons in kittens are largely unisensory and only become multisensory if they receive stimulation from multiple sensory systems at the same time. Perhaps even more interesting from a developmental perspective, the emergence of multisensory neurons and their receptive fields is also experience-dependent [Stein and Rowland 2011]. Receptive fields not only change their size through experience, they also change their sensitivity to physical locations. Repeated exposure to a visual stimulus and an auditory stimulus that occur at the same time and at the same physical location will result in overlapping receptive fields. Overlapping receptive fields from more than one sensory system result in faster, stronger, and more reliable neural responses for that combination of stimuli at that location in the future [Stein et al. 2014].
The implication of these phenomena is that events that violate the learned correspondences among sensory modalities are readily detected and, because the sensory signals converge at such an early stage in sensory processing, are difficult for the perceiver to overcome. Furthermore, projections from the SC are widely distributed throughout the cortex and are one of the major pathways by which sensory information reaches the cortex where, presumably, higher-level cognitive functions are carried out, such as object and symbol recognition.
Neural plasticity in the adult, that is, the ability of neuronal structures, connections, and processes in the brain to undergo experience-dependent changes, is well documented with respect to the initial learning of multimodal-multisensory mappings. However, the relearning of these mappings is a decidedly more laborious process, though initial and relearning stages both follow a few basic principles. Although several principles could be mentioned here, we will focus on two: multisensory enhancement and multisensory depression. First, information about objects and events in the environment can be gleaned by simply relying upon spatiotemporal coincidence to indicate the presence of an object or event; this type of coordinated environmental stimulation contributes to multisensory enhancement. This phenomenon results in an increase in the system’s ability to detect the same object or event based on multisensory, or, to use another word, amodal, input in the future. Second, information about objects and events in the environment can be gleaned by simply relying upon spatiotemporal disparities to indicate the separability of objects or events; this type of uncoordinated environmental stimulation contributes to multisensory depression. This phenomenon results in a decrease in the system’s ability to detect either of those objects or events based on multisensory input in the future. In the case of multisensory enhancement, amodal cues are emphasized, and in the case of multisensory depression, modality-specific cues are more readily emphasized. Thus, the SC and its corresponding connections appear to function as foundational attention mechanisms associated with orienting to salient environmental stimulation based on the system’s history of sensory experiences [Johnson et al. 2015].
These principles reflect our knowledge regarding the cellular mechanisms of neural plasticity. The principle of Hebbian learning is foundational to theories of experience-dependent brain changes, as it proposes that axonal connections between neurons undergo activity-dependent changes. It has two basic tenets: (1) when two neurons repeatedly fire in a coordinated manner, the connections between them are strengthened, effectively increasing the likelihood of firing together in the future; and (2) when two neurons repeatedly fire in an uncoordinated manner, the connections between them weaken, effectively reducing the likelihood of firing together in the future [Hebb 1949]. The relevance of this theory to experience-dependent brain changes is most readily understood when considering the differences in brain systems supporting the recognition of objects that would result from active, as opposed to passive, behaviors. For this discussion, the crucial difference between active and passive behaviors is that the perceiver in active behaviors performs an action on stimuli in the environment. The crucial implication of that difference is that active behaviors are inherently multimodal, involving both action systems and perceptual systems, and multisensory, involving haptic and visual inputs at a minimum. Passive behaviors, on the other hand, often involve the stimulation of only one sensory modality, rendering them unisensory (see Chapter 3).
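The two Hebbian principles described above can be illustrated with a minimal sketch. It is a toy model rather than a description of any published simulation: the function names, the binary spike trains, the learning rate, and the clipping of the weight to the range 0 to 1 are all assumptions made for illustration. A connection weight grows when the two units fire together and shrinks when only one of them fires.

```python
def hebbian_update(weight, pre_fired, post_fired, rate=0.05):
    """Return the connection weight after one time step (toy rule, assumed)."""
    if pre_fired and post_fired:
        # Principle 1: coordinated firing strengthens the connection.
        weight += rate
    elif pre_fired != post_fired:
        # Principle 2: uncoordinated firing weakens the connection.
        weight -= rate
    # If neither unit fired, the weight is left unchanged.
    # Keep the weight in [0, 1] (an illustrative bound, not a biological one).
    return min(1.0, max(0.0, weight))


def train(pre_spikes, post_spikes, weight=0.5, rate=0.05):
    """Apply the toy rule across two equally long binary spike trains."""
    for pre, post in zip(pre_spikes, post_spikes):
        weight = hebbian_update(weight, pre, post, rate)
    return weight


# Hypothetical spike trains: 1 = the unit fired at that time step, 0 = silent.
visual = [1, 0, 1, 1, 0, 1]
auditory_sync = [1, 0, 1, 1, 0, 1]    # fires together with the visual unit
auditory_async = [0, 1, 0, 0, 1, 0]   # never fires together with it

coordinated = train(visual, auditory_sync)
uncoordinated = train(visual, auditory_async)
```

Under these assumptions, the weight for the synchronous pair ends above its starting value and the weight for the asynchronous pair ends below it, mirroring the strengthening and weakening that the two principles describe.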
Therefore, active interactions with the environment, as opposed to passive interactions, are inherently multisensory and multimodal. Not only do they entail input from various effectors (e.g., hands, eyes) based on the action (e.g., reaching to grasp, saccading), but they also simultaneously produce input to numerous sensory modalities (e.g., somatosensation, vision). Active interactions thus produce multisensory information that allows for coactivation of signals, resulting in multisensory enhancement. Beyond multisensory enhancement, however, active interactions have been thought to be necessary for any type of percept, be it unisensory or multisensory. That is, without physical movement, sensory information quickly fades. This phenomenon occurs because sensory receptor organs and neurons stop responding to repeated stimulation. One well-known example is the stabilized image phenomenon: if eye movements are inhibited, resulting in a stable retinal image, the resultant percept will dissipate within seconds (for review see [Heckenmueller 1965]). Although it is beyond the scope of this chapter to outline sensory and neural adaptation, it is important to note that for the brain to code change, and thus detect similarities and differences in incoming percepts, action is necessary.
What effect does this knowledge of the mechanisms that underlie experience-dependent