thought that temporal fine structure cues to aspects such as the periodicity pitch of voiced speech sounds become recoded as they ascend the auditory pathway beyond the brainstem. Thus, from about the inferior colliculus onward, temporal fine structure at fast rates is represented less and less through fast and highly precise temporal firing patterns, and instead through neurons becoming periodicity tuned (Frisina, 2001); this means that their firing rates may vary as a function of the fundamental frequency of a voiced speech sound, in addition to depending on the amount of sound energy in a particular frequency band. Some early work on periodicity tuning in the inferior colliculus led to the suggestion that this structure may even contain a periodotopic map (Schreiner & Langner, 1988), with neurons tuned to different periodicities arranged along an orderly periodotopic axis running through the whole length of the inferior colliculus, with the periodotopic gradient more or less orthogonal to the tonotopic axis. Such an arrangement would be rather neat: periodicity being a major cue for sound features such as voice pitch, a periodotopic axis might, for example, physically separate out the representations of voices that differ substantially in pitch. But, while some later neuroimaging studies seemed to support the idea of a periodotopic map in the inferior colliculus (Baumann et al., 2011), more recent, very detailed, and comprehensive recordings with microelectrode arrays have shown conclusively that there are no consistent periodotopic gradients running the width, breadth, or depth of the inferior colliculus (Schnupp, Garcia‐Lazaro, & Lesica, 2015), nor are such periodotopic maps a consistent feature of primary auditory cortex (Nelken et al., 2008).
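The idea that a periodicity-tuned neuron's firing rate varies with fundamental frequency can be illustrated with a toy tuning-curve model. This is a hypothetical sketch only: the Gaussian shape and all parameter values (best F0, bandwidth, firing rates) are illustrative assumptions, not measured data.

```python
import math

def periodicity_tuned_rate(f0_hz, best_f0=200.0, bandwidth_oct=0.5,
                           max_rate=50.0, spontaneous=5.0):
    """Toy model: firing rate (spikes/s) of a periodicity-tuned neuron,
    falling off as a Gaussian with distance (in octaves) from its best F0.
    All parameters are illustrative assumptions."""
    octaves = math.log2(f0_hz / best_f0)
    return spontaneous + max_rate * math.exp(-(octaves / bandwidth_oct) ** 2)

# A neuron "tuned" to 200 Hz responds most strongly near that F0
# and progressively less for voices with higher or lower pitch.
for f0 in (100, 150, 200, 300, 400):
    print(f"F0 = {f0:3d} Hz -> {periodicity_tuned_rate(f0):5.1f} spikes/s")
```

In such a model the neuron still responds to sound energy in its frequency band (the spontaneous and driven rates), but its rate also carries information about voice pitch, which is the key property the text describes.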
Thus, tuning to periodicity (and, by implication, voicing and voice pitch), as well as to cues for sound‐source direction, is widespread among neurons in the lemniscal auditory pathway from at least the midbrain upward, but neurons with different tuning properties appear to be arranged in clusters without much overarching systematic order, and their precise arrangement can differ greatly from one individual to the next. Consequently, neural populations in these structures are best thought of as a patchwork of neurons that are sensitive to multiple features of speech sounds, including pitch, sound‐source direction, and formant structure (Bizley et al., 2009; Walker et al., 2011), without much discernible overall anatomical organization other than tonotopic order.
Primary auditory cortex
So far, in the first half of this chapter we have talked about how speech is represented in the inner ear and auditory nerve, and along the subcortical pathways. However, for speech to be perceived, the percolation of auditory information must reach the cortex. Etymologically, the word cortex is Latin for “rind,” which is fitting as the cerebral cortex covers the outer surface of the brain – much like a rind covers citrus fruit. Small mammals like mice and tree shrews are endowed with relatively smooth cortices, while the cerebral cortices of larger mammals, including humans (Homo sapiens) and, even more impressively, African bush elephants (Loxodonta africana), exhibit a high degree of cortical folding (Prothero & Sundsten, 1984). The more folded, wrinkled, or crumpled your cortex, the more surface area can fit into your skull. This is important because a larger cortex (relative to body size) means more neurons, and more neurons generally mean more computational power (Jerison, 1973). For example, in difficult, noisy listening conditions, the human brain appears to recruit additional cortical regions (Davis & Johnsrude, 2003), a point we shall come back to in the next few sections. In this section, we begin our journey through the auditory cortex by touching on the first cortical areas to receive auditory inputs: the primary auditory cortex.
Anatomy and tonotopicity of the human primary auditory cortex
In humans the primary auditory cortex (PAC) is located around a special wrinkle in the cortical sheet, known as Heschl’s gyrus (HG). A gyrus is a ridge, where the cortical sheet is folded outward, while a sulcus is an inward fold or valley. There is more than one HG per brain: all people have at least two, one in each cerebral hemisphere (the left and right halves of the visible brain), positioned along the superior aspect of each temporal lobe. In addition, some brains show a duplication of HG, that is, one or both hemispheres have two ridges instead of one (Da Costa et al., 2011). This anatomical factoid can be useful for identifying PAC (also known as A1) in real brains (as we shall see in Figure 3.5). However, the gyri are used only as landmarks: what matters is the sheet of neurons in and around HG, not whether that area is folded once or twice. This sheet of neurons receives connections from the subcortical auditory pathways, most prominently via the medial geniculate body of the thalamus (see Figure 3.1 and the previous section).
When the cortex is smoothed, in silico, using computational image processing, the primary auditory cortex can be shown to display the same kind of tonotopic maps that we observed in the cochlea and in subcortical regions. This has been known from invasive microelectrode recordings in laboratory animals for decades, and it can be confirmed in humans using noninvasive MRI (magnetic resonance imaging) by playing subjects tones at different frequencies and then modeling each cortical location’s response to each tone. This use of functional MRI (fMRI) yields the kind of tonotopic maps shown in Figure 3.5.
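The logic of assigning each cortical location a preferred frequency can be sketched in a few lines. This is a deliberately minimal, hypothetical illustration: the response values are invented, and real fMRI analyses model the hemodynamic response and fit smooth tuning functions rather than taking a simple winner-take-all maximum.

```python
# Assign each cortical location a "preferred frequency": the tone
# that evokes its largest response (winner-take-all, the simplest scheme).
tone_frequencies_hz = [250, 500, 1000, 2000, 4000, 8000]

# Hypothetical responses for three voxels (one value per tone).
voxel_responses = {
    "voxel_A": [0.9, 0.7, 0.4, 0.2, 0.1, 0.0],   # low-frequency preferring
    "voxel_B": [0.1, 0.3, 0.8, 0.9, 0.5, 0.2],   # mid-frequency preferring
    "voxel_C": [0.0, 0.1, 0.2, 0.4, 0.8, 0.9],   # high-frequency preferring
}

preferred = {
    voxel: tone_frequencies_hz[max(range(len(r)), key=r.__getitem__)]
    for voxel, r in voxel_responses.items()
}
print(preferred)  # maps each voxel to its best frequency in Hz
```

Plotting each location's preferred frequency on the cortical surface is what produces a tonotopic map like the one in Figure 3.5: a gradient appears wherever neighboring locations prefer smoothly changing frequencies.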
Figure 3.5 Tonotopic map. HG = Heschl’s gyrus; STG = superior temporal gyrus; SG = supramarginal gyrus.
(Source: Adapted from Humphries, Liebenthal, & Binder, 2010.)
Figure 3.5 depicts a flattened view of the left‐hemisphere cortex colored in dark gray. Superimposed onto the flattened cortex is a tonotopic map (grayscale corresponding to the color bar on the bottom right). Each point on the surface of the tonotopic map has a preferred stimulus frequency, in hertz, and along the dotted arrow across HG there is a gradient pattern of responses corresponding to low frequencies, high frequencies, and then low frequencies again. Given this tonotopic organization of the primary auditory cortex, which is in some respects not that different from the tonotopy seen in lower parts of the auditory system, we may expect the nature of the representation of sounds (including speech sounds) in this structure to be to a large extent spectrogram‐like. That is, if we were to read out the firing‐rate distributions along the frequency axes of these areas while speech sounds are presented, the resulting neurogram of activity would exhibit dynamically shifting peaks and troughs that reflect the changing formant structure of the presented speech. That this is indeed the case has been shown in animal experiments by Engineer et al. (2008), who, in one set of experiments, trained rats to discriminate a large set of consonant–vowel syllables and, in another, recorded neurograms for the same set of syllables from the primary auditory cortices of anesthetized rats using microelectrodes. They found, first, that rats can learn to discriminate most American English syllables easily, but are more likely to confuse syllables that humans too find more similar and easier to confuse (e.g. ‘sha’ vs. ‘da’ is easy, but ‘sha’ vs. ‘cha’ is harder). Second, Engineer et al. found that the ease with which rats can discriminate between two speech syllables can be predicted by how different the primary auditory cortex neurograms for these syllables are.
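The second finding rests on a simple computation: treat each neurogram as a matrix of firing rates (neurons × time bins) and measure how far apart two such matrices are. A schematic version follows; the tiny neurograms and their numbers are invented for illustration, and Euclidean distance stands in for whatever neural dissimilarity measure a given study actually uses.

```python
def neurogram_distance(a, b):
    """Euclidean distance between two neurograms, each a list of
    per-neuron firing-rate time series of equal shape."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ) ** 0.5

# Toy neurograms (3 "neurons" x 4 time bins) for three syllables.
ng_sha = [[5, 20, 8, 2], [1, 15, 22, 4], [0, 3, 10, 6]]
ng_da  = [[2, 4, 25, 9], [8, 30, 5, 1], [6, 12, 2, 0]]   # very different pattern
ng_cha = [[5, 18, 10, 3], [2, 14, 20, 5], [1, 4, 9, 5]]  # similar to 'sha'

# On this account, the larger neurogram distance should correspond to
# the easier behavioral discrimination ('sha' vs. 'da' rather than 'cha').
print(neurogram_distance(ng_sha, ng_da) > neurogram_distance(ng_sha, ng_cha))
```

Correlating such pairwise neural distances with behavioral discrimination performance across many syllable pairs is, in essence, how a neurogram-based account of discriminability is tested.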
These data would suggest that the representation of speech in primary auditory cortex is still a relatively unsophisticated time–frequency representation of sound features, with very little in the way of recognition, categorization, or interpretation. Calling primary auditory cortex unsophisticated is, however, probably doing it an injustice. Other animal experiments indicate that neurons in the primary auditory cortex can, for example, change their frequency tuning quickly and substantially if a particular task requires attention to be directed to a particular frequency band (Edeline, Pham, & Weinberger, 1993; Fritz et al., 2003). Primary auditory cortex neurons can even become responsive to stimuli or events that aren’t auditory at all if these events are firmly associated with sound‐related