
The Handbook of Speech Perception



There have been proposals, like the motor theory of speech perception (Liberman et al., 1967; Liberman & Mattingly, 1985) or the analysis-by-synthesis theory (Stevens, 1960), that view speech perception as an active rather than a passive process. Analysis by synthesis holds that perceiving speech involves matching what you hear to what your own mouth, and other articulators, would have needed to do to produce what you heard. Speech comprehension would therefore involve an active process of covert speech production. Following this line of thought, we might suppose that what the vSMC does when it is engaged in deciphering what your friend is asking you at a noisy cocktail party is, in some sense, the same as what it does when it is used to articulate your reply. Because we know that place-of-articulation features take priority over manner-of-articulation features in the vSMC during a speech-production task (i.e., reading consonant–vowel syllables aloud), we might hypothesize that place-of-articulation features will likewise take primacy during passive listening. Interestingly, although the theory predicts exactly this, the prediction turns out to be wrong.
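To see the computational shape of this idea, consider a minimal Python sketch of analysis by synthesis. Everything in it, including the linear `synthesize` stand-in for the vocal tract, is a hypothetical illustration rather than a model anyone has proposed: perception becomes a search over candidate articulations, each synthesized and compared to the heard signal, with the closest match taken as the percept.

```python
import numpy as np

# Toy "synthesizer": maps an articulatory setting to a predicted acoustic
# feature vector. In the brain this would be a forward model of the vocal
# tract; the fixed linear map here is a purely illustrative stand-in.
def synthesize(articulation):
    mixing = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.5, 0.5]])
    return mixing @ articulation

def analyze_by_synthesis(heard, candidates):
    """Pick the candidate articulation whose synthesized output
    lies closest to the heard acoustic signal."""
    errors = [np.linalg.norm(heard - synthesize(c)) for c in candidates]
    return int(np.argmin(errors))

# Hypothetical candidate articulatory settings (e.g., competing gestures).
candidates = [np.array([1.0, 0.0]),
              np.array([0.0, 1.0]),
              np.array([0.5, 0.5])]

# A noisy rendition of the second gesture; the listener recovers it by
# finding the articulation they themselves would have used to produce it.
heard = synthesize(candidates[1]) + 0.05 * np.random.default_rng(0).normal(size=3)
print(analyze_by_synthesis(heard, candidates))  # prints 1
```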

When Cheung et al. (2016) examined neural response patterns in the vSMC while subjects listened to recordings of speech, they found that, as in the STG, it was the manner-of-articulation features that took precedence. In other words, representations in the vSMC were conditioned by task: during speech production the vSMC favored place-of-articulation features (Bouchard et al., 2013; Cheung et al., 2016), but during speech comprehension it favored manner-of-articulation features (Cheung et al., 2016). As we discussed earlier, the STG is also organized according to manner-of-articulation features when subjects listen to speech (Mesgarani et al., 2014). The representations in these two areas, the STG and the vSMC, therefore appear to use a similar type of code when they represent heard speech.

Figure: Schematic illustration of feature-based representations in the human sensorimotor cortex.

Source: Cheung et al., 2016. Licensed under CC BY 4.0.
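To make the logic of this comparison concrete, here is a toy Python sketch. It is not the analysis pipeline Cheung et al. (2016) actually ran; it is a hypothetical illustration of the underlying question: given one response pattern per phoneme, does grouping phonemes by place or by manner of articulation better capture the similarity structure of those patterns?

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# One hypothetical response vector per phoneme (one entry per electrode).
# Real data would be cortical recordings; the random values here are
# stand-ins, so both scores below will hover near zero. With real
# recordings, the grouping that organizes the cortex would score higher.
phonemes = ["p", "b", "t", "d", "m", "n", "f", "v", "s", "z"]
responses = {ph: rng.normal(size=16) for ph in phonemes}

# Deliberately simplified feature assignments, for illustration only.
place = {"p": "labial", "b": "labial", "m": "labial", "f": "labial", "v": "labial",
         "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar", "z": "alveolar"}
manner = {"p": "stop", "b": "stop", "t": "stop", "d": "stop",
          "m": "nasal", "n": "nasal",
          "f": "fricative", "v": "fricative", "s": "fricative", "z": "fricative"}

def grouping_score(grouping):
    """Mean correlation of response patterns within feature groups minus
    the mean correlation between groups; a higher score means this feature
    grouping better describes the similarity structure of the responses."""
    within, between = [], []
    for a, b in itertools.combinations(phonemes, 2):
        r = np.corrcoef(responses[a], responses[b])[0, 1]
        (within if grouping[a] == grouping[b] else between).append(r)
    return float(np.mean(within) - np.mean(between))

print("place score: ", grouping_score(place))
print("manner score:", grouping_score(manner))
```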

       Auditory feedback networks

One way to appreciate the dynamic interconnectedness of the auditory brain is to consider the phenomenon of auditory suppression. Auditory suppression manifests, for example, in the comparison of STG responses when we listen to another person speak and when we speak ourselves, and thus hear the sounds we produce. Electrophysiological studies in monkeys have shown that auditory neurons are suppressed during self-vocalization (Müller-Preuss & Ploog, 1981; Eliades & Wang, 2008). This finding is consistent with fMRI and ECoG results in humans, showing that activity in the STG is suppressed during speech production compared to speech comprehension (Flinker et al., 2010). The reason for this auditory suppression is thought to be an internal signal (an efference copy) received from another part of the brain, such as the motor or premotor cortex, which has inside information about external stimuli when those stimuli are self-produced (von Holst & Mittelstaedt, 1950). The brain's use of this kind of inside information is not, incidentally, limited to the auditory system. Anyone who has failed to tickle themselves has experienced another kind of sensory suppression, again thought to be based on internally generated expectations (Blakemore, Wolpert, & Frith, 2000).
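The efference-copy account has a simple computational reading: the motor system forwards a prediction of the sensory consequences of its own commands, and the auditory system cancels the predicted component out of its input. The Python sketch below is a deliberately minimal caricature of that idea; `forward_model` and the outright subtraction are illustrative assumptions (real suppression is partial attenuation rather than perfect cancellation).

```python
import numpy as np

def forward_model(motor_command):
    """Hypothetical forward model: predicts the auditory consequence of a
    motor command (the content of the efference copy). In the brain this
    mapping is presumably learned; the identity map here is a stand-in."""
    return motor_command

def auditory_response(heard, motor_command=None):
    """Response after efference-copy cancellation: the predicted
    self-generated component is subtracted, so self-produced sounds are
    suppressed while externally produced sounds pass through unchanged."""
    if motor_command is None:
        return heard                      # no prediction: full response
    return heard - forward_model(motor_command)

sound = np.array([1.0, 0.8, 0.3])

# Speaking: the motor system announces the sound it is about to produce.
print(auditory_response(sound, motor_command=sound))  # ~[0. 0. 0.]: suppressed

# Listening to someone else: no efference copy, undiminished response.
print(auditory_response(sound))                       # [1.  0.8 0.3]
```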