
in the vocal tract. At the same time, /d/ and /s/ also belong to the alveolar place‐of‐articulation class because, for both phonemes, the tip of the tongue is brought up toward the alveolar ridge just behind the top row of the teeth. In contrast, /b/ has a labial place of articulation because, to articulate /b/, the airflow is constricted at the lips. Manner features are often associated with particular acoustic characteristics: plosives involve characteristically brief intervals of silence followed by a short noise burst, while fricatives exhibit sustained aperiodic noise spread over a wide part of the spectrum. Classifying speech sounds by place and manner of articulation is certainly popular among speech scientists, and is also implied in the structure of the International Phonetic Alphabet (IPA), but it is by no means the only possible scheme. Speech sounds can also be described and classified according to alternative acoustic properties or perceptual features, such as loudness and pitch. One feature that is harder to characterize in articulatory or acoustic terms is sonority. Sonority defines a scale of perceived loudness (Clements, 1990) on which vowels are the most sonorous, glides are the next most sonorous, followed by liquids, nasals, and finally obstruents (i.e. fricatives and plosives). Despite the idea of sonority as a multitiered scale, phonemes are sometimes lumped into just two groups, sonorant and nonsonorant, with everything but the obstruents counting as sonorants.
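To make this feature‐based classification concrete, here is a minimal Python sketch (not from the handbook; the phoneme inventory, feature labels, and sonority ranks are illustrative assumptions) that encodes a few phonemes as manner, place, and sonority bundles and groups them along either dimension, including the binary sonorant/obstruent split mentioned above.

```python
# Illustrative feature bundles; the inventory and sonority ranks are
# assumptions for this sketch, not values taken from the chapter.
PHONEME_FEATURES = {
    "b": {"manner": "plosive",   "place": "labial",   "sonority": 1},
    "p": {"manner": "plosive",   "place": "labial",   "sonority": 1},
    "d": {"manner": "plosive",   "place": "alveolar", "sonority": 1},
    "t": {"manner": "plosive",   "place": "alveolar", "sonority": 1},
    "s": {"manner": "fricative", "place": "alveolar", "sonority": 1},
    "z": {"manner": "fricative", "place": "alveolar", "sonority": 1},
    "m": {"manner": "nasal",     "place": "labial",   "sonority": 3},
    "n": {"manner": "nasal",     "place": "alveolar", "sonority": 3},
    "l": {"manner": "liquid",    "place": "alveolar", "sonority": 4},
    "w": {"manner": "glide",     "place": "labial",   "sonority": 5},
    "a": {"manner": "vowel",     "place": "open",     "sonority": 6},
}

def group_by(feature):
    """Group the phoneme inventory by one feature dimension (manner or place)."""
    groups = {}
    for phoneme, feats in PHONEME_FEATURES.items():
        groups.setdefault(feats[feature], []).append(phoneme)
    return groups

def is_sonorant(phoneme):
    """Binary split: everything above the obstruents (sonority > 1) is sonorant."""
    return PHONEME_FEATURES[phoneme]["sonority"] > 1

print(group_by("manner"))  # plosive, fricative, nasal, liquid, glide, vowel classes
print(group_by("place"))   # labial vs. alveolar (vs. open) classes
print([p for p in PHONEME_FEATURES if is_sonorant(p)])  # nasals, liquids, glides, vowels
```

The same inventory falls into different groups depending on which feature dimension is used, which is exactly why one can ask which grouping the STG itself favors.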

Figure 3.7 Schematic illustration of feature-based representations in the human STG. Source: Mesgarani et al., © 2014, The American Association for the Advancement of Science.

As these examples illustrate, there could in principle be many different ways in which speech sounds are grouped. To ask which grouping is “natural” or “native” for the STG, Mesgarani et al. (2014) applied hierarchical clustering to neural responses to speech, examples of which can be seen in the ECoG recordings depicted in Figure 3.7, panel D. The results of the clustering analysis are shown in Figure 3.7, panels E–G. Perhaps surprisingly, Mesgarani et al. (2014) discovered that the STG is organized primarily by manner‐of‐articulation features and only secondarily by place‐of‐articulation features. The prominence of manner‐of‐articulation features can be seen by clustering the phonemes directly (Figure 3.7, panel F): on the right‐side dendrogram we find neat clusters of plosives /d b g p k t/, fricatives /ʃ z s f θ/, and nasals /m n ŋ/. Manner‐of‐articulation features also stand out when the electrodes are clustered (Figure 3.7, panel G). Going up a column from the bottom dendrogram, we can find the darkest cells (those with the greatest selectivity for phonemes), and then follow those rows to the left to identify the phonemes for which the electrode signal was strongest. The electrode indexed by the leftmost column, for example, recorded neural activity that appeared selective for the plosives /d b g p k t/. In this way, we may also find electrodes that respond to both manner‐ and place‐of‐articulation features; the electrode in the fifth column from the left, for example, responds to the bilabial plosives /b p/. Thus, the types of features that phoneticians have long employed to classify speech sounds turn out to be reflected in the neural response patterns across the STG. Mesgarani et al. (2014) argue that this pattern of organization, prioritizing manner over place of articulation, is most consistent with auditory‐perceptual theories of feature hierarchies (Stevens, 2002; Clements, 1985). Auditory‐perceptual theories contrast, for instance, with articulatory or gestural theories, which Mesgarani et al. (2014) assert would have prioritized place‐of‐articulation features (Fowler, 1986).
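The clustering step itself can be illustrated with a short sketch. The code below is not the authors' analysis pipeline: it assumes a placeholder phoneme‐by‐electrode response matrix filled with random numbers and uses SciPy's agglomerative (hierarchical) clustering to order phonemes and electrodes, in the spirit of Figure 3.7, panels F and G.

```python
# Minimal sketch of hierarchical clustering of phoneme-evoked responses.
# The response matrix here is random; with real ECoG data each cell would hold
# an electrode's average high-gamma response to a given phoneme.
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

phonemes = ["d", "b", "g", "p", "k", "t", "sh", "z", "s", "f", "m", "n"]
rng = np.random.default_rng(0)
responses = rng.normal(size=(len(phonemes), 40))  # (n_phonemes, n_electrodes)

# Cluster phonemes by the similarity of the responses they evoke across
# electrodes (analogous to the phoneme dendrogram, Figure 3.7, panel F).
phoneme_linkage = linkage(responses, method="ward")
phoneme_clusters = fcluster(phoneme_linkage, t=3, criterion="maxclust")

# Cluster electrodes by the similarity of their phoneme selectivity profiles
# (analogous to the electrode dendrogram, Figure 3.7, panel G).
electrode_linkage = linkage(responses.T, method="ward")

# Leaf order along the phoneme dendrogram; with real data, manner classes such
# as the plosives or nasals would tend to group together here.
print([phonemes[i] for i in leaves_list(phoneme_linkage)])
print(dict(zip(phonemes, phoneme_clusters)))
```

With real high‐gamma responses in place of the random matrix, the leaf ordering and cluster assignments would reveal groupings such as the plosive, fricative, and nasal clusters described above.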

       Auditory phonetic representations in the sensorimotor cortex

From the STG, we turn now to a second cortical area. The ventral sensorimotor cortex (vSMC) is better known for its role in speech production than in speech comprehension (Bouchard et al., 2013). This part of the cortex, near the ventral end of the SMC (see Figure 3.6), contains the primary motor and somatosensory areas that send motor commands to, and receive touch and proprioceptive information from, the face, lips, jaw, tongue, velum, and pharynx. The vSMC plays a key role in controlling the muscles associated with these articulators, and is further involved in monitoring feedback from the sensory nerves in these areas when we speak. Less widely known is that the vSMC also plays a role in speech perception. We know, for example, that a network including frontal areas becomes more active when the conditions for perceiving speech become more difficult (Davis & Johnsrude, 2003), such as when there is background noise or the sound of multiple speakers overlaps, in contrast to easy listening conditions in which such distractions are absent. This context‐specific recruitment of speech‐production areas may signal that they play an auxiliary role in speech perception, providing additional computational resources when the STG is overburdened. We might then ask how the vSMC, an auxiliary auditory system primarily dedicated to coordinating the articulation of speech, represents heard speech. Does the vSMC represent overt and heard speech similarly or differently? Is its representation of heard speech similar to or different from that of the STG?