(a) and (b) show spectrograms for two words, faster /fæstr/ and factor /fæktr/. The segments of the spectrograms corresponding to /s/ and /k/ are indicated by dashed lines. The arrow in (a) points to the aperiodic energy in higher‐frequency bands associated with fricative sounds like [s], which is absent in (b). (c) and (d) show neural reconstructions obtained when subjects heard (a) and (b). (e) and (f) show neural reconstructions obtained when subjects heard the masked stimulus /fæ#tr/. In (e), subjects heard On the highway he drives his car much /fæ#tr/, which caused them to interpret the masked segment as /s/. In (f), the context suggested that the masked segment should be /k/.

      Source: Leonard et al., 2016. Licensed under CC BY 4.0.

      In the next and final section, we turn from sounds to semantics and to the representation of meaning in the brain.

       Embodied meaning

      Despite the difficulty of comprehending the totality of what an utterance might mean to your brain, there are some relatively easy places to begin. One kind of meaning a word might have, for instance, relates to the ways in which you experience what it refers to. Take the word ‘strawberry.’ Part of the meaning of this word is the shape and vibrant color of strawberries that you have seen. Another is how a strawberry smells and how it feels in your mouth when you eat one. To a first approximation, we can think of the meaning of the word ‘strawberry’ as the set of associated images, colors, smells, tastes, and other sensations that it can evoke. This is a very useful operational definition of “meaning” because it is possible, to an extent, to decode brain responses in sensory and motor areas and test whether these areas are indeed activated by words in the ways that we might expect, given the word’s meanings. To take a concrete example of how this approach can be used to distinguish the meanings of two words, consider ‘kick’ and ‘lick’: they differ by only one phoneme, /k/ versus /l/. Semantically, however, the words differ substantially, including, for example, in the part of the body that they are associated with: the foot for ‘kick’ and the tongue for ‘lick.’ Since, as we know, the sensorimotor cortex contains a map of the body, the so‐called homunculus (Penfield & Boldrey, 1937), with the foot and tongue areas at opposite ends, the embodied view of meaning would predict that hearing the word ‘kick’ should activate the foot area, which is located near the very top of the head, along the central sulcus on the medial surface of the brain, whereas the word ‘lick’ should activate the tongue area, on the lateral surface almost all the way down the central sulcus to the Sylvian fissure. And indeed, these predictions have been verified over a series of experiments (Pulvermüller, 2005). When you hear a word like ‘kick’ or ‘lick,’ your brain not only represents the sounds of the word through the progression of acoustic, phonetic, and phonological representations in the hierarchy of auditory‐processing centers discussed in this chapter; it also represents the word’s meaning across a network of associations that certainly engages your sensory and motor cortices and, as we shall see, many other cortical regions too.

       Vector representations and encoding models

      One difficulty in studying meaning is that “meaning” itself can be hard to define. If asked what the word ‘strawberry’ means, we might simply point at a strawberry. If we know the activity that looking at a strawberry triggers in your visual system, we can point to similar activity patterns that arise when you merely think of the word ‘strawberry’ and treat them as another kind of meaning. It may seem much harder to point to an arbitrary part of the brain and ask of its current state, “Is this a representation of ‘strawberry’?” But it is not impossible. In this subsection we introduce, as informally as possible, the ideas of vector representations of words and of encoding models for identifying the neural representations of those vectors.
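      Before unpacking these ideas, a minimal, purely illustrative sketch may help make them concrete. In the toy Python code below (which does not reproduce the method or feature set of any study discussed here), each word is assigned a small hand‐made vector of hypothetical sensorimotor features, and a ridge‐regression encoding model is fit to map those vectors onto simulated voxel responses, so that the response to a new word can be predicted from its vector alone. Every word, feature, and number in it is invented for illustration.

```python
import numpy as np

# Toy word vectors: each word is described by a few hand-made,
# hypothetical "sensorimotor" features (for illustration only).
# Feature order: [edible, involves_foot, involves_mouth, man_made]
word_vectors = {
    "strawberry": np.array([1.0, 0.0, 1.0, 0.0]),
    "kick":       np.array([0.0, 1.0, 0.0, 0.0]),
    "lick":       np.array([0.0, 0.0, 1.0, 0.0]),
    "airplane":   np.array([0.0, 0.0, 0.0, 1.0]),
}

# Simulated brain responses to each word (a handful of "voxels").
# In a real experiment these would be measured with fMRI or ECoG.
rng = np.random.default_rng(0)
n_features, n_voxels = 4, 6
true_weights = rng.normal(size=(n_features, n_voxels))      # unknown in practice
X = np.stack(list(word_vectors.values()))                   # words x features
Y = X @ true_weights + 0.1 * rng.normal(size=(len(word_vectors), n_voxels))

# Encoding model: ridge regression from word features to voxel responses,
# W = (X'X + lambda*I)^(-1) X'Y.
lam = 0.1
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

# The fitted model predicts the response to a word it has never seen,
# given only that word's feature vector.
novel_word = np.array([1.0, 0.0, 1.0, 1.0])   # a hypothetical new concept
predicted_response = novel_word @ W
print(predicted_response.round(2))
```

      In real studies the word vectors come from sources such as corpus co‐occurrence statistics rather than hand‐made features, the responses are measured brain signals, and the model is judged by how well it predicts responses to held‐out words; the sketch only shows the general shape of the computation shared by such encoding models.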

      Generally speaking, an encoding model aims to predict how the brain will respond to a stimulus. Encoding models contrast with decoding models, which aim to do the opposite: to infer which stimulus caused a given brain response. The spectrogram reconstruction method mentioned in a previous section is an example of a decoding model (Mesgarani et al., 2008). An encoding model of sound, by contrast, would try to predict the neural response to an audio recording. In a landmark study of semantic encoding, Mitchell et al. (2008) were able to predict fMRI responses to the meanings of concrete nouns, like ‘celery’ and ‘airplane.’ Unlike studies of embodied meaning, Mitchell et al. (2008) predicted neural responses that were not limited to the sensorimotor systems. For instance, they accurately predicted word‐specific neural responses across bilateral occipital and parietal lobes, the fusiform and middle frontal gyri, and sensory cortex; the left inferior frontal gyrus; the medial frontal gyrus and the anterior cingulate (see Figure