With a reasonable vector representation for words like these, one can begin to see how it may be possible to predict the brain activation for word meanings (Mitchell et al., 2008). Start with a fairly large set of words and their vector representations, and record the brain activity they evoke. Put aside some of the words (including perhaps the word ‘strawberry’) and use the remainder as a training set in order to find the best linear equation that maps from word vectors to patterns of brain activation. Finally, use that equation to predict what the brain activation should have been for the words you held back, test how similar that predicted brain activation is to the one that is actually observed, and check whether the activation pattern for ‘strawberry’ is indeed more similar to that of ‘celery’ than to that of ‘boat.’ One similarity measure commonly used for this sort of problem is the cosine similarity, which can be defined for two vectors p⃗ and q⃗ according to the following formula:
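$$\cos(\vec{p},\vec{q}) \;=\; \frac{\vec{p}\cdot\vec{q}}{\|\vec{p}\|\,\|\vec{q}\|} \;=\; \frac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^2}\,\sqrt{\sum_{i} q_i^2}}$$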
Now if we plug the context‐word embeddings for each pair of words from our four‐word set into this equation, we end up with the similarity scores shown in Table 3.3. Note that numbers closer to 1 mean more similar and numbers closer to 0 mean more dissimilar; a perfect score of 1 means the vectors are identical, which is what we see when we compare any word embedding with itself. Note also that we have only populated the diagonal and upper triangle of this table, because the lower triangle mirrors the upper triangle and is therefore redundant.
As expected, the words ‘airplane’ and ‘boat’ received a very high similarity score (0.94), whereas ‘airplane’ and ‘celery,’ for example, received a much lower score (0.44). The score for ‘celery’ and ‘strawberry’ (0.67), however, was also relatively high. Summary statistics like these are quick and easy to compute, even for very long lists of numbers. Exploring them also helps to build an intuition about how encoding models, such as those of Mitchell et al. (2008), represent the meanings of words, and thus what the brain maps they discover represent. Specifically, Firth’s (1957) idea that the company a word keeps can be used to build up a semantic representation of that word has had a profound impact on the study of semantics in recent years, especially in the computational fields of natural language processing and machine learning (including deep learning). Mitchell et al.’s (2008) landmark study bridged natural language processing and neuroscience in a way that, at the time of writing, still provides common ground for both fields. Not only do we expect words that belong to similar semantic domains to co‐occur with similar context words, but if the brain is capable of statistical learning, as many believe, then this is exactly the kind of pattern we should expect to find encoded in neural representations.
Table 3.3 Cosine similarities between four words.
|            | airplane | boat | celery | strawberry |
|------------|----------|------|--------|------------|
| airplane   | 1        | 0.94 | 0.44   | 0.44       |
| boat       | –        | 1    | 0.41   | 0.41       |
| celery     | –        | –    | 1      | 0.67       |
| strawberry | –        | –    | –      | 1          |
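To make these ideas concrete, the following is a minimal sketch in Python (using only NumPy). The toy embedding vectors, their dimensionality, the simulated activation patterns, and the ridge penalty are all invented stand‐ins rather than the actual co‐occurrence counts, fMRI data, or exact method of Mitchell et al. (2008); the sketch simply shows how pairwise cosine similarities like those in Table 3.3 can be computed, and how a regularised linear map from word vectors to activation patterns can be fitted on most words and evaluated on the held‐out ones.

```python
import numpy as np

def cosine_similarity(p, q):
    """cos(p, q) = p . q / (||p|| ||q||): 1 for identical directions, near 0 for unrelated."""
    return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

# --- Pairwise similarities, in the spirit of Table 3.3 (vectors are invented toy counts) ---
embeddings = {
    "airplane":   np.array([12.0, 30.0,  1.0,  2.0]),
    "boat":       np.array([10.0, 25.0,  2.0,  3.0]),
    "celery":     np.array([ 1.0,  3.0, 20.0, 14.0]),
    "strawberry": np.array([ 2.0,  2.0, 15.0, 22.0]),
}
words = list(embeddings)
for i, w1 in enumerate(words):
    for w2 in words[i:]:                       # diagonal and upper triangle only
        sim = cosine_similarity(embeddings[w1], embeddings[w2])
        print(f"{w1:10s} {w2:10s} {sim:.2f}")

# --- Held-out prediction with a linear encoding model (simulated data) ---
rng = np.random.default_rng(0)
n_words, n_dims, n_voxels = 60, 25, 500
X = rng.normal(size=(n_words, n_dims))                          # word vectors
true_map = rng.normal(size=(n_dims, n_voxels))
Y = X @ true_map + 0.1 * rng.normal(size=(n_words, n_voxels))   # "recorded" activations

train, held_out = slice(0, 58), range(58, 60)                   # hold out two words
alpha = 1.0                                                     # ridge penalty
Xtr, Ytr = X[train], Y[train]
# Closed-form ridge solution: W = (X'X + alpha*I)^(-1) X'Y
W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(n_dims), Xtr.T @ Ytr)

for i in held_out:
    predicted = X[i] @ W
    print(f"held-out word {i}: cosine(predicted, observed) = "
          f"{cosine_similarity(predicted, Y[i]):.2f}")
```

The ridge penalty simply keeps the fitted map stable when the embedding dimensions are correlated or outnumber the training words; any other regularised regression would serve the same illustrative purpose here.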
To summarize, we have only begun to scratch the surface of how linguistic meaning is represented in the brain. But figuring out what the brain is doing when it is interpreting speech is so important, and mysterious, that we have tried to illustrate a few recent innovations in enough detail that the reader may begin to imagine how to go further. Embodied meaning, vector representations, and encoding models are not the only ways to study semantics in the brain. They do, however, benefit from engaging with other areas of neuroscience, touching for example on the homunculus map in the somatosensory cortex (Penfield & Boldrey, 1937). It is less clear, at the moment, how to extend these results from lexical to compositional semantics. A more complete neural understanding of pragmatics will also be needed. Much work remains to be done. Because spoken language combines both sound and meaning, a full account of speech comprehension should explain how meaning is coded by the brain. We hope that readers will feel inspired to contribute the next exciting chapters in this endeavor.
Conclusion
Our journey through the auditory pathway has finally reached the end. It was a substantial trip, through the ear and auditory nerve, brainstem and midbrain, and many layers of cortical processing. We have seen how, along that path, speech information is initially encoded by some 30,000 auditory nerve fibers firing hundreds of thousands of impulses a second, and how their activity patterns across the tonotopic array encode formants, while their temporal firing patterns encode temporal fine structure cues to pitch and voicing. We have learned how, as these activity patterns then propagate and fan out over the millions of neurons of the auditory brainstem and midbrain, information from both ears is combined to add cues to sound‐source direction. Furthermore, temporal fine structure information gets recoded, so that temporal firing patterns at higher levels of the auditory brain no longer need to be read out with sub‐millisecond precision, and information about the pitch and timbre of speech sounds is instead encoded by a distributed and multiplexed firing‐rate code. We have seen that the neural activity patterns at levels up to and including the primary auditory cortex are generally thought to represent predominantly physical acoustic or relatively low‐level psychoacoustic features of speech sounds, and that this is then transformed into increasingly phonetic representations at the level