
notions of embodied meaning, as well as of statistical learning, are shaping our thinking about how the brain represents the meaning of speech.

      By the time they reach these meaning‐representing levels of the brain, the waves of neural activity racing up the auditory pathway will have passed through at least a dozen anatomical processing stations, each composed of anywhere from a few hundred thousand to hundreds of millions of neurons, and each richly and reciprocally interconnected, both internally and with the levels immediately below and above it in the processing hierarchy. We hope readers will share our sense of awe when we consider that it takes a spoken word only a modest fraction of a second to travel through this entire stunningly intricate network and be transformed from sound wave to meaning.
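      As a purely illustrative back‐of‐envelope calculation (not taken from the chapter), the sketch below shows why a single feed‐forward sweep through roughly a dozen stations fits comfortably within a fraction of a second. The number of stations echoes the text; the per‐station delay is an assumed round figure standing in for synaptic, conduction, and integration time, not a measured value.

```python
# Hedged back-of-envelope: how long might one feed-forward sweep take?
# n_stations comes from the text ("at least a dozen"); delay_per_station_ms
# is an assumption chosen only for illustration.
n_stations = 12              # anatomical processing stations (from the text)
delay_per_station_ms = 3.0   # assumed milliseconds per station

sweep_ms = n_stations * delay_per_station_ms
print(f"Illustrative feed-forward sweep: ~{sweep_ms:.0f} ms "
      f"({sweep_ms / 1000:.3f} s) - a modest fraction of a second.")
```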

      Remember that the picture painted here, of a feed‐forward hierarchical network that transforms acoustics to phonetics to semantics, is a highly simplified one. It is well grounded in scientific evidence, but it is necessarily a selective telling of the story as we understand it to date. Recent years have been a particularly productive time in auditory neuroscience, as insights from animal research, human brain imaging, patient data, ECoG studies, and artificial intelligence have begun to come together to provide the framework of understanding we have attempted to outline here. But many important details remain unknown, and, while we feel fairly confident that the insights presented here will stand the test of time, we must accept that future work may not merely complement and refine, but even overturn, some of the ideas we currently put forward as our best approximations to the truth. One thing we are absolutely certain of, though, is that studying how human brains speak to each other will remain a profoundly rewarding intellectual pursuit for many years to come.
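      To make the simplified feed‐forward picture described above concrete, the following minimal Python sketch stacks two toy stages that map an acoustic representation to phonetic features and then to a word‐level "meaning" code. This is not a model from the chapter or the literature: the layer sizes, the random untrained weights, and the function names are invented for illustration, and real auditory processing involves massive recurrence and feedback that this toy omits.

```python
import numpy as np

# Toy, hypothetical sketch of a feed-forward hierarchy:
# acoustics -> phonetics -> semantics. All dimensions and weights are
# arbitrary placeholders; nothing here is trained or biologically calibrated.
rng = np.random.default_rng(0)

N_FREQ, N_PHONETIC, N_SEMANTIC = 64, 40, 300   # assumed dimensionalities

# Random, untrained "connection weights" standing in for each processing stage.
W_acoustic_to_phonetic = rng.standard_normal((N_PHONETIC, N_FREQ)) * 0.1
W_phonetic_to_semantic = rng.standard_normal((N_SEMANTIC, N_PHONETIC)) * 0.1

def relu(x):
    return np.maximum(0.0, x)

def feedforward_sweep(spectrogram_frames):
    """Map a (time x frequency) spectrogram through the two toy stages."""
    phonetic = relu(spectrogram_frames @ W_acoustic_to_phonetic.T)  # acoustics -> phonetics
    semantic = relu(phonetic @ W_phonetic_to_semantic.T)            # phonetics -> semantics
    return phonetic, semantic

# One second of pretend spectrogram at 100 frames per second.
frames = rng.random((100, N_FREQ))
phonetic, semantic = feedforward_sweep(frames)
print(phonetic.shape, semantic.shape)   # (100, 40) (100, 300)
```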

      1 Aloni, M., & Dekker, P. (2016). The Cambridge handbook of formal semantics. Cambridge: Cambridge University Press.

      2 Ballard, D. H., Hinton, G. E., & Sejnowski, T. J. (1983). Parallel visual computation. Nature, 306, 21–26.

      3  Baumann, S., Griffiths, T. D., Sun, L., et al. (2011). Orthogonal representation of sound dimensions in the primate midbrain. Nature Neuroscience, 14, 423–425.

      4 Belin, P., Zatorre, R. J., Lafaille, P., et al. (2000). Voice‐selective areas in human auditory cortex. Nature, 403, 309–312.

      5 Bizley, J. K., Walker, K. M., Silverman, B. W., et al. (2009). Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. Journal of Neuroscience, 29, 2064–2075.

      6 Blakemore, S.‐J., Wolpert, D., & Frith, C. (2000). Why can’t you tickle yourself? NeuroReport, 11, R11–R16.

      7 Bogen, J. E., & Bogen, G. (1976). Wernicke’s region – where is it? Annals of the New York Academy of Sciences, 280, 834–843.

      8 Bouchard, K. E., Mesgarani, N., Johnson, K., & Chang, E. F. (2013). Functional organization of human sensorimotor cortex for speech articulation. Nature, 495, 327–332.

      9 Brosch, M., Selezneva, E., & Scheich, H. (2005). Nonauditory events of a behavioral procedure activate auditory cortex of highly trained monkeys. Journal of Neuroscience, 25, 6797–6806.

      10 Cheung, C., Hamilton, L. S., Johnson, K., & Chang, E. F. (2016). The auditory representation of speech sounds in human motor cortex. Elife, 5, e12577.

      11 Clements, G. N. (1985). The geometry of phonological features. Phonology, 2, 225–252.

      12 Clements, G. N. (1990). The role of the sonority cycle in core syllabification. Papers in Laboratory Phonology, 1, 283–333.

      13 Da Costa, S., van der Zwaag, W., Marques, J. P., et al. (2011). Human primary auditory cortex follows the shape of Heschl’s gyrus. Journal of Neuroscience, 31, 14067–14075.

      14 Daunizeau, J., David, O., & Stephan, K. E. (2011). Dynamic causal modelling: A critical review of the biophysical and statistical foundations. NeuroImage, 58, 312–322.

      15 Davis, M. H., & Johnsrude, I. S. (2003). Hierarchical processing in spoken language comprehension. Journal of Neuroscience, 23, 3423–3431.

      16 Dayan, P., Hinton, G. E., Neal, R. M., & Zemel, R. S. (1995). The Helmholtz machine. Neural Computation, 7, 889–904.

      17 Dean, I., Harper, N., & McAlpine, D. (2005). Neural population coding of sound level adapts to stimulus statistics. Nature Neuroscience, 8, 1684–1689.

      18 Delgutte, B. (1997). Auditory neural processing of speech. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 507–538). Oxford: Blackwell.

      19 Edeline, J. M., Pham, P., & Weinberger, N. M. (1993). Rapid development of learning‐induced receptive field plasticity in the auditory cortex. Behavioral Neuroscience, 107, 539–551.

      20 Eliades, S. J., & Wang, X. (2008). Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature, 453, 1102–1106.

      21 Engineer, C. T., Perez, C. A., Chen, Y. H., et al. (2008). Cortical activity patterns predict speech discrimination ability. Nature Neuroscience, 11, 603–608.

      22 Ferry, R. T., & Meddis, R. (2007). A computer model of medial efferent suppression in the mammalian auditory system. Journal of the Acoustical Society of America, 122, 3519–3526.

      23 Firth, J. (1957). Papers in linguistics, 1934–1951. Oxford: Oxford University Press.

      24 Flinker, A., Chang, E. F., Kirsch, H. E., et al. (2010). Single‐trial speech suppression of auditory cortex activity in humans. Journal of Neuroscience, 30, 16643–16650.

      25 Fowler, C. A. (1986). An event approach to the study of speech perception from a direct‐realist perspective. Journal of Phonetics, 14, 3–28.

      26 Frisina, R. D. (2001). Subcortical neural coding mechanisms for auditory temporal processing. Hearing Research, 158(1–2), 1–27.

      27  Friston, K. J., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. NeuroImage, 19(4), 1273–1302.

      28 Friston, K., & Kiebel, S. (2009). Predictive coding under the free‐energy principle. Philosophical Transactions of the Royal Society B: Biological Sciences, 364, 1211–1221.

      29 Fritz, J., Shamma, S., Elhilali, M., & Klein, D. (2003). Rapid task‐related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neuroscience, 6, 1216–1223.

      30 Garofolo, J. S., Lamel, L. F., Fisher, W. M., et al. (1993). TIMIT Acoustic‐Phonetic Continuous Speech Corpus. Linguistic Data Consortium, from https://catalog.ldc.upenn.edu/LDC93S1.

      31 Golding, N. L., & Oertel, D. (2012). Synaptic integration in dendrites: Exceptional need for speed. Journal of Physiology, 590, 5563–5569.

      32 Graves, A., & Jaitly, N. (2014). Towards end‐to‐end speech recognition with recurrent neural networks. In ICML’14: Proceedings of the 31st International Conference on Machine Learning, 32(2), 1764–1772.

      33 Greenberg, S. (2006). A multi‐tier framework for understanding spoken language. In S. Greenberg & W. A. Ainsworth (Ed.), Listening to speech: An auditory perspective (pp. 411–430). Mahwah, NJ: Lawrence Erlbaum.

      34 Grinn, S. K., Wiseman, K. B., Baker, J. A., & Le Prell, C. G. (2017). Hidden hearing loss? No effect of common recreational noise exposure on cochlear nerve response amplitude in humans. Frontiers in Neuroscience, 11, 465.

      35 Grose, J. H., Buss, E., & Hall, J. W. (2017). Loud music exposure and cochlear synaptopathy in young adults: Isolated auditory brainstem response effects but no perceptual consequences. Trends in Hearing, 21, 1–18.

      36 Heinz, M. G., Colburn, H. S., & Carney, L. H. (2002). Quantifying the implications of nonlinear cochlear tuning for auditory‐filter estimates. Journal of the Acoustical Society of America, 111, 996–1011.

      37 Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language.