have confirmed that specific neuron populations in the auditory cortex are sensitive to vocal feedback whereas others are not (Eliades & Wang, 2008). Neurons that are suppressed during speech production show increased firing in response to altered feedback, and thus appear to be particularly sensitive to errors during speech production. At the same time, a smaller proportion of neurons that are generally excited during production show reduced firing in response to altered feedback. Although these response changes could in principle reflect acoustic changes in the vocal signal caused by the feedback perturbations, passive playback of the same normal and altered vocalizations does not elicit the differential firing pattern, which ties the effect to active production (Eliades & Wang, 2008).

      Muller‐Preuss and Ploog (1981) found that most neurons in the primary auditory cortex of unanesthetized squirrel monkeys that were excited by playback of self‐vocalizations were either weakened or completely inhibited during phonation. However, approximately half of superior temporal gyrus (primary auditory cortex) neurons did not show this distinction (Muller‐Preuss & Ploog, 1981). This pattern points to phonation‐dependent suppression that is confined to specific populations of auditory cortical neurons. Electrocorticography data in humans have likewise supported the idea that specific portions of the auditory cortex support auditory feedback processing (Chang et al., 2013).

      In a magnetoencephalography (MEG) study, Houde and colleagues (2002) directly investigated whether vocalization‐induced suppression of the auditory cortex results from a neural comparison between an incoming signal (auditory feedback) and an internal “prediction” of that signal. They created a discrepancy, or “mismatch,” between signal and expectation by altering the auditory feedback: participants heard their speech summed with gated white noise that lasted the duration of the utterance. Suppression of the M100 response was observed during normal self‐produced speech, but when feedback was altered with the gated noise (speech plus white noise), self‐produced speech no longer suppressed M100 amplitude in the auditory cortex. These findings therefore support a forward model in which expected auditory feedback during talking produces cortical suppression of the auditory cortex.
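      The comparator logic behind this interpretation can be made concrete with a minimal sketch in Python of how a forward model might gate the cortical response. The spectral feature vectors, the linear error‐to‐response mapping, and all parameter values here are illustrative assumptions, not quantities from Houde et al. (2002):

import numpy as np

def cortical_response(predicted, actual, gain=1.0):
    # Toy forward-model comparator: the response scales with the
    # mismatch between predicted and actual auditory feedback, so a
    # perfect match yields maximal suppression (zero response).
    error = np.linalg.norm(actual - predicted)  # prediction error
    return gain * error

# Normal speech: feedback matches the internal prediction.
predicted = np.array([0.8, 0.4, 0.1])   # hypothetical spectral features
print(cortical_response(predicted, predicted))   # 0.0 -> suppressed

# Gated-noise condition: feedback no longer matches the prediction.
noisy = predicted + np.array([0.3, -0.2, 0.25])  # assumed perturbation
print(cortical_response(predicted, noisy))       # > 0 -> suppression released

      On this reading, the M100 reduction during unaltered speech is simply the zero‐error case of the same comparison.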

      In order to determine whether a forward model truly governs suppression of the human auditory cortex during speech production, Heinks‐Maldonado and colleagues (2005) examined event‐related potentials (the N100) during speech production. Like Houde et al. (2002), they found that N100 amplitude was reduced in response to unaltered vocalization relative to both pitch‐shifted feedback and feedback replaced with a different voice. Furthermore, during passive listening, neither perturbation produced any N100 amplitude differences. This suggests that suppression of the auditory cortex is greatest when afferent sensory feedback matches the expected outcome, and specifically so during speech production (Heinks‐Maldonado et al., 2005).

      These studies and many others support the existence of neural mechanisms that use the auditory character of the talker’s speech output to control articulation. However, the challenge of mapping high‐level computational models onto behavioral and neural data remains. It is difficult to determine which levels of description are necessary and what the units within those levels should be. In short, while there may be only a single neural architecture that manages fluent speech, many abstract cognitive models could be consistent with that architecture (see Griffiths, Lieder, & Goodman, 2015, for a discussion of constraints on cognitive modeling). An additional approach is to examine ontogeny for relationships between perception and production.

      Much is made of the uniqueness of human language, and at the speech level these uniqueness claims most often focus on the perceptual skills of the developing infant. However, the less emphasized side of communication, speaking, is clearly a specialized behavior. Humans belong to a small cohort of species classified as vocal learners, which acquire the sounds of their adult repertoire through social learning (Petkov & Jarvis, 2012). The trait appears to be an example of convergent evolution in a few mammalian (humans, dolphins, whales, seals, sea lions, elephants, and bats) and avian (songbirds, parrots, and hummingbirds) lineages. The behavioral similarities shown by these disparate species are mirrored in their neuroanatomy and gene expression. In a triumph of behavioral, genetic, and neuroanatomic research, a consortium of scientists has shown similarities in the brain pathways of vocal learners that are not observed in species that do not learn their vocal repertoires (Pfenning et al., 2014).

      The DIVA model proposes that early vocal development is a closed‐loop imitation process: two stages of babbling followed by a vocal learning stage that directly involves corrective feedback processing. In the first babbling phase, random articulatory motions generate somatosensory and acoustic consequences, and a mapping of the developing vocal tract as a speech device takes place. Separately, infants learn the perceptual categories of their native language; this is crucial to the model. As Guenther (1995, p. 599) states, “the model starts out with the ability to perceive all of the sounds that it will eventually learn to produce.” The second stage of babbling establishes the mapping between the phonetic categories acquired through perceptual learning and articulation, and the babbling during this period tunes the feedback system to permit corrective responses to detected errors. In the subsequent imitation phase, infants hear adult syllables that they try to reproduce. In a cyclical process involving sensory feedback from actual productions and progressively better feedforward commands, the system shapes early utterances toward the native language.
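      A drastically simplified sketch of this feedback‐to‐feedforward cycle may help fix the idea. Treating the vocal tract as a scalar function and folding the corrective error directly into the feedforward command are illustrative assumptions; DIVA itself is a neural network model with far richer mappings, and the names and parameter values below are hypothetical:

def vocal_tract(command):
    # Stand-in for the developing vocal tract: maps an articulatory
    # command to an acoustic outcome. The quadratic form is an
    # assumption for illustration, not DIVA's actual plant model.
    return command ** 2

acoustic_target = 0.49   # target learned earlier through perception
command = 0.2            # initial feedforward command (babbling guess)
gain = 0.5               # feedback-correction gain (assumed)

for trial in range(20):
    produced = vocal_tract(command)      # actual production
    error = acoustic_target - produced   # corrective feedback signal
    command += gain * error              # fold correction into the feedforward command

print(round(vocal_tract(command), 3))    # converges toward the 0.49 target

      Once the error is consistently small, the feedforward command alone reproduces the target, mirroring the model’s claim that feedback control recedes as feedforward commands are tuned.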

      Simulations by Guenther and his students support the logic of this account. However, there are significant concerns. First among these, the data supporting this developmental process are weak or, in the case of MacDonald et al.’s (2012) results, contradict the hypothesis; early speech feedback processing and the shaping of speech production targets are not well attested. Second, the proposal relies on a strong relationship between the representations of speech perception and speech production. Surprisingly, this relationship is controversial.

      Models