of speech deviation differed considerably from individual to individual in the small samples.
The study of postlingually deafened individuals represents the best window onto the role played by auditory feedback in a well‐developed human control system. While the effects of hearing loss on speech are not immediate, both consonant and vowel errors emerge over time (Zimmermann & Rettaliata, 1981; Osberger & McGarr, 1982; Waldstein, 1990; Lane & Webster, 1991; Cowie & Douglas‐Cowie, 1992). Other effects on speech caused by a long‐term lack of auditory feedback are of a suprasegmental nature. These include a slower overall rate of speech, higher and more inconsistent pitch, overstressing syllables and words, and a greater mean intensity (Cowie, Douglas‐Cowie, & Kerr, 1982; Plant, 1984; Leder, Spitzer, & Kirchner, 1987; Waldstein, 1990; Lane & Webster, 1991; Cowie & Douglas‐Cowie, 1992). All of these factors contribute to the overall loss of intelligibility of speech.
Research on vocal learning in birds has permitted more systematic study of the effects of deafening at different stages of development. In a classic study by Lombardino and Nottebohm (2000), groups of zebra finches were deafened at ages ranging from 81 days to six years. Changes in song were strongly correlated with age at deafening. The songs of birds deafened earlier (e.g. at three months) deteriorated much more quickly (within approximately a week) than those of birds deafened between two and five years of age. The latter took more than a year to show quantifiable deficits.
In birds, invasive ablation studies have shown that the relationship between acquired song and auditory feedback is not simple. Anatomical studies have revealed at least two distinct pathways for the vocal control of song that converge on the song motor cortex. One pathway is strongly influenced by the cortical premotor area HVC and the other has strong influences from the lateral magnocellular nucleus of the anterior nidopallium (LMAN). An oversimplification of the contributions of these two anatomical regions is that one controls the memorized song (HVC) and the other influences the variability of the pitch and amplitude of productions (LMAN). The role of auditory feedback in mediating the influence of these two systems is intriguing. It has been suggested that auditory feedback may influence the gain of the variability system (Bertram et al., 2014). When birds are deafened, the production of structured song is impaired (e.g. dropped syllables and deteriorated structure of syllables), but later, when LMAN is ablated, the effects of deafening are reversed, at least in those with moderate decline (Nordeen & Nordeen, 2010), and variability is reduced. The authors conclude that deafening induces song deterioration and LMAN activity contributes to that degradation. Thus, in birds, there are neural systems such as LMAN that play a direct role in determining the amount of vocal variability.
Overall, the data indicate not only that hearing is vital for vocal learning but also that the maintenance of accurate vocal production relies on auditory feedback. Without this feedback, sound production changes across the full range of attributes (e.g. precision of frequency, timing, consistency).
Real‐time manipulations of auditory feedback
Separate from clinical evidence, behavioral studies of auditory feedback in speech have been carried out for more than a century. In 1911 the otolaryngologist Étienne Lombard published “Le signe de l’élévation de la voix” (“The symptom of the raised voice”; Lombard, 1911), in which he noted a patient’s tendency to speak more loudly when a loud noise was transmitted to one ear. This became the first published evidence for a feedback mechanism by which real‐time speech perception could influence speech production (Brumm & Zollinger, 2011) and, more than 100 years later, the Lombard effect remains the most persistent and robust feedback phenomenon within psycholinguistic speech production research.
A notable feature of real‐time speech corrections is that they appear to be largely involuntary and often occur without awareness. In one study, speakers who wore headphones persisted in raising their volume when loud noises were played, even when informed by an interviewer that they were doing so (Mahl, 1972). While learned inhibition of the Lombard effect in humans is possible (Pick et al., 1989), it remains persistent in spontaneous speech and has been observed in young children (Siegel et al., 1976) as well as Old World monkeys (Sinnott, Stebbins, & Moody, 1975), whales (Parks et al., 2011), and a multitude of songbird species (see Cynx et al., 1998; Kobayasi & Okanoya, 2003; Leonard & Horn, 2005).
Other types of speech feedback distortion also elicit compensatory responses. In a common paradigm, speakers’ formants (resonances of the vocal tract) are shifted away from what is actually being produced; for example, a speaker might produce the vowel /ε/ and hear themselves say the vowel /æ/. In response, the speaker may compensate by shifting their own production in the opposite direction in frequency. In this example, they might produce a vowel closer to /ɪ/ (Houde & Jordan, 1998; Purcell & Munhall, 2006). Interestingly, such compensation is often incomplete: the magnitude of the response is smaller than the magnitude of the perturbation, and individual variability is considerable (MacDonald, Purcell, & Munhall, 2011). Figure 4.1 shows perturbations of the first formant (F1) and average compensations (dots) (MacDonald, Goldberg, & Munhall, 2010). Three perturbations to F1 are introduced in steps over a series of trials. The dots show that, on average, subjects responded in a manner that counteracted the perturbation. However, even for the smallest perturbation of 50 Hz, the compensation is incomplete. Subjects change their production by less than 50 Hz even though they are capable of a compensation large enough to correct this error, as evidenced by their response to the 200 Hz perturbation at the end of the series.
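The notion of partial compensation can be made concrete with a small calculation. The following is a minimal sketch, assuming that compensation is expressed as the fraction of the perturbation that the talker counteracts, with production normalized to the mean of the baseline phase. The function name `compensation_ratio` and all numerical values are hypothetical and chosen for illustration only; they are not data from any of the studies cited.

```python
from statistics import mean


def compensation_ratio(baseline_f1, perturbed_phase_f1, perturbation_hz):
    """Fraction of an F1 feedback perturbation counteracted by the talker.

    baseline_f1: produced F1 values (Hz) before feedback is altered
    perturbed_phase_f1: produced F1 values (Hz) while feedback is shifted
    perturbation_hz: signed shift applied to the auditory feedback (Hz)
    """
    if perturbation_hz == 0:
        raise ValueError("perturbation_hz must be non-zero")
    # Normalize production to the mean of the baseline phase.
    change = mean(perturbed_phase_f1) - mean(baseline_f1)
    # Compensation opposes the perturbation, hence the sign flip.
    return -change / perturbation_hz


# Hypothetical values for illustration only (not data from any study).
baseline = [600.0, 610.0, 590.0, 600.0]
shifted_up_200 = [550.0, 545.0, 555.0, 550.0]

ratio = compensation_ratio(baseline, shifted_up_200, 200.0)
print(f"compensation: {ratio:.2f}")
```

Under this convention, a ratio of 1 would indicate complete compensation, values between 0 and 1 indicate the partial compensation described above, and negative values indicate that production moved in the same direction as the perturbation.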
Vocal‐pitch perturbations produce the same pattern of partial compensation and individual variability. When the fundamental frequency (F0) is raised or lowered, talkers tend to compensate by producing speech with F0 shifts in the opposite direction in frequency to the perturbation (Burnett et al., 1998; Jones & Keough, 2008). Such pitch compensations can be reduced but not eliminated with specific instruction in conjunction with intensive training (Zarate & Zatorre, 2008). As in the Lombard effect, compensation in response to formant and pitch perturbations appears to be largely automatic (Munhall et al., 2009).
Figure 4.1 Perturbation (solid line) and average compensation (dots) of first formant frequency in hertz. The frequencies have been normalized to the mean of the baseline phase (Source: Adapted from MacDonald, Goldberg, & Munhall, 2010).
In birdsong, feedback perturbations result in similar responses. Pitch shifting single notes yields a compensatory response wherein vocal output shifts in the direction opposite to the perturbation (Sober & Brainard, 2009). As with humans, this response is often incomplete. Sober and Brainard (2009) found that a 100 percent pitch shift yielded a 50 percent change in response on average; however, contrary to the pattern observed in humans, this compensation is not immediate. In the same experiment, they found that the compensatory pitch changes developed across a two‐week period and that, once the pitch shift stimulus was removed, the return to baseline was gradual. In humans, by contrast, compensations in response to feedback perturbations are observed within single testing sessions (see Purcell & Munhall, 2006; Terband, van Brenk, & van Doornik‐van der Zee, 2014; Zarate & Zatorre, 2008) and even single trials (Tourville, Reilly, & Guenther, 2008), and speech acoustics return to baseline slowly within a session after removal of the perturbation stimulus (Purcell & Munhall, 2006). The reasons for these interspecies differences are unclear; however, the evidence overwhelmingly supports the notion that both humans and songbirds actively correct for “errors” in vocal production, comparing vocal output to some form of target in real time.
A notable exception to direct compensation occurs in response to delayed auditory feedback (DAF), wherein time delays are introduced between speech production and audition.