a phenomenon that is frequently assumed but is not strongly supported by actual data: babbling drift. The hypothesis that the sounds of babbling drift over time was first proposed by Roger Brown (1958). Brown suggested that the phonetic repertoire of infant babbling gradually comes to resemble the phonetics of the language environment to which the infant is exposed, and gradually drops sounds that are absent from the native language. As the review by Best et al. (2016) indicates, support for this idea is mixed, particularly from transcription studies and from perceptual studies in which naive listeners attempted to identify the infant’s language environment from the babbling.
There is broad agreement that early babbling has common characteristics across languages and a somewhat limited phonetic repertoire. There is no comparable consensus about later babbling. Older transcription studies of late babbling often suffered from small sample sizes and from the biases inherent in transcription. However, studies with larger samples still show conflicting patterns of results. For example, de Boysson‐Bardies and Vihman (1991) reported that the prevalence of consonants of different manners and places of articulation in the babbling of 12‐month‐old infants from English, French, Japanese, and Swedish homes corresponded to the distributions of consonants in their language environments. In contrast, a number of other transcription studies have failed to find such differences (e.g. Kern, Davis, & Zink, 2009; Lee, Davis, & MacNeilage, 2010).
Figure 4.2 Average F1 (circles) and F2 (triangles) frequency estimates over time for adults (top panel), young children (middle panel), and toddlers (bottom panel). The formant frequencies have been normalized to the average baseline frequencies. The shaded area indicates the period during which subjects received altered auditory feedback (from MacDonald et al., 2012).
Source: MacDonald et al., © 2012, Elsevier.
Another approach has been to use recordings of infant babbling as perceptual stimuli and to ask adult listeners to identify the infants’ language environment. These studies have also produced mixed results: some experiments report that listeners can discriminate the home language of the infants (e.g. de Boysson‐Bardies, Sagart, & Durand, 1984), while others found no such perceptual difference (e.g. Thevenin et al., 1985). A more serious concern about these studies is that listeners were likely attending to prosodic differences in the babbling rather than to the segmental differences that babbling drift predicts. Listeners can distinguish the language of babbling even from low‐pass filtered stimuli, in which much of the segmental information has been removed (e.g. Whalen, Levitt, & Wang, 1991), and this supports the idea that prosodic differences drive these results. A recent controlled study (Lee et al., 2017) using a large number of stimuli found that listeners’ categorizations of Chinese‐ and English‐learning babies’ utterances at 8, 10, and 12 months of age were reliable for only a small subset of the stimuli (words or canonical syllables that resembled words). These effects were modest and suggest that the home language may show its earliest influence in early lexical development rather than in babbling.
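Low‐pass filtering is what makes such stimuli informative: energy below the cutoff, where the voice’s fundamental frequency lies, passes through, while most of the higher‐frequency spectral detail that cues consonants and vowels is attenuated. The minimal sketch below illustrates the principle with pure tones. The 700 Hz cutoff, the 400 Hz tone standing in for an infant’s F0, the Hamming‐windowed sinc design, and the 8 kHz sample rate are all illustrative assumptions, not the filter settings used by Whalen, Levitt, and Wang (1991).

```python
import math

def lowpass_fir(signal, sample_rate, cutoff_hz, num_taps=201):
    """Low-pass filter a list of samples with a Hamming-windowed sinc FIR filter."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), tapered by a Hamming window.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # normalize to unity gain at 0 Hz
    out = []
    for i in range(len(signal)):
        # Centred convolution so the output stays aligned with the input.
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate=8000, seconds=0.25):
    """A unit-amplitude sine tone as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(int(sample_rate * seconds))]

SR, CUTOFF = 8000, 700
EDGE = 100  # equals num_taps // 2; trimmed to avoid filter edge effects
low = lowpass_fir(tone(400, SR), SR, CUTOFF)[EDGE:-EDGE]    # assumed F0-range component
high = lowpass_fir(tone(2500, SR), SR, CUTOFF)[EDGE:-EDGE]  # higher-frequency detail
print(f"400 Hz RMS after filtering:  {rms(low):.3f}")
print(f"2500 Hz RMS after filtering: {rms(high):.4f}")
```

The low tone keeps essentially its full amplitude while the high tone is reduced to a small residue; real speech would be filtered the same way, leaving pitch and amplitude rhythm while removing most segmental cues.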
Direct acoustic measurements of babbling have provided some evidence for babbling drift, although the effects are small. For example, Whalen, Levitt, and Goldstein (2007) measured voice onset time (VOT) in French‐ and English‐learning infants at 9 and 12 months of age. There were no language differences in VOT or in the duration of the prevoicing that occurred. However, prevoicing occurred more often in the French‐learning infants, which corresponds to the difference between adult French and English.
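A pattern like this, in which groups differ in how often prevoicing occurs but not in VOT values or prevoicing duration, comes from summarizing the same tokens in separate ways. The sketch below assumes each stop token has already been measured as a signed VOT in milliseconds (negative when voicing begins before the release); the function name, group labels, and token values are invented for illustration and are not data from Whalen, Levitt, and Goldstein (2007).

```python
from statistics import mean

def summarize_vot(vots_ms):
    """Summarize stop tokens by signed VOT (ms); negative VOT = prevoicing."""
    if not vots_ms:
        raise ValueError("no tokens to summarize")
    prevoiced = [v for v in vots_ms if v < 0]
    lag = [v for v in vots_ms if v >= 0]
    return {
        # Proportion of tokens that are prevoiced.
        "prevoicing_incidence": len(prevoiced) / len(vots_ms),
        # Mean duration of prevoicing, among prevoiced tokens only.
        "mean_prevoicing_ms": mean(-v for v in prevoiced) if prevoiced else None,
        # Mean VOT of the remaining (short- or long-lag) tokens.
        "mean_lag_vot_ms": mean(lag) if lag else None,
    }

# Hypothetical token VOTs (ms) for two invented groups of infants.
group_a = [-80, -90, -70, 15, 20, 10, 15, 20, 10, 15]
group_b = [-80, 15, 20, 10, 15, 20, 10, 15, 15, 15]

for name, tokens in [("group_a", group_a), ("group_b", group_b)]:
    print(name, summarize_vot(tokens))
```

Reporting incidence separately from duration matters because a single pooled mean would mix the two: with these invented tokens, the groups have identical prevoicing durations and positive‐lag VOTs, yet their pooled means differ because of incidence alone.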
The most serious concern raised by the existing data is that there is no evidence that speech‐production targets are tuned on the basis of production errors. The data of MacDonald et al. (2012) suggest that young children do not correct such errors. However, there are several caveats to that conclusion. First, the magnitude of the perturbation may have to fall within a critical range, and in MacDonald et al. the perturbations were the same size in hertz for all ages. Because children’s formant frequencies are higher than adults’, a fixed shift in hertz is proportionally smaller for them, so it is possible that younger children require larger perturbations to elicit compensation. A related issue is that the perturbations may have fallen within the noisy categories that the children were producing; the variability of production may itself be an indicator of category status. Even if this were true, however, it raises the question of how an organism could learn to produce adult targets under these conditions. The challenges are enormous. Juveniles in all species have vocal tracts that do not match their parents’ vocal tracts, and birds and other species show marked production variance as juveniles (e.g. Bertram et al., 2014). There is no obvious feedback‐based mechanism that permits the mapping from adult targets to young productions (see Messum & Howard, 2015). Error correction as normally envisioned in motor control may not be engaged.
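The caveat about perturbation size can be made concrete with a little arithmetic. The sketch below expresses a feedback shift as a proportion of a talker’s baseline formant, expresses productions relative to baseline (as in the normalization used for Figure 4.2), and scores compensation as the percentage of the shift that the talker opposes. The baseline F1 values, the 200 Hz shift, and the function names are hypothetical and chosen only for illustration; they are not the values or analysis code of MacDonald et al. (2012).

```python
def relative_shift(shift_hz, baseline_hz):
    """Size of a feedback shift as a proportion of the talker's baseline formant."""
    return shift_hz / baseline_hz

def normalized(produced_hz, baseline_hz):
    """Production relative to baseline (1.0 = no change from baseline)."""
    return produced_hz / baseline_hz

def compensation_percent(baseline_hz, produced_hz, shift_hz):
    """Percentage of the imposed shift that the talker opposes.

    Positive values mean production moved opposite to the feedback shift;
    negative values mean it followed the shift.
    """
    return -(produced_hz - baseline_hz) * 100 / shift_hz

# Hypothetical values for illustration only.
SHIFT_HZ = 200
ADULT_F1, CHILD_F1 = 700, 1000
print(relative_shift(SHIFT_HZ, ADULT_F1))  # larger proportion of the adult baseline
print(relative_shift(SHIFT_HZ, CHILD_F1))  # smaller proportion of the child baseline
print(normalized(660, ADULT_F1), compensation_percent(ADULT_F1, 660, SHIFT_HZ))
```

Scaling perturbations to each talker’s baseline (a fixed proportion rather than a fixed number of hertz) is one way an experiment could equate their perceptual size across ages.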
Perception–production interaction
This puzzle reflects the general problem of understanding the relationship between the processes of listening to speech and producing it. Liberman (1996, p. 247) stated:
In all communication, sender and receiver must be bound by a common understanding about what counts; what counts for the sender must count for the receiver, else communication does not occur. Moreover, the processes of production and perception must somehow be linked; their representation must, at some point, be the same.
This is certainly true in a very general sense, but the roles played in communication by the auditory signal that reaches the listener and by the signal that reaches the speaker are dramatically different. For the listener, the signal supports categorical discrimination and information transmission, whereas for the talker it is thought primarily to influence motor precision and error correction. These two roles are not independent, but they are far from equivalent. The problem for researchers is that speech perception and production are so intrinsically intertwined in communication that it is difficult to separate the contributions of these “two solitudes” of speech research to spoken language.
Although the relationship between speech perception and production has historically been invoked to explain language change, patterns of language disorder, and the developmental time course of speech acquisition, there has been little comprehensive theorizing about how speech input and output interact (Levelt, 2013). Recently, Kittredge and Dell (2016) outlined three stark hypotheses about this relationship: the representations for perception and production could be completely separate, absolutely inseparable, or separable under some, if not many, conditions.
Several types of experimental evidence might distinguish among these possibilities: (1) data that examine whether learning or adaptation in perception influences production, and vice versa; (2) correlational data relating individual differences in speech perception to individual differences in speech production (e.g. perceptual precision and production variability); and (3) data showing interference between the two processes.
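The second type of evidence rests on a simple statistic: a correlation, computed across talkers, between a perceptual measure and a production measure. The sketch below computes Pearson’s r by hand. The choice of measures (a hypothetical perceptual precision score and the standard deviation of a formant across repeated productions) and all of the numbers are invented for illustration; Python 3.10 and later also provide the same coefficient as `statistics.correlation`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of per-talker scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two talkers")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-talker scores (one entry per talker).
perceptual_precision = [1.0, 1.5, 2.0, 2.5, 3.0]  # higher = sharper category boundary
production_sd_hz = [60, 52, 47, 41, 35]           # higher = more variable productions

r = pearson_r(perceptual_precision, production_sd_hz)
print(f"r = {r:.3f}")
```

With these invented scores, r is strongly negative: talkers with sharper perceptual categories produce less variable tokens, which is the kind of pattern this approach looks for.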
Learning/adaptation changes
In speech perception, selective adaptation for both consonants and vowels results in changes to category boundaries after exposure to a repetitive adapting stimulus. Cooper (1974; Cooper & Lauritsen, 1974) reported changes in produced VOT following repeated presentation of a voiceless adapting stimulus: in a manner similar to the effects of selective adaptation on perceptual category boundaries, talkers produced shorter VOTs after adaptation. The effect was attributed to a perceptuomotor mechanism that mediates both the perception and the production of speech. More recently, Shiller et al. (2009) found that, when subjects produced fricatives with frequency‐altered feedback, they produced fricatives that compensated for the perturbation. Most interestingly, the subjects shifted their perceptual boundary for /s/–/ʃ/ identification following this production perturbation. However, as Perkell (2012) cautions, the segment durations of the fricatives were far beyond the natural range, raising the possibility that the effect was acoustic rather than phonetic. Lametti et al. (2014) showed the opposite direction of influence. A perceptual training