& Kuhl, 1996), speech‐perception studies have shown that listeners' reaction times in identifying a stimulus on a continuum are slower, and their judgments of category goodness are reduced, as stimuli approach the phonetic category boundary. These gradient effects are robust across variations in task requirements (e.g. Carney, Widin, & Viemeister, 1977). Thus, phonetic categories are graded and have internal structure (see Miller, 1997). In this sense, categories are not truly binary representations that are either present or absent; rather, some exemplars of a category are better representations of the category than others.
Such findings support a functional architecture in which the degree of activation of a representation is itself graded and influences, in turn, the degree of activation of potential competitors. As a stimulus approaches the phonetic category boundary, its activation decreases and there is a concomitant increase in the activation of, and hence competition from, the contrasting phonetic category representation. For example, assume a [da]–[ta] continuum ranging from 0 to 40 ms voice onset time (VOT) in 10 ms steps, with a category boundary at 20 ms. As described earlier, there is competition between stimuli that share acoustic properties. Thus, presentation of a 40 ms stimulus (perceived as a [t]) would compete with the representation of the contrasting voiced phonetic category [d]. However, a stimulus with a VOT of 30 ms is a poorer exemplar of the voiceless phonetic category, and thus not only does it activate the phonetic representation of [t] more weakly, but there is an increase in the activation of the contrasting voiced phonetic category [d] (see Blumstein, Myers, & Rissman, 2005).
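To make this architecture concrete, the following is a minimal sketch of graded activation and competition along such a VOT continuum. It assumes Gaussian activation functions centered on category prototypes; the prototype VOTs (0 and 40 ms) and the tuning width are illustrative assumptions, not values estimated from the studies cited.

```python
import numpy as np

# Illustrative sketch: graded activation of two phonetic categories along
# a [da]-[ta] VOT continuum. Prototype VOTs and the tuning width are
# assumptions chosen for illustration, not empirically estimated values.
PROTOTYPES = {"d": 0.0, "t": 40.0}  # assumed best exemplars (ms VOT)
WIDTH = 15.0                        # assumed tuning width (ms)

def activation(vot_ms: float) -> dict:
    """Activation of each category, peaking at its prototype and falling
    off gradually as the stimulus approaches the category boundary."""
    return {cat: np.exp(-((vot_ms - proto) ** 2) / (2 * WIDTH ** 2))
            for cat, proto in PROTOTYPES.items()}

for vot in (40, 30, 20):
    acts = activation(vot)
    print(f"VOT {vot:2d} ms: d = {acts['d']:.2f}, t = {acts['t']:.2f}")
# VOT 40 ms: [t] strongly activated, little competition from [d].
# VOT 30 ms: [t] activation falls; [d] activation (competition) rises.
# VOT 20 ms: at the boundary, the two categories are equally activated.
```

On this sketch, a 30 ms token activates [t] less strongly than a 40 ms token while simultaneously increasing the activation of the competitor [d], which is exactly the pattern of graded activation and competition described above.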
Neural evidence also supports the gradient nature of phonetic categories. Both temporal and frontal areas show graded responses as a function of the goodness of the phonetic category input, with the least activation for the best exemplar of the phonetic category and increased activation as stimuli on a continuum approach the phonetic category boundary (Blumstein, Myers, & Rissman, 2005; Frye et al., 2007; Guenther et al., 2004). Importantly, other neural areas (middle frontal gyrus, supramarginal gyrus) fail to show such graded activation, displaying sensitivity only to between‐phonetic‐category and not to within‐phonetic‐category differences (Joanisse, Zevin, & McCandliss, 2007; Myers et al., 2009). That there is both graded and categorical perception of phonetic categories reflects two critical aspects of speech perception: the need for sensitivity to fine acoustic differences on the one hand, and sensitivity to category membership on the other. We will return to this point in the Conclusion of this chapter.
Lexical access
Results showing that the perception of phonetic categories is graded and influenced by the goodness of the stimulus input raise the question of potential effects on lexical access. One possibility is that phonetic category membership is resolved at the phonetic‐phonological level and that the fine acoustic differences producing graded responses at this level are not mapped onto higher levels of processing. Alternatively, given the functional architecture of the speech and lexical processing system proposed earlier, graded phonetic category representations should influence the mapping from the phonetic‐phonological level to lexical representations. Results of a series of experiments using various methodologies support this latter hypothesis.
Using the visual world paradigm, it has been shown that access to lexical representations is affected by the fine acoustic structure of the auditory input; in particular, looks to a visual target are affected by within‐phonetic‐category acoustic differences (McMurray, Tanenhaus, & Aslin, 2002; see also McMurray, Tanenhaus, & Aslin, 2009). In the 2002 study, eye movements were measured as participants identified a named target word (using a mouse click) from an array of four pictures: a target word whose name began with a labial stop consonant, for example pear, a minimal pair of the target, bear, and two phonologically unrelated words, for example lamp and ship. The auditorily presented names varied along a [b]–[p] VOT continuum ranging from 0 to 40 ms in 5 ms steps. Results showed graded responses, with increasing looks to the minimal pair competitor (i.e. bear) as the acoustic‐phonetic input (a VOT variant of [p]) approached the phonetic boundary. These findings show that lexical access is indeed graded.
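A toy extension of the earlier sketch links graded lexical activation to fixation behavior. The sketch below is illustrative rather than a model fitted to the McMurray et al. data: it converts assumed Gaussian activations for pear and bear into fixation proportions with a Luce choice rule, and the prototypes, tuning width, and baseline activation for the unrelated pictures are all assumptions.

```python
import numpy as np

# Toy linking of graded lexical activation to looks in a visual world
# display; all parameter values are illustrative assumptions.
PROTOTYPES = {"bear": 0.0, "pear": 40.0}  # assumed /b/, /p/ prototypes (ms VOT)
WIDTH = 15.0                              # assumed tuning width (ms)
UNRELATED = 0.05                          # assumed baseline for lamp, ship

def fixation_proportions(vot_ms: float) -> dict:
    acts = {word: np.exp(-((vot_ms - proto) ** 2) / (2 * WIDTH ** 2))
            for word, proto in PROTOTYPES.items()}
    acts["lamp"] = acts["ship"] = UNRELATED
    total = sum(acts.values())
    # Luce choice rule: looks to each picture in proportion to its
    # relative activation.
    return {word: act / total for word, act in acts.items()}

for vot in (40, 30, 25):
    props = fixation_proportions(vot)
    print(vot, {w: round(p, 2) for w, p in props.items()})
# As VOT shrinks from 40 ms toward the boundary, looks to the competitor
# bear increase gradually even though pear remains the dominant response.
```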
Perhaps stronger evidence of the effects of within‐phonetic‐category differences on access to higher levels of processing comes from studies showing that such effects not only influence access to lexical representations but also cascade to the lexical semantic network. Examining semantic priming in a lexical decision task, Andruski, Blumstein, and Burton (1994) presented prime words semantically related to a target stimulus; the initial voiceless stop consonant of each prime was either a good exemplar of the voiceless stop phonetic category (spoken naturally and acoustically unmodified) or a poorer exemplar (its VOT reduced by one third or two thirds). Shortening the VOT moved the stimuli closer to the voiced–voiceless category boundary. Importantly, pilot work showed that the stimuli, presented alone, were perceived correctly as beginning with voiceless stop consonants. Nonetheless, the reduction in VOT for the acoustically modified stimuli resulted in a reduction in the magnitude of semantic priming relative to the unmodified prime stimuli, particularly for the primes reduced by two thirds.
In a later study, Misiurski and colleagues (2005) showed that, in addition to the reduced semantic priming obtained when the initial voiceless stop consonant of a prime word is shortened (i.e. reduced priming for t*ime‐clock compared to time‐clock), mediated semantic priming emerged for the minimal pair competitor; that is, t*ime primed penny via dime. Not surprisingly, the magnitude of the mediated semantic priming was less than that obtained for dime–penny. These mediated priming results provide further evidence for the cascading effects of the acoustic input on access to the lexical semantic network. In particular, they show that the acoustic modification of a prime stimulus not only influences the activation of the lexical semantic network of its semantically related target, but also partially activates the lexical representation of the contrasting voiced minimal pair competitor and, subsequently, its lexical semantic network.
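The cascade described here can be pictured with a toy spreading‐activation sketch. The network below, with its two associative links and all weight values, is hypothetical; it simply illustrates how a weakly activated competitor (dime) can pass activation on to its associate (penny).

```python
# Toy spreading-activation sketch of mediated priming. The associative
# links and all numeric values are hypothetical illustrations.
SEMANTIC_LINKS = {"time": {"clock": 0.8}, "dime": {"penny": 0.8}}

def semantic_activations(lexical_acts: dict) -> dict:
    """Cascade activation from each lexical entry to its associates."""
    semantic = {}
    for word, strength in lexical_acts.items():
        for associate, weight in SEMANTIC_LINKS.get(word, {}).items():
            semantic[associate] = semantic.get(associate, 0.0) + strength * weight
    return semantic

# Unmodified prime 'time': strong activation of time, negligible dime.
print("time  ->", semantic_activations({"time": 1.0, "dime": 0.05}))
# Shortened-VOT prime 't*ime': weaker time, partial activation of dime,
# which in turn partially activates penny (mediated priming).
print("t*ime ->", semantic_activations({"time": 0.7, "dime": 0.3}))
```

Consistent with the reported pattern, priming of clock shrinks for the modified prime while mediated priming of penny emerges, and the latter remains smaller than direct dime–penny priming.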
Feature representations: Articulatory or acoustic
The motor theory of speech perception
We turn now to an unresolved question: What is the nature of feature representations? The crux of the problem turns on variability in the phonetic input. As indicated at the beginning of this chapter, many sources of variability influence the ultimate speech input that the listener receives. The question is whether, despite this variability, there are patterns (articulatory or acoustic) that provide a stable mapping from acoustic input to features and, ultimately, to phonetic categories. To date, no one has solved this invariance problem; that is, no one has shown how a variable input is transformed into a constant feature or phonetic category representation. Even if one were to assume that lexical representations are episodic, containing fine‐structure acoustic differences that are used by the listener, as has been proposed by Goldinger (1998) and others, such a view still begs the question. It does not elucidate the nature of the mapping from input to sublexical or lexical representations, and thus fails to explain how the listener knows that a given stimulus belongs to one phonetic category and not another: what property of the signal tells the listener that the input maps onto the lexical representation of pear and not bear, or that the initial consonant is a variant of [p] and not [b]?
The pioneering research of Haskins Laboratories in the 1950s attempted to solve the invariance problem. It is important to understand the historical context in which this research was conducted. At that time, state‐of‐the‐art speech technology consisted of the sound spectrograph and the pattern playback (see Cooper, 1955; Koenig, Dunn, & Lacy, 1946). The sound spectrograph provided a visual graph of the Fourier transform of the speech input, with time represented on the abscissa, frequency on the ordinate, and amplitude by the darkness of the various frequency bands. The pattern playback converted a visual representation of the sound spectrogram into an auditory output (see Studdert‐Kennedy & Whalen, 1999, for a review). Thus, examining the patterns of