5 Features in Speech Perception and Lexical Access
SHEILA E. BLUMSTEIN
Brown University, United States
One of the goals of speech research has been to characterize the defining properties of speech and to specify the processes and mechanisms used in speech perception and word recognition. A critical part of this research agenda has been to determine the nature of the representations that are used in perceiving speech and in lexical access. However, there is a lack of consensus in the field about the nature of these representations. This has been largely due to evidence showing tremendous variability in the speech signal: there are differences in vocal tract sizes; there is variability in production even within an individual from one utterance to another; speakers have different accents; contextual factors, including vowel quality and phonetic position, affect the ultimate acoustic output; and speech occurs in a noisy channel. This has led researchers to claim that there is a lack of stability both in the mapping from acoustic input to phonetic categories (sound segments) and in the mapping from phonetic categories to the lexicon (words). In this view, there are no invariant or stable acoustic properties corresponding to the phonetic categories of speech, nor is there a one-to-one mapping between the representations of phonetic categories and lexical access. As a result, although there is general consensus that phonetic categories (sound segments) are critical units in perception and production, studies of word recognition generally bypass the mapping from the auditory input to phonetic categories (i.e. phonetic segments), and assume that abstract representations of phonetic categories and phonetic segments have been derived in some unspecified manner from the auditory input.
Nonetheless, there are some who believe that stable speech representations can be derived from the auditory input. However, there is fundamental disagreement among these researchers about the nature of those representations. In one view, the stability is inherent in motor or speech gestures; in the other, the stability is inherent in the acoustic properties of the input.
In this chapter, we will use behavioral, psychoacoustic, and neural evidence to argue that features (properties of phonetic segments) are basic representational units in speech perception and in lexical access. We will also argue that these features are mapped onto phonetic categories of speech (phonetic segments), and subsequently onto lexical representations; that these features are represented in terms of invariant (stable) acoustic properties; and that, rather than being binary (either present or not), feature representations are graded, providing a mapping by degrees from sounds to words and their meanings during lexical access.
Preliminaries
To set the stage for our discussion, it is necessary first to provide a theoretical framework of the functional architecture of the word recognition system. Here, we will briefly specify the various components and stages of processing, identify the proposed representations in each of these components, and describe the nature of the information flow between the components. It is within this framework that we will consider feature representations. As a starting point for the discussion of features as representational units, it is useful to provide motivation and evidence for the theoretical construct of features. We will then turn to the evidence that features are indeed representational units in speech perception and word recognition.
Functional architecture of word recognition
It is assumed in nearly all models of word recognition that there are multiple components or stages of processing in the mapping from sound to words. The first stage of processing involves the transformation of the auditory input from the peripheral auditory system into a spectro-temporal representation based on the extraction of auditory patterns or properties from the acoustic signal. This representation is in turn converted at the next stage to a more abstract phonetic-phonological representation corresponding to the phonetic categories of speech. The representational units at this stage of processing are considered to include segments and (as we will claim) features. These units then interface with the lexical processing system, where the segment and feature representations map onto the lexicon (words). Here, a particular lexical entry is ultimately selected from a potential set of lexical candidates or competitors. Each lexical entry in turn activates its lexical semantic network, where the meaning of the lexical entry is ultimately contacted.
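To make this hierarchy concrete, the sketch below is purely illustrative; the stage names, toy lexicon, cue values, and thresholds are invented and are not drawn from any of the models discussed in this chapter. It shows one way the successive mappings, from spectro-temporal patterns, to graded features and segments, to competing lexical candidates, might be expressed in code.

```python
# Illustrative sketch of the processing stages described above; the toy lexicon,
# cue values, and thresholds are invented, not empirical.
from dataclasses import dataclass

@dataclass
class SpectroTemporal:
    """Stage 1: auditory patterns extracted from the acoustic signal."""
    cues: dict  # e.g. {"vot_ms": 65.0} for voice onset time in milliseconds

@dataclass
class PhoneticRep:
    """Stage 2: abstract phonetic-phonological representation."""
    features: dict  # graded feature activations, e.g. {"voiced": 0.1}
    segments: list  # e.g. ["k", "ae", "t"]

@dataclass
class LexicalCandidate:
    """Stage 3: a word competing for selection in the lexical network."""
    word: str
    activation: float

def recognize(signal: SpectroTemporal) -> list:
    """Map sound to words through the successive stages (toy logic only)."""
    # Stage 1 -> 2: derive a graded [voiced] value from a single cue (VOT),
    # and assume the segments have already been identified as /k ae t/.
    voiced = 1.0 if signal.cues.get("vot_ms", 0.0) < 30.0 else 0.0
    phonetic = PhoneticRep({"voiced": voiced}, ["k", "ae", "t"])
    # Stage 2 -> 3: activate each lexical candidate in proportion to its
    # segmental overlap with the phonetic representation (competition and
    # semantic activation are omitted in this sketch).
    lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "o", "g"]}
    candidates = [LexicalCandidate(w, sum(a == b for a, b in zip(phonetic.segments, s)) / 3)
                  for w, s in lexicon.items()]
    return sorted(candidates, key=lambda c: c.activation, reverse=True)

print(recognize(SpectroTemporal(cues={"vot_ms": 65.0})))  # "cat" best, then "cap"
```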
What is critically important is the functional architecture of this hierarchical system. Current models consider that the system is characterized by a distributed, network-like architecture in which representations at each level of processing are realized as patterns of activation with properties of activation, inhibition, and competition (McClelland & Elman, 1986; McClelland & Rumelhart, 1986; Gaskell & Marslen-Wilson, 1999). Not only do the dynamic properties of the network influence the degree to which a particular representation (e.g. a feature, a segment, a word) is activated or inhibited, but patterns of activation also spread to other representations that share particular structural properties. It is also assumed that the system is interactive, with information flow being bidirectional; lower levels of representation may influence higher levels, and higher levels may influence lower levels. Thus, there is spreading activation not only within a level of representation (e.g. within the lexical network), but also between and within different levels of representation (e.g. phonological, lexical, and semantic levels).
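The toy update loop below is a sketch in the spirit of interactive-activation models such as TRACE, not an implementation of any of them; the nodes, connection weights, and parameter values are invented for illustration. It shows the three ingredients just described: bottom-up excitation from features to words, within-level inhibition between competing words, and top-down feedback from words back to features.

```python
# Toy interactive-activation sketch; nodes, weights, and parameters are invented.
feature_act = {"velar": 0.9, "voiced": 0.2}              # bottom-up acoustic evidence
word_act = {"cat": 0.0, "gap": 0.0}                      # lexical-level activations
word_features = {"cat": {"velar": 1.0, "voiced": 0.0},   # features each word "expects"
                 "gap": {"velar": 0.0, "voiced": 1.0}}

EXCITE, FEEDBACK, INHIBIT, DECAY = 0.5, 0.1, 0.2, 0.9

for _ in range(10):  # let the network settle over a few processing cycles
    # Bottom-up flow: each word gains support from its matching features.
    bottom_up = {w: sum(feature_act[f] * wt for f, wt in fs.items())
                 for w, fs in word_features.items()}
    # Within-level competition: rival words inhibit one another.
    for w in word_act:
        rivals = sum(word_act[r] for r in word_act if r != w)
        word_act[w] = min(1.0, max(0.0, DECAY * word_act[w]
                                   + EXCITE * bottom_up[w] - INHIBIT * rivals))
    # Top-down flow: active words feed activation back to their own features.
    for w, fs in word_features.items():
        for f, wt in fs.items():
            feature_act[f] = min(1.0, feature_act[f] + FEEDBACK * word_act[w] * wt)

print(word_act)  # "cat" dominates; "gap" is suppressed by competition
```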
There are several consequences of such a functional architecture, as shown in Figure 5.1. First, there is graded activation throughout the speech-lexical processing system; that is, the extent to which a given representation is activated is a function of the "goodness" of the input. Thus, the activation of a potential candidate is not all-or-none but rather is graded or probabilistic. For example, the activation of a phonetic category such as [k] will be influenced by the extent to which the acoustic-phonetic input matches its representation. It is worth noting that graded activation is more complex than simply the extent to which a particular phonetic attribute matches its representation. Rather, the extent of activation reflects the totality of the acoustic properties giving rise to a particular phonetic category. Thus, the activation of the phonetic feature [voicing] would include the probabilities of voice onset time, burst amplitude and duration, and fundamental frequency, to name a few (see Lisker, 1986). Second, because the system is interactive, activation patterns at one level of processing will influence activation at another level of processing. For example, a poor acoustic-phonetic exemplar of a phonetic category such as [k] will influence the activation of the lexical representation of a word target such as cat.
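As a sketch of what graded, multi-cue feature activation might look like, the function below combines several acoustic cues into a single probability-like activation for [voiced]; the cue weights and boundary values are invented for illustration and are not empirical estimates.

```python
import math

def voicing_activation(vot_ms, burst_amp_db, f0_onset_hz):
    """Graded activation of the feature [voiced] from several acoustic cues.
    Cue weights and category boundaries are illustrative, not empirical values."""
    # Each cue contributes evidence for [voiced]: short VOT, a weaker burst,
    # and a lower fundamental frequency at vowel onset.
    evidence = (-0.15 * (vot_ms - 30.0)          # assumed VOT boundary near 30 ms
                - 0.05 * (burst_amp_db - 15.0)   # weaker bursts favor voiced
                - 0.02 * (f0_onset_hz - 120.0))  # lower onset f0 favors voiced
    return 1.0 / (1.0 + math.exp(-evidence))     # squashed to a 0..1 activation

# A good exemplar of voiceless [k] yields low [voiced] activation ...
print(round(voicing_activation(vot_ms=70, burst_amp_db=20, f0_onset_hz=140), 2))
# ... while an ambiguous token yields an intermediate, graded activation.
print(round(voicing_activation(vot_ms=32, burst_amp_db=15, f0_onset_hz=122), 2))
```

On this kind of scheme, a poor or ambiguous exemplar does not fail to activate the feature; it simply activates it less strongly, and that weaker activation is then passed on to the segmental and lexical levels.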