Group of authors

The Handbook of Speech Perception



it was possible to make hypotheses about particular portions of the signal, or cues, corresponding to particular features of sounds or segments (phonetic categories). Using the pattern playback, these potential cues were then systematically varied and presented to listeners for identification. Results reported in their seminal paper (Liberman et al., 1967) showed clearly that phonetic segments occur in context and cannot be defined as separate “beads on a string.” Indeed, the context ultimately influences the acoustic manifestation of a particular phonetic segment, resulting in acoustic differences for the same features of sound. For example, sound spectrograms of stop consonants show a burst and formant transitions, which potentially serve as cues to place of articulation in stop consonants. Varying the frequency of the burst or the onset frequency of the second formant transition and presenting the results to listeners provided a means of systematically assessing the perceptual role these cues played. Results showed that there was no systematic relation between a particular burst frequency or second formant transition onset frequency and place of articulation in stop consonants (Liberman, Delattre, & Cooper, 1952). For example, there was no constant burst frequency or formant transition onset that signaled [d] in both [di] and [du]. Rather, the acoustic manifestation of sound segments (and the features that underlie them) is influenced by the acoustic parameters of the phonetic contexts in which they occur.

      Liberman et al. (1967) recognized that listener judgments were nevertheless consistent. What, then, allowed the various acoustic patterns to be perceived as the same consonant? They proposed the motor theory of speech perception, hypothesizing that what provided stability amid the variable acoustic input was the production of the sounds, that is, the articulatory gestures giving rise to them (for reviews see Galantucci, Fowler, & Turvey, 2006; Liberman et al., 1967; Fowler, 1986; Fowler, Shankweiler, & Studdert‐Kennedy, 2016). In this view, despite their acoustic variability, constant articulatory gestures provided phonetic category stability – [p] and [b] are both produced with the stop closure at the lips, [t] and [d] with the closure at the alveolar ridge, and [k] and [g] with the closure at the velum.

      It would not be surprising to see activation of motor areas during the perception of speech, as listeners are also speakers, and speakers perceive the acoustic realization of their own productions. That there is a neural circuit bridging temporal and motor areas would therefore be expected (see Hickok & Poeppel, 2007). However, what needs to be shown in support of the motor (gesture) theory of speech is that the representations underlying speech perception are motoric or gestural. It is, of course, possible that there are gestural as well as acoustic representations corresponding to the features of speech. At a minimum, however, to support the motor theory of speech, gestures need to be identified that provide a perceptual standard for mapping from auditory input to phonetic feature. As we will see shortly, the evidence to date does not support such a view (for a broad discussion challenging the motor theory of speech perception, see Lotto, Hickok, & Holt, 2009).

       The acoustic theory of speech perception

      Despite the variability in the speech input, it is possible that more generalized acoustic patterns can be derived that are common to features of sounds, patterns that override the fine acoustic detail obtained from analyzing individual components of the signal, such as burst frequency or the onset frequency of formant transitions. The question is where in the signal such properties might reside and how they can be identified.

      One hypothesis that became the focus of the renewed search for invariant acoustic cues was that more generalized patterns could be derived at points where there are rapid changes in the spectrum. These landmarks serve as points of stability between transitions from one articulatory state to another (Stevens, 2002). Once the landmarks were identified, it was necessary to identify the acoustic parameters that provided stable patterns associated with features and ultimately phonetic categories. To this end, research focused on the spectral patterns that emerged from the integration of amplitude and frequency parameters within a window of analysis rather than considering portions of the speech signal that had been identified on the sound spectrogram and considered to be distinct acoustic events.
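      The landmark idea can be sketched computationally. The hypothetical Python below flags frames where the short-time spectrum changes abruptly, using spectral flux as a simple stand-in for rapid spectral change; the function name, frame sizes, and threshold are illustrative assumptions, not Stevens's actual algorithm.

```python
import numpy as np

def spectral_landmarks(signal, frame_len=256, hop=128, n_std=2.0):
    """Flag frames where the short-time spectrum changes abruptly.

    A toy sketch of the landmark idea: an abrupt spectral change (e.g. a
    stop release) shows up as large frame-to-frame spectral flux.
    Frame sizes and threshold here are illustrative only.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Short-time magnitude spectra of Hann-windowed frames
    spectra = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])
    # Spectral flux: Euclidean distance between successive spectra
    flux = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    # Landmarks: frames whose flux is unusually large
    cutoff = flux.mean() + n_std * flux.std()
    return np.where(flux > cutoff)[0] + 1  # indices of the later frame

# Silence spliced onto a 1 kHz tone (8 kHz sampling): the one abrupt
# spectral change sits at the splice, near frame 4000 / 128 = 31.
sr = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(4000) / sr)
marks = spectral_landmarks(np.concatenate([np.zeros(4000), tone]))
```

On such a spliced signal, the flagged frames cluster at the splice point, the kind of abrupt spectral change that a stop release produces, while steady-state stretches go unflagged.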

      Invariant properties were identified for additional phonetic features, giving rise to a theory of acoustic invariance, which hypothesized that, despite the variability in the acoustic input, there were more generalized patterns that provided the listener with a stable framework for the perception of the phonetic features of language (Blumstein & Stevens, 1981; Stevens & Blumstein, 1981; see also Kewley‐Port, 1983; Nossair & Zahorian, 1991). These features include those signifying manner of articulation for [stops], [glides], [nasals], and [fricatives] (Kurowski & Blumstein, 1984; Mack & Blumstein, 1983; Shinn & Blumstein, 1984; Stevens & Blumstein, 1981). Additionally, research has shown that if the auditory speech input is normalized for speaker and vowel context, generalized patterns can be identified for both stop (Johnson, Reidy, & Edwards, 2018) and fricative place of articulation (McMurray & Jongman, 2011).
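      The effect of speaker normalization can be illustrated with a toy sketch, loosely in the spirit of cue normalization (not McMurray and Jongman's actual model or data); every number, category label, and function name below is invented for illustration.

```python
import numpy as np

# Hypothetical raw values of a single acoustic cue (say, a fricative
# spectral measure) for two categories from two speakers. Within each
# speaker, "s" sits above "sh", but speaker B's "sh" tokens overlap
# speaker A's "s" tokens, so the raw cue is ambiguous across speakers.
cues = {
    "A": {"s": [5.0, 5.2, 5.1], "sh": [3.0, 3.1, 2.9]},
    "B": {"s": [7.0, 7.2, 7.1], "sh": [5.0, 5.1, 4.9]},
}

def speaker_normalize(cues):
    """Z-score each speaker's cue values against that speaker's own
    mean and standard deviation (a simple stand-in for normalization)."""
    normalized = {}
    for spk, cats in cues.items():
        vals = np.concatenate([np.asarray(v) for v in cats.values()])
        mu, sd = vals.mean(), vals.std()
        normalized[spk] = {c: [(x - mu) / sd for x in v]
                           for c, v in cats.items()}
    return normalized

norm = speaker_normalize(cues)
```

After normalization, every "s" token exceeds every "sh" token regardless of speaker: expressing the cue relative to each speaker's own baseline recovers a generalized pattern that the raw values obscure.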

      A new approach to the question of invariance provides perhaps the strongest support for the notion that listeners extract global invariant acoustic properties in processing the phonetic categories of speech. Pioneering work from the lab of