Various authors

The Handbook of Speech Perception



does not yet warrant an endorsement of a hybrid model of perceptual organization. Carrell and Opie used a range of pulse rates and conditions in their study, and reported that the intelligibility gain attributable to pulsing a sinewave sentence was restricted to pulse rates of 50–100 Hz. No benefit of pulsing was observed at a pulse rate of 200 Hz. While this topic merits additional examination, the available evidence encourages doubt about the hypothesized hybrid character of perceptual organization. Because a pulse rate of 50–100 Hz corresponds to the fundamental frequency of a low voice, any such benefit would necessarily be limited to speech produced by low bass voices; it would not extend to tenors, to say nothing of altos and sopranos. Most generously, we might conclude that the relation between primitive gestalt‐based generic auditory grouping and the more abstract organization by sensitivity to coordinate variation cannot be defined without stronger evidence, and that it is premature to conclude that the gestalt set plays a prominent or even a secondary role in the perceptual organization of speech.
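To make the pulse-rate manipulation concrete, the following Python sketch gates a signal on and off at a chosen rate. It is an illustration under stated assumptions, not Carrell and Opie's procedure: the function names `sinewave` and `pulse`, the 16 kHz sampling rate, and the rectangular gate with a 50% duty cycle are hypothetical choices. Applied to the summed tones of a sinewave sentence, a single gate imposes the same pulsing on every tone at once.

```python
import math

def sinewave(freq_hz, dur_s, fs=16000):
    """One constant-frequency sinusoid (a stand-in for one tone of a sinewave sentence)."""
    n = int(round(dur_s * fs))
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]

def pulse(signal, rate_hz, fs=16000, duty=0.5):
    """Hypothetical rectangular gate: pass the signal during the first `duty`
    fraction of each pulse period and silence it for the rest."""
    period = fs / rate_hz  # samples per pulse period (may be fractional)
    out = []
    for i, x in enumerate(signal):
        phase = (i % period) / period  # position within the current period, 0..1
        out.append(x if phase < duty else 0.0)
    return out
```

For example, `pulse(sinewave(500, 0.1), 100)` switches the tone on and off 100 times per second; at the assumed 16 kHz rate each on/off cycle spans 160 samples, whereas a 200 Hz pulse rate would halve that to 80.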

       The nature of speech cues

      The evolving portrait of speech perception, which includes both organization and analysis, recasts the raw cue as the perceptual property that gives speech its phenomenality, though not its phonetic effect. The transformation of natural speech to chimera, to noise‐band vocoded signal, and to sinewave replica is phonetically conservative: it preserves the fine details of subphonemic variation while driving timbre, or auditory quality, to extremes. The competent listener evidently derives phonetic impressions from the properties that these different kinds of signal share, and qualitative impressions from their unique attributes. The shared attribute, for want of a more precise description, is a complex modulation of spectrum envelopes, although why the infinitely sharp spectral peaks of sinewave speech and the far coarser spectra of chimerical and noise‐band vocoded speech have similar effects has still to be explained. None of these signals presents the cues of natural speech, yet listeners understand the message. The conclusion supported by these findings is clear: phonetic perception does not require the sensory registration of natural speech cues. Instead, the organizational component of speech perception operates on a spectro‐temporal grain that is requisite both for finding and following a speech signal and for analyzing its linguistic properties. The speech cues that once seemed to bear the burden of stimulating phonetic analyzers into action appear in hindsight to provide little more than an auditory quality subordinate to the phonetic stream.
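Sinewave replication can be sketched as follows, assuming the formant center frequencies and amplitudes have already been estimated frame by frame. The function name `sinewave_replica`, the track format, the 10 ms frame, and the linear interpolation between frames are assumptions for illustration, not a published synthesis procedure. The sketch shows what the replica keeps and what it discards: the output is only a few time-varying sinusoids, with none of the harmonic structure, release bursts, or aperiodic noise of natural speech, yet the spectro-temporal pattern of the tracks survives.

```python
import math

def sinewave_replica(tracks, frame_s=0.01, fs=16000):
    """Sum one time-varying sinusoid per track.

    tracks: list of tone tracks; each track is a list of (freq_hz, amplitude)
    pairs, one pair per analysis frame of length frame_s. All tracks are
    assumed to have the same number of frames.
    """
    n_frames = len(tracks[0])
    spf = int(round(frame_s * fs))  # samples per frame
    n = n_frames * spf
    out = [0.0] * n
    for track in tracks:
        phase = 0.0
        for i in range(n):
            # Interpolate linearly between adjacent frames; hold the last frame.
            pos = i / spf
            k = min(int(pos), n_frames - 1)
            k2 = min(k + 1, n_frames - 1)
            t = pos - k if k2 > k else 0.0
            f = track[k][0] + t * (track[k2][0] - track[k][0])
            a = track[k][1] + t * (track[k2][1] - track[k][1])
            out[i] += a * math.sin(phase)
            # Accumulate phase so frequency changes do not produce clicks.
            phase += 2 * math.pi * f / fs
    return out
```

A two-frame, three-tone glide such as `sinewave_replica([[(700, 1.0), (500, 1.0)], [(1200, 0.5), (1500, 0.5)], [(2500, 0.2), (2500, 0.2)]])` returns 320 samples at the assumed 16 kHz rate; the frequency values are arbitrary parameters, not measurements.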

       A constraint on normative descriptions of speech perception

      The application of powerful statistical techniques to problems in cognitive psychology has engendered a variety of normative, incidence‐based accounts of perception. Since the 1980s, a technology of parallel computation based loosely on an idealization of the neuron has driven a proliferation of devices that perform intelligent acts. Exact modeling of neurophysiology is rare in this enterprise, though probabilistic models attired as neural nets enjoy a hopeful if unearned appearance of naturalness that older, algorithmic explanations of cognitive processes unquestionably lack. As a theory of human cognitive function, it is more truthful to say that deep learning implementations characterize the human actor as an office full of clerks at an insurance company, endlessly tallying the incidence of different states in one domain (perhaps age and zip code, or the bitmap of the momentary auditory effect of a noise burst in the spectrum) and associating them, perhaps through a nonlinear projection, with states in another domain (perhaps the risk of major surgery, or the place of articulation of a consonant).
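The clerk metaphor can be rendered as a toy frequency-counting associator. This is a caricature for illustration, not a neural network and not any published model: the class name `IncidenceTally`, the string codes standing in for quantized burst spectra, and the omission of any nonlinear projection are assumptions. The clerks simply tally how often each state in one domain co-occurs with each state in another, and report the most frequent association.

```python
from collections import Counter, defaultdict

class IncidenceTally:
    """Toy incidence-based associator: tally co-occurrences of a cue state
    with a label, then report the label most often seen with that state."""

    def __init__(self):
        # cue state -> Counter mapping each label to its tally
        self.counts = defaultdict(Counter)

    def observe(self, state, label):
        """Record one co-occurrence of `state` with `label`."""
        self.counts[state][label] += 1

    def probabilities(self, state):
        """Relative frequency of each label given `state` ({} if never seen)."""
        c = self.counts.get(state, Counter())
        total = sum(c.values())
        return {label: n / total for label, n in c.items()} if total else {}

    def predict(self, state):
        """Most frequent label for `state`, or None if the state was never seen."""
        c = self.counts.get(state)
        return c.most_common(1)[0][0] if c else None
```

After `observe("burst_code_7", "velar")` is called three times and `observe("burst_code_7", "alveolar")` once, `predict("burst_code_7")` returns `"velar"` and `probabilities("burst_code_7")` returns `{"velar": 0.75, "alveolar": 0.25}`. These counts are invented for the example, not data; the point is that nothing in the tally represents why a state goes with a label.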

       Multisensory perceptual organization

      Fifty