
The Handbook of Speech Perception



of speech presented in noise in which listeners could also see the talkers whose words they aimed to recognize. The point of the study was to calibrate the level at which the speech signal would become so faint in the noise that, to sustain adequate performance, attention would switch from an inaudible acoustic signal to the visible face of the talker. In fact, the visual channel contributed to intelligibility at all levels of performance, indicating that the perception of speech is ineluctably multisensory. But how does the perceiver determine the audible and visible composition of a speech stream? This problem (reviewed by Rosenblum & Dorsi, Chapter 2) is a general form of the listener’s specific problem of perceptual organization, understood as a function that follows the speechlike coordinate variation of a sensory sample of an utterance. To assign auditory effects to the proper source, the perceptual organization of speech must capture the complex sound pattern of a phonologically governed vocal source, sensing the spectro‐temporal variation that transcends the simple similarities on which gestalt‐derived principles rest. It is obvious that gestalt principles couched in auditory dimensions would fail to merge auditory attributes with visual attributes; because auditory and visual dimensions are simply incommensurate, it is not obvious that any notion of similarity would hold the key to audiovisual combination. The properties that the two senses do share – localization in azimuth and range, and temporal pattern – can be violated freely without harming audiovisual combination, and therefore cannot be requisite for multisensory perceptual organization.

      Perceptual organization is the critical function by which a listener resolves sensory samples into streams specific to worldly objects and events. In the perceptual organization of speech, the auditory correlates of speech are resolved into a coherent stream that is fit to be analyzed for its linguistic and indexical properties. Although many contemporary accounts of speech perception are silent about perceptual organization, it is unlikely that the generic auditory functions of perceptual grouping provide adequate means to find and follow the complex properties of speech. A rough outline of an adequate account of the perceptual organization of speech can be proposed by drawing on relevant findings from research projects spanning a variety of aims. The evidence from these projects suggests that the critical organizational function that operates for speech is fast, unlearned, nonsymbolic, keyed to complex patterns of coordinate sensory variation, indifferent to sensory quality, and requires attention, whether elicited or exerted. Research on other sources of complex natural sound has the potential to reveal whether these properties are unique to speech or are drawn from a common stock of resources for unimodal and multimodal perceptual organization.

      In conducting some of the research described here and in writing this chapter, the author is grateful for the sympathetic understanding of Samantha Caballero, Mariah Marrero, Lyndsey Reed, Hannah Seibold, Gabriella Swartz, Philip Rubin, and Michael Studdert‐Kennedy. This work was supported by a grant from the National Science Foundation (SBE 1827361).
