Sharon Oviatt

The Handbook of Multimodal-Multisensor Interfaces, Volume 1



THEORY AND NEUROSCIENCE FOUNDATIONS

      1 Theoretical Foundations of Multimodal Interfaces and Systems

       Sharon Oviatt

      This chapter discusses the theoretical foundations of multisensory perception and multimodal communication. It provides a basis for understanding the performance advantages of multimodal interfaces, as well as how to design them to reap these advantages. Historically, the major theories that have influenced contemporary views of multimodal interaction and interface design include Gestalt theory, Working Memory theory, and Activity theory. These span perception-action dynamic theories as well as limited-resource theories that focus on constraints involving attention and short-term memory. This chapter emphasizes these theories in part because they are supported heavily by neuroscience findings. Their predictions also have been corroborated by studies on multimodal human-computer interaction. In addition to summarizing these three main theories and their impact, several related theoretical frameworks will be described that have influenced multimodal interface design, including Multiple Resource theory, Cognitive Load theory, Embodied Cognition, Communication Accommodation theory, and Affordance theory.

      The large and multidisciplinary body of research on multisensory perception, production, and multimodal interaction confirms many Gestalt, Working Memory, and Activity theory predictions that will be discussed in this chapter. These theories provide conceptual anchors. They create a path for understanding how to design more powerful systems, so we can gain better control over our own future. In spite of this, it is surprising how many systems are developed from a sophisticated engineering perspective, yet in a complete theoretical vacuum that Leonardo da Vinci would have ridiculed:

      Those who fall in love with practice without science are like a sailor who enters a ship without helm or compass, and who never can be certain whither he is going. Richter and Wells [2008]

      This chapter aims to provide a better basis for motivating and accelerating future multimodal system design, and the quality of its impact on human users.

      For a definition of highlighted terms in this chapter, see the Glossary. For other related terms and concepts, also see the textbook on multimodal interfaces by [Oviatt and Cohen 2015]. Focus Questions to aid comprehension are available at the end of this chapter.

      In cognitive neuroscience and experimental psychology, a rapidly growing literature during the past three decades has revealed that brain processing fundamentally involves multisensory perception and integration [Calvert et al. 2004, Stein 2012], which cannot be accounted for by studying the senses in isolation. Multisensory perception and communication are supported by multimodal neurons and multisensory convergence regions, which are a basic design feature of the human brain. As outlined in Section 1.2, it now is understood that multisensory integration of information exerts extensive control over human perception, attention, language, memory, learning, and other behaviors [Calvert et al. 2004, Schroeder and Foxe 2004, Stein and Meredith 1993]. This relatively recent shift from a unimodal to multisensory view of human perception reflects movement away from reductionism toward a perspective compatible with Gestalt theory, which originated in the late 1800s and early 1900s [for intellectual history, see Smith 1988]. Gestalt theory presents a holistic systems-level view of perception, which emphasizes self-organization of perceptual experience into meaningful wholes, rather than analyzing discrete elements as isolates. It asserts the principle of totality, or that the whole is a qualitatively different entity than the sum of its parts. A second overarching belief of Gestalt theory is the principle of psychophysical isomorphism, which states that conscious perceptual experience corresponds with underlying neural activity. These Gestalt views substantially predate Stein and Meredith's [1993] pioneering research on multisensory integration and the neurophysiology of the superior colliculus.

      In terms of the principle of totality, a central tenet of Gestalt theory is that when elements (e.g., lines) are combined into a whole percept (e.g., human figure), emergent properties arise that transform a perceptual experience qualitatively. Multisensory processing research has demonstrated and investigated many unexpected perceptual phenomena. Reports abound of perceptual “illusions” once thought to represent exceptions to unimodal perceptual laws. A classic example is Wertheimer’s demonstration in 1912 that two lines flashed successively at optimal intervals appear to move together, an illusion related to human perception of motion pictures [Koffka 1935]. In these cases, it is the whole percept that is apprehended first, not the elements composing it. That is, the whole is considered to have experiential primacy. These integrated percepts typically do not involve equal weighting of individual stimuli or simple additive functions [Calvert et al. 2004]. Although Gestalt theory’s main contributions historically involved perception of visual-spatial phenomena, as in the classic Wertheimer example, its laws also have been applied to the perception of acoustic, haptic, and other sensory input [Bregman 1990]. They likewise have been applied to the production of multimodal communications, and to human-computer interface design [Oviatt et al. 2003], as will be described further in this chapter.

      Gestalt theory describes different laws or principles for perceptual grouping of information into a coherent whole, including the laws of proximity, symmetry, area, similarity, closure, continuity, common fate, and others [Koffka 1935, Kohler 1929, Wertheimer 1938]. With respect to perceptual processing, Gestalt theory claims that the elements of a percept first are grouped rapidly according to these main principles. In addition, more than one principle can operate at the same time. Taken together, the Gestalt laws maintain that we organize experience in a way that is rapid, economical, symmetrical, continuous, and orderly. This is viewed as economizing mental resources, which permits a person’s focal attention to be allocated to a primary task.

      The Gestalt law of proximity states that spatial or temporal proximity causes unisensory elements to be perceived as related. This principle has become the backbone for explaining current multisensory integration of whole percepts, which is formulated as two related rules. The spatial rule states that the likelihood and strength of multisensory integration depends on how closely located two unisensory stimuli are to one another [Stein and Meredith 1993]. In parallel, the temporal rule claims that the likelihood and strength of multisensory integration depends on the degree of close timing of two unisensory stimuli, which must occur within a certain window of time [Stein and Meredith 1993]. Multisensory integration research has confirmed and elaborated the role of these two principles at both the behavioral and neuroscience level. For example, it is now known that there is a relatively wide temporal window for perceiving simultaneity between signals during audiovisual perception [Dixon and Spitz 1980, Spence and Squire 2003].1 A wide temporal window also has been demonstrated at the cellular level for simple stimuli in the superior colliculus [King and Palmer 1985, Meredith et al. 1987].
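      The joint operation of the spatial and temporal rules can be sketched as a simple decision procedure. The following Python fragment is purely illustrative: the window values are hypothetical placeholders, not empirical constants from the multisensory integration literature, and real integration strength is graded rather than all-or-none.

```python
# Illustrative sketch of the spatial and temporal rules of multisensory
# integration: two unisensory stimuli are more likely to be bound into a
# single percept when they occur close together in space AND within a
# certain window of time. Window sizes below are hypothetical examples.

from dataclasses import dataclass


@dataclass
class Stimulus:
    modality: str        # e.g., "auditory" or "visual"
    position_deg: float  # spatial location, in degrees of visual angle
    onset_ms: float      # stimulus onset time, in milliseconds


def likely_to_integrate(a: Stimulus, b: Stimulus,
                        spatial_window_deg: float = 15.0,
                        temporal_window_ms: float = 200.0) -> bool:
    """Apply the spatial and temporal rules jointly: both must hold."""
    close_in_space = abs(a.position_deg - b.position_deg) <= spatial_window_deg
    close_in_time = abs(a.onset_ms - b.onset_ms) <= temporal_window_ms
    return close_in_space and close_in_time


# A sound and a flash from roughly the same location, 100 ms apart:
print(likely_to_integrate(Stimulus("auditory", 10.0, 0.0),
                          Stimulus("visual", 12.0, 100.0)))  # prints True
```

      The binary thresholds here stand in for what is, behaviorally and at the cellular level, a graded dependence of integration strength on spatial and temporal disparity.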

       Glossary

      Affordances are perceptually based expectations about actions that can be performed on objects in the world, which derive from people’s beliefs about their properties. Affordances invite and constrain people to interact with objects, including computer interfaces, in specific ways. They establish behavioral attunements that transparently prime people’s use of objects, including their physical and communicative actions, and they can lead to exploratory learning. Affordances can be analyzed at the biological, physical, perceptual, and symbolic/cognitive level, and they are influenced by cultural conventions. The way people use different computer input tools is influenced heavily by their affordances, which in turn has a substantial impact on human cognition and performance [Oviatt et al. 2012].

      Disequilibrium refers to the Gestalt concept that people are driven to create a balanced, stable, and meaningful whole perceptual form. When this goal is not achieved or it is disrupted, then a state of tension or disequilibrium arises that alters human behavior. For example, if a user encounters a high rate of system errors during interaction, this will create a state of disequilibrium. Furthermore, human behavioral