Sharon Oviatt

The Handbook of Multimodal-Multisensor Interfaces, Volume 1


systems. Each handbook chapter defines the basic technical terms required to understand its topic. Educational resources, such as focus questions, are included to support readers in mastering these newly presented materials.

       Theoretical and Neuroscience Foundations

      The initial chapters in this volume address foundational issues in multimodal-multisensor interface design, including theoretical and neuroscience foundations, user modeling, and the design of interfaces involving rich input modalities and sensors. In Chapter 1, Oviatt discusses the theoretical foundations of multisensory perception and multimodal communication, which provide a basis for understanding the performance advantages of multimodal interfaces and how to design them to reap these advantages. This chapter describes the major theories that have influenced contemporary views of multimodal interaction and interface design, including Gestalt theory, Working Memory theory, and Activity theory, which together subsume perception-action dynamic theories as well as limited-resource theories focused on attention and short-term memory constraints. Other theoretical perspectives covered in this chapter that have influenced multimodal interface design include Multiple Resource theory, Cognitive Load theory, Embodied Cognition, Communication Accommodation theory, and Affordance theory. These theories are emphasized in part because they are heavily supported by neuroscience findings.

      In Chapter 2, James et al. discuss the human brain as an inherently multimodal-multisensory dynamic learning system. Although each sensory modality processes different signals from the environment in qualitatively different ways (e.g., sound waves, light waves, pressure, etc.), these signals ultimately are transduced into a common language and unified percept in the brain. From an Embodied Cognition viewpoint, humans also act on the world multimodally through hand movements, locomotion, speech, gestures, etc., and these physical actions directly shape the multisensory input we perceive. Given recent findings in neuroscience, this chapter discusses the multisensory-multimodal brain structures (e.g., multisensory neurons, multisensory-multimodal brain circuits) and processes (e.g., convergence, integration, multisensory enhancement, and depression) that produce human learning, and how multimodal learning affects brain plasticity in adults and children. Findings on this topic have direct implications for understanding how multimodal-multisensor technologies influence us at both the brain and behavioral levels. The final section of this chapter discusses implications for multimodal-multisensor interface design, which is considered further in Chapter 13 in the exchange among experts on the challenge topic “Perspectives on Learning with Multimodal Technology.”

       Approaches to Design and User Modeling

      In Chapter 3, MacLean et al. discuss multisensory haptic interfaces broadly as anything a user touches or is touched by in order to control, experience, or receive information from a computational device, including a wide range of both energetically passive (e.g., touchscreen input) and energetically active (e.g., vibrotactile feedback) interface techniques. This chapter delves into the conceptual and pragmatic issues involved in designing optimally enriched haptic experiences, especially ones involving energetically active haptic interfaces, for which technological advances from materials to robotics have opened up many new frontiers. In approaching this topic, MacLean and colleagues describe humans’ distributed, multi-parameter range of haptic sensory capabilities (e.g., temperature, texture, forces), and the individual differences associated with designing for them. They walk through scenarios illustrating multimodal interaction goals and explain the many roles the haptic component can play. For example, this may involve notifying and then guiding a mobile user about an upcoming turn using haptic information that complements the visual display. This chapter includes hands-on information about design techniques and software tools that will be valuable for students and professionals in the field.
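
      To make the turn-guidance scenario concrete, the following sketch (plain Python, with hypothetical thresholds and a made-up VibrationPulse structure rather than any API from the chapter) maps the distance to an upcoming turn onto a vibrotactile pattern: sparse, gentle pulses serve as an early background notification, and denser, stronger pulses provide foreground guidance as the turn approaches, complementing the visual map rather than replacing it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VibrationPulse:
    actuator: str      # e.g., "left_wrist" or "right_wrist" (hypothetical hardware)
    duration_ms: int   # how long the actuator is driven
    intensity: float   # normalized drive level, 0.0 to 1.0


def turn_guidance_pattern(distance_m: float, turn_side: str) -> List[VibrationPulse]:
    """Map the distance to an upcoming turn onto a vibrotactile pattern.

    Far from the turn the pattern stays sparse and gentle (a background
    notification); close to the turn it becomes denser and stronger
    (foreground guidance). The turn direction selects the actuator so the
    haptic cue adds directional information that complements the visual map.
    """
    actuator = f"{turn_side}_wrist"
    if distance_m > 200:       # early notification: one soft pulse
        count, intensity = 1, 0.3
    elif distance_m > 50:      # approaching: two medium pulses
        count, intensity = 2, 0.6
    else:                      # turn imminent: three strong pulses
        count, intensity = 3, 1.0
    return [VibrationPulse(actuator, 150, intensity) for _ in range(count)]


if __name__ == "__main__":
    for distance in (400, 120, 20):
        print(distance, turn_guidance_pattern(distance, "left"))
```

      A real design would of course calibrate intensities and timings to the actuator hardware and to the individual differences in haptic sensitivity that the chapter discusses.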

      From the perspective of multimodal output, in Chapter 7 Freeman et al. discuss a wide range of alternatives to visual displays, including: (1) haptic output, such as vibrotactile, thermal, force, and deformable feedback; (2) non-speech audio, such as auditory icons, earcons, musicons, sonification, and spatial audio output; and (3) combined multimodal feedback, which they argue is indispensable in mobile contexts, including in-vehicle ones, and for users with sensory limitations. Their chapter describes the relevant mechanical devices, as well as research findings on how these different forms of feedback and their combinations affect users’ task performance. In many cases, non-visual feedback provides background interface information that reduces users’ cognitive load, for example during a mobile navigation task. In other cases, combined multimodal feedback supports more rapid learning, more accurate navigation, more efficient information extraction, and more effective warning systems, for example during hand-over of control to the driver in an autonomous car. This chapter is richly illustrated with digital demonstrations so readers can concretely experience the haptic and non-speech audio examples that are discussed.
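
      As one hedged illustration of combined multimodal feedback during hand-over of control, the sketch below (Python, with invented earcon names, invented thresholds, and a hypothetical FeedbackPlan container) escalates from a single background audio cue to redundant audio, vibrotactile, and visual warnings as the time remaining before the driver must take over shrinks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeedbackPlan:
    """A combined multimodal warning, expressed declaratively so that each
    channel can be rendered by whatever output device is available."""
    audio: List[str] = field(default_factory=list)            # earcon/speech cue names
    vibrotactile: List[float] = field(default_factory=list)   # pulse intensities, 0.0 to 1.0
    visual: str = ""                                          # on-screen message


def takeover_warning(seconds_to_handover: float) -> FeedbackPlan:
    """Escalate from a background cue to redundant multimodal alerts as the
    hand-over of control to the driver becomes more urgent (thresholds invented)."""
    if seconds_to_handover > 20:
        # plenty of time: a quiet earcon keeps the warning in the background
        return FeedbackPlan(audio=["earcon_soft"])
    if seconds_to_handover > 8:
        # moderate urgency: add vibrotactile and visual reinforcement
        return FeedbackPlan(audio=["earcon_urgent"], vibrotactile=[0.6],
                            visual="Prepare to take over")
    # imminent: redundant audio, haptic, and visual channels reinforce one another
    return FeedbackPlan(audio=["earcon_urgent", "speech_take_over_now"],
                        vibrotactile=[1.0, 1.0, 1.0],
                        visual="TAKE OVER NOW")


if __name__ == "__main__":
    print(takeover_warning(30))
    print(takeover_warning(5))
```

      Keeping the escalation policy separate from the devices that render each channel is one plausible way to combine the feedback types surveyed in the chapter.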

      In the related Chapter 4, Hinckley considers different design perspectives on how modalities and sensors can be combined synergistically within an interface to achieve innovative results. The dominant theme is using individual modalities and sensors flexibly in foreground versus background interaction roles. Hinckley provides detailed illustrations of the many ways touch can be combined with sensor input about proximity, orientation, acceleration, pressure, grip, and so on during common action patterns, for example lifting a cell phone to your ear. Such background information can provide context for predicting and facilitating the next steps in an interaction, like automatically activating your phone before making a call. This context can guide system processing without necessarily requiring the user’s attention, which is especially valuable in mobile situations, and it can also be used to reduce technological complexity. However, a major challenge is to achieve reliable activity recognition without false positives that cause unintended system activation (the Midas Touch problem), such as inadvertently activating URLs when you scroll through news stories on your cell phone. In the final sections of this chapter, Hinckley discusses bimanual interaction that coordinates touch with pen input, as well as a sensor-augmented stylus and tablet combination that uses inertial, grip, and other sensors for refined palm rejection. This type of pen and touch multimodal-multisensor interface is now commercially available on numerous pen-centric systems.
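
      The "lift the phone to your ear" example can be sketched as a small sensor-fusion heuristic. In the hypothetical Python below (sensor names, thresholds, and the window length are all assumptions, not Hinckley’s implementation), activation requires proximity, orientation, and motion evidence to agree over several consecutive samples, a simple debounce that illustrates one way to guard against Midas Touch style false activations.

```python
from collections import deque


class RaiseToEarDetector:
    """Sketch of background sensor fusion for the 'lift the phone to your ear'
    pattern. Sensor names, thresholds, and the window length are assumptions;
    the point is that activation requires several kinds of evidence to agree
    over a short window, guarding against Midas Touch style false activations."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # recent per-sample gesture decisions

    def update(self, proximity_near: bool, pitch_deg: float,
               accel_magnitude_g: float) -> bool:
        """Feed one sensor sample; return True only when the gesture is sustained."""
        upright = 45.0 <= pitch_deg <= 135.0     # phone held roughly vertical at the ear
        settled = accel_magnitude_g < 1.5        # the lifting motion has ended
        self.history.append(proximity_near and upright and settled)
        # require the whole window to agree before activating anything
        return len(self.history) == self.history.maxlen and all(self.history)


if __name__ == "__main__":
    detector = RaiseToEarDetector()
    samples = [(False, 10.0, 2.0)] * 3 + [(True, 90.0, 1.0)] * 6
    for sample in samples:
        print(detector.update(*sample))
```

      The same pattern of requiring sustained, converging evidence extends naturally to the grip and pressure sensing discussed in the chapter, with thresholds tuned per device.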

      In Chapter 5, Jameson and Kristensson discuss what it means to give users the freedom to use different input modalities in a flexible way. They argue that modality choice will vary as a function of user characteristics (e.g., abilities, preferences), the task (e.g., the main multimodal task, any competing tasks), multimodal system characteristics (e.g., recognition accuracy, ease of learning and use), the context of use (e.g., the social and physical environment), and the consequences of selecting certain modalities for interaction (e.g., expressive adequacy, errors, speed, potential interference with other modalities, potential for repetitive strain injury (RSI), social acceptability, and privacy). Their chapter considers what is known from the existing literature and psychological models about modality choice, including the degree to which users’ multimodal interaction actually represents a conscious choice versus automatic behavior. The aim of the chapter is to understand the limitations of existing research on modality choice, and to develop system design strategies that can guide users in selecting input modalities more optimally. In support of the second objective, Jameson and Kristensson introduce two related models, ASPECT and ARCADE, which provide conceptual tools that summarize (1) common reasons for users’ choice patterns and (2) different concrete strategies for promoting better modality choices (e.g., based on trial and error, consequences, or social feedback). In summary, this chapter raises unsolved issues at the very heart of how to design better-integrated multimodal-multisensor interfaces.

      In Chapter 6, Kopp and Bergmann adopt a simulation-based cognitive modeling approach, with a focus on how users’ working memory load influences their speech, gesturing, and