of naturally occurring human communication and activity patterns. They are far more flexible and expressively powerful than past keyboard-and-mouse interfaces, which are limited to discrete input.
Multimodal interfaces support multimodal input, and they may also include sensor-based controls. In many cases they may also support either multimodal or multimedia output.
Multimodal-multisensor interfaces combine one or more user input modalities with sensor information (e.g., location, acceleration, proximity, tilt). Sensor-based cues may be used to interpret a user’s physical state, health status, mental status, current context, engagement in activities, and many other types of information. Users may engage in intentional actions when deploying sensor controls, such as tilting a screen to change its orientation. Sensors also can serve as “background” controls, to which the interface automatically adapts without any intentional user engagement (e.g., dimming a phone screen after lack of use). Sensor input aims to transparently facilitate user-system interaction, and adaptation to users’ needs. The type and number of sensors incorporated into multimodal interfaces have been expanding rapidly, resulting in explosive growth of multimodal-multisensor interfaces. There are numerous types of multimodal-multisensor interfaces with different characteristics, as will be discussed in this handbook.
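The distinction between intentional and "background" sensor controls can be illustrated with a minimal sketch. It is not drawn from any particular system: the names `SensorReading`, `orientation_from_tilt`, `brightness_from_idle`, and `adapt`, as well as the 45-degree and 30-second thresholds, are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    """Hypothetical snapshot of two sensor-derived values."""
    tilt_degrees: float         # device roll; 0 means upright portrait
    seconds_since_input: float  # time elapsed since the last user input


def orientation_from_tilt(tilt_degrees: float, threshold: float = 45.0) -> str:
    """Intentional control: the user tilts the device to rotate the display."""
    return "landscape" if abs(tilt_degrees) > threshold else "portrait"


def brightness_from_idle(seconds_since_input: float, dim_after: float = 30.0) -> float:
    """Background control: dim the screen after a period without use."""
    return 0.3 if seconds_since_input >= dim_after else 1.0


def adapt(reading: SensorReading) -> dict:
    """Combine both kinds of sensor control into one interface state."""
    return {
        "orientation": orientation_from_tilt(reading.tilt_degrees),
        "brightness": brightness_from_idle(reading.seconds_since_input),
    }


if __name__ == "__main__":
    # The user deliberately tilts the device while actively using it.
    print(adapt(SensorReading(tilt_degrees=80.0, seconds_since_input=5.0)))
    # The device sits upright and untouched for a minute.
    print(adapt(SensorReading(tilt_degrees=0.0, seconds_since_input=60.0)))
```

In this sketch, both controls read from the same sensor snapshot. Only the tilt response depends on a deliberate user action, whereas the dimming response occurs without any intentional engagement.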
Multimodal output involves system output from two or more modalities, such as a visual display combined with auditory or haptic feedback, which is provided to the user. This output is processed by separate human sensory systems and brain areas.
As we now embark upon designing new multimodal-multisensor digital tools with multiple input and output components, we can likewise expect a major transition in the functionality and performance of computer interfaces. With experience, users will learn the advantages of different types of input available on multimodal-multisensor interfaces, which will enable them to develop better control over their own performance. For professionals designing new multimodal-multisensor interfaces, it is sobering to realize the profound impact that our work potentially could have on further specialization of the human brain and cognitive abilities [Oviatt 2013].
More Expressively Powerful Tools Are Capable of Stimulating Cognition
This paradigm shift reflects the evolution of more expressively powerful computer input, which can substantially improve support for human cognition and performance. Recent findings have revealed that more expressively powerful interfaces (e.g., digital pen, multimodal) can stimulate cognition beyond the level supported by either keyboard interfaces or analogous non-digital tools. For example, the same students working on the same science problems generate substantially more domain-appropriate ideas, solve more problems correctly, and engage in more accurate inferential reasoning when using a digital pen, compared with keyboard input. In different studies, the magnitude of improvement has ranged from approximately 10–40% [Oviatt 2013]. These results have generalized widely across different user populations (e.g., ages, ability levels), content domains (e.g., science, math, everyday reasoning), types of thinking and reasoning (e.g., problem solving, inference, idea generation), computer hardware, and evaluation metrics. From a communications perspective, results have demonstrated that more expressively powerful input modalities, and multimodal combinations of them, can directly facilitate our ability to think clearly and perform well on tasks.
Multimodal interfaces also support improved cognition and performance because they enable users to self-manage and minimize their own cognitive load. Working memory load is reduced when people express themselves using multimodal input, for example by combining speech and writing, because their average utterance length is reduced by conveying spatial information with pointing and gesturing. When speaking and writing together, people avoid speaking location descriptions, because such descriptions are error prone and increase mental load. Instead, they use written input (e.g., pointing, encircling) to indicate such content [Oviatt and Cohen 2015]. In Chapter 13, experts discuss neuroscience and human-computer interaction findings on how and why new multimodal-multisensor interfaces can more effectively stimulate human cognition and learning than previous computer interfaces.
One Example of How Multimodal-Multisensor Interfaces Are Changing Today
One of the most rapidly changing areas in multimodal-multisensor interface design today is the incorporation of a wide variety of new sensors. This is part of the long-term trend toward expanding the number and type of information sources available in multimodal-multisensor interfaces, which has been especially noteworthy on current smartphones. These changes have been coupled with experimentation on how to use different sensors for potentially valuable functionality, and also how to design a whole multimodal-multisensor interface in a synergistic and effective manner. Designers are beginning to grasp the many versatile ways that sensors and input modalities can be coupled within an interface—including that either can be used intentionally in the “foreground,” or they can serve in the “background” for transparent adaptation that minimizes interface complexity and users’ cognitive load (see Chapter 4). The separate research communities that historically have focused on multimodal versus ubiquitous sensor-based interfaces have begun to engage in collaborative cross talk. One outcome will be improved training of future students, who will be able to design better integrated multimodal-multisensor interfaces.
As will be discussed in this volume, one goal of multimodal-multisensor interfaces is to facilitate user-system interaction that is more human-centered, adaptive, and transparent. Sensor-based information sources are beginning to interpret a user’s physical state (e.g., walking), health status (e.g., heart rate), emotional status (e.g., frustrated, happy), cognitive status (e.g., cognitive load, expertise), current context (e.g., driving a car), engagement in activities (e.g., picking up a cell phone), and many other types of information. As these capabilities improve in reliability, systems will begin to adapt by supporting users’ goal-oriented behavior. One critical role for sensor input on mobile devices is to transparently preserve users’ focus of attention on important primary tasks, such as driving, by minimizing distraction and cognitive load. However, mechanical sensors are not the only avenue for accomplishing these advances. Paralinguistic information from input modalities like speech and writing (e.g., volume, rate, pitch, pausing) is becoming increasingly reliable at predicting many aspects of users’ mental status, as will be detailed in other handbook chapters [Burzo et al. 2017, Cohn et al. 2017, Oviatt et al. 2017, Zhou et al. 2017].
Insights in the Chapters Ahead
This handbook presents chapters that summarize basic research and development of multimodal-multisensor systems, including their status today and rapidly growing future directions. This initial volume introduces relevant theory and neuroscience foundations, approaches to design and user modeling, and an in-depth look at some common modality combinations. The second volume [Oviatt et al. 2017a] summarizes multimodal-multisensor system signal processing, architectures, and the emerging use of these systems for detecting emotional and cognitive states. The third volume [Oviatt et al. 2017b] presents multimodal language and dialogue processing, software tools and platforms, commercialization of applications, and emerging technology trends and societal implications. Collectively, these handbook chapters address a comprehensive range of central issues in this rapidly changing field. In addition, each volume includes selected challenge topics, in which an international panel of experts exchanges their views on some especially consequential, timely, and controversial problem in the field that is in need of insightful resolution. We hope these challenge topics will stimulate talented students to tackle these important societal problems, and motivate the rest of us to envision and plan for our technology future.
Information presented in the handbook is intended to provide a comprehensive state-of-the-art resource for professionals, business strategists, technology funders, and interested lay readers, and for training advanced undergraduate and graduate students in this multidisciplinary computational field. To enhance its pedagogical value to readers, many chapters include digital resources such as pointers to open-source tools, databases, video demonstrations, and case study walkthroughs to assist in designing,