affordances are well matched with a task domain, they can increase human activity patterns that stimulate exploratory learning, cognition, and overall performance.
Motivated by both Affordance theory and Activity theory, research on human-computer interaction has shown that more expressively powerful interfaces can substantially stimulate human communicative activity and corresponding cognition. An expressively powerful computer interface is one that can convey information involving multiple modalities, representations, or linguistic codes [Oviatt 2013]. Recent research has shown that different input capabilities, such as a keyboard vs. a digital pen, have affordances that prime qualitatively different types of communicative content. In one study, students expressed 44% more nonlinguistic representational content (e.g., numbers, symbols, diagrams) when using a pen interface. In contrast, when the same students worked on the same type of problems with keyboard input, they switched to expressing 36% more linguistic content (e.g., words, abbreviations) [Oviatt et al. 2012].
These differences in communication patterns corresponded with striking changes in students’ cognition. In particular, when students used a pen interface and wrote more nonlinguistic content, they also generated 36% more appropriate biology hypotheses. A regression analysis revealed that knowledge of individual students’ level of nonlinguistic fluency accounted for a substantial 72% of all the variance in their ability to produce appropriate science ideas (see Figure 1.4, left) [Oviatt et al. 2012]. However, when the same students used the keyboard interface and communicated more linguistic content, a regression instead indicated a substantial decline in science ideation (see Figure 1.4, right). In this case, knowledge of students’ level of linguistic communication had a negative predictive relation with their ability to produce appropriate science ideas. That is, it accounted for 62% of the variance in students’ inability to produce biology hypotheses.
Figure 1.4 Regression analysis showing positive relation between nonlinguistic communicative fluency and ideational fluency (left). Regression showing negative relation between linguistic communicative fluency and ideational fluency (right). (From Oviatt et al. [2012])
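To make the “variance accounted for” statistic concrete, the sketch below shows how such a figure is computed: fit an ordinary least-squares regression line and report R², the proportion of variance in the outcome (here, ideational fluency) explained by the predictor (communicative fluency). This is a minimal illustration in Python with invented sample data, not the authors’ analysis code.

```python
# Minimal illustration of "variance accounted for" (R^2) in a simple
# linear regression. The per-student data values below are invented
# for illustration only; they are not from Oviatt et al. [2012].
import numpy as np

# Hypothetical per-student measures: nonlinguistic communicative fluency
# (predictor) and number of appropriate hypotheses produced (outcome).
fluency = np.array([3, 5, 8, 10, 12, 15, 18, 20], dtype=float)
hypotheses = np.array([1, 2, 2, 4, 5, 5, 7, 8], dtype=float)

# Fit an ordinary least-squares line: hypotheses ~ slope * fluency + intercept.
slope, intercept = np.polyfit(fluency, hypotheses, deg=1)
predicted = slope * fluency + intercept

# R^2 = 1 - (residual sum of squares / total sum of squares), i.e., the
# proportion of variance in the outcome explained by the predictor.
ss_res = np.sum((hypotheses - predicted) ** 2)
ss_tot = np.sum((hypotheses - hypotheses.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope = {slope:.2f}, R^2 = {r_squared:.2f}")
```

Note that a negative predictive relation, as in the keyboard condition, would appear as a negative slope; R² itself is always a nonnegative proportion, which is why linguistic fluency can be said to account for 62% of the variance even though its relation to ideation was negative.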
From an Activity theory perspective, neuroscience, behavioral, and human-computer interface research all consistently confirm that engaging in more complex and multisensory-multimodal physical actions, such as writing letter shapes, can stimulate human cognition more effectively than passive viewing, naming, or tapping on a keyboard. Keyboard interfaces were never designed to be thinking tools. They constrict the representations, modalities, and linguistic codes that can be communicated when using computers, and therefore fail to provide comparable expressive power [Oviatt 2013].
In addition, research related to Activity theory has highlighted the importance of communication as a type of activity that directly stimulates and shapes human cognition. This cognitive facilitation has been demonstrated in a variety of communication modalities. In summary, multimodal interface design is a fertile direction for supporting computer applications involving extended thinking and reasoning.
All of the theories presented in this chapter have limitations in scope, but collectively they provide converging perspectives on multisensory perception, multimodal communication, and the design of multimodal interfaces that effectively blend information sources. The focus of this chapter has been to summarize the strengths of each theory, and to describe how they have been applied to date in the design of multimodal interfaces. In this regard, the present chapter is by no means exhaustive. Rather, it highlights examples of how theory has influenced past multimodal interface design, often in rudimentary ways. In the future, new and more refined theories will be needed that can predict and coherently explain multimodal research findings, and shed light on how to design truly well-integrated multimodal interfaces and systems.
Focus Questions
1.1. Describe the two main types of theory that have provided a basis for understanding multisensory perception and multimodal communication.
1.2. What neuroscience findings currently support Gestalt theory, Working Memory theory, Activity theory, Embodied Cognition theory, and Communication Accommodation theory?
1.3. What human-computer interaction research findings support these theories?
1.4. What Gestalt laws have become especially important in recent research on multisensory integration? And how has the field of multisensory perception substantially expanded our understanding of multisensory fusion beyond these initial Gestalt concepts?
1.5. What Working Memory theory concept has been central to understanding the performance advantages of multimodal interfaces, as well as how to design them?
1.6. Activity theory and related research assert that communicative activity in all modalities mediates thought, and plays a direct role in guiding and improving human performance. What is the evidence for this in human-computer interaction studies? And what are the key implications for designing multimodal-multisensor interfaces?
1.7. How is the action-perception loop, described by Embodied Cognition theory, relevant to multisensory perception and multimodal actions? Give one or more specific examples.
1.8. How do multisensory and multimodal activity patterns influence the brain and its neurological substrates, compared with unimodal activity? What are the implications for multimodal-multisensor interface design?
1.9. What is the principle of complementarity, and how does it relate to designing “well-integrated” multimodal-multisensor systems? What are the various ways that modality complementarity can be defined, as well as measured?
1.10. The field’s understanding of how to design well-integrated multimodal interfaces remains rudimentary, in particular focused on what modalities to include in a system. What other more fine-grained questions should be asked regarding how to design a well-integrated system? And how could future human-computer interaction research and theory be organized to refine our understanding of this topic?
References
T. Anastasio and P. Patton. 2004. Analysis and modeling of multisensory enhancement in the deep superior colliculus. In G. Calvert, C. Spence, and B. Stein, editors. The Handbook of Multisensory Processes, pp. 265–283. MIT Press, Cambridge, MA.
M. Anderson and C. Green. 2001. Suppressing unwanted memories by executive control. Nature, 410:366–369. DOI: 10.1038/35066572.
A. Baddeley. 1986. Working Memory. Oxford University Press, New York.
A. Baddeley. 2003. Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4:829–839. DOI: 10.1038/nrn1201.
A. D. Baddeley and G. J. Hitch. 1974. Working memory. In G. H. Bower, editor. The Psychology of Learning and Motivation: Advances in Research and Theory, vol. 8, pp. 47–89. Academic Press, New York.
S. L. Beilock, I. M. Lyons, A. Mattarella-Micke, H. C. Nusbaum, and S. L. Small. 2008. Sports experience changes the neural processing of action language. Proceedings of the National Academy of Sciences, 105:13269–13273. DOI: 10.1073/pnas.0803424105.
L. E. Berk. 1994. Why children talk to themselves. Scientific American, 271(5):78–83.
V. Berninger, R. Abbott, A. Augsburger, and N. Garcia. 2009. Comparison of pen and keyboard transcription modes in children with and without learning disabilities. Learning Disability Quarterly, 32:123–141. DOI: 10.2307/27740364.
L. Bernstein and C. Benoit. 1996. For speech perception by humans or machines, three senses are better than one. In Proceedings of the International Conference on Spoken Language Processing, vol. 3, pp. 1477–1480.