activity detection) that can benefit from co-processing of the visual modality. The chapter illustrates a range of example systems whose multimodal performance exceeds that of audio-only speech systems. The authors note that recent advances in deep learning, together with the availability of multimodal corpora and open-source tools, now have the potential to advance “in the wild” field applications that were previously viewed as extremely challenging.
Expert Exchange on Multidisciplinary Challenge Topic
Chapter 13 presents a multidisciplinary challenge topic, with a discussion among experts that focuses on how humans learn and what the implications are for designing more effective educational technologies. Expert discussants include Karin James (cognitive neuroscience basis of learning), Dan Schwartz (learning sciences and educational technologies), Katie Cheng (learning sciences and educational technologies), James Lester (HCI, AI, and adaptive learning technology), and Sharon Oviatt (multimodal-multisensor interfaces, educational technologies). The discussants identify promising new techniques and the future research needed to advance work on this topic. This exchange reveals what experts in the field consider the main problems, what is needed to solve them, and what steps they envision for technology research and development in the near future.
Drawing on Chapter 2, which summarizes how complex multimodal action patterns stimulate multisensory comprehension of related content, James begins by highlighting that multimodal technologies could facilitate learning of more complex content. She then qualifies this by noting that, to be effective, these systems should not violate people’s previously learned action-perception contingencies. From an interface design viewpoint, Oviatt emphasizes that multimodal-multisensor input capabilities need to support rich content creation in order to help students master complex domain content. Both discussants note that a body of behavioral and brain science research has confirmed that optimal learning cannot be achieved with keyboard-based tools alone. Instead, students need to expend effort producing complex action patterns, such as manipulating objects or writing symbols with a pen.
Lester and Oviatt address the question of what role automation could play in designing adaptive multimodal-multisensor educational technologies that maximize learning rather than undermine it. They point out that multimodal technologies should not be designed to minimize students’ effort expenditure. Instead, Lester envisions adaptive multimodal interfaces that introduce and exercise new sub-skills, then incrementally reduce the level of automation by fading multimodal scaffolding as students learn. Oviatt notes that adaptive multimodal-multisensor interfaces could be designed to better focus students’ attention by minimizing the many extraneous interface features that distract them (e.g., formatting tools). She adds that emerging computational methods for predicting students’ mental state (e.g., cognitive load, domain expertise) could be used in future multimodal-multisensor systems to tailor what an individual learns and how they learn it [Oviatt et al. 2017, Zhou et al. 2017]. From this perspective, automation in future educational technologies would make temporary adaptations that support students’ current activities and mental state, in order to facilitate learning sub-goals, preserve limited working memory resources, and serve similar objectives that help students integrate new information.
In the final section of Chapter 13, all of the participants discuss emerging computational, neuroimaging, and modeling techniques that are just now becoming available for examining multimodal interaction patterns during learning, including techniques for predicting students’ motivational and cognitive states. For future research, James and Oviatt encourage investigators to study how the process of learning unfolds across levels, for example by examining the correspondence between behavioral data (e.g., fine-grained manual actions during writing) and brain activation patterns. Schwartz and Cheng distinguish students’ perceptual-motor learning from their ability to provide explanations. They advocate using new techniques to study students’ ability to learn at the meta-cognitive level, and also to probe how students learn multimodally. Given these recent advances, Lester suggests that we may well be on the verge of a golden era of technology-rich learning, in which multimodal-multisensor technologies play an invaluable role in both facilitating and assessing learning.
References
M. Burzo, M. Abouelenien, V. Perez-Rosas, and R. Mihalcea. 2017. Multimodal deception detection. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.
J. Cohn, N. Cummins, J. Epps, R. Goecke, J. Joshi, and S. Scherer. 2017. Multimodal assessment of depression and related disorders based on behavioural signals. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.
M. Commons and P. Miller. 2002. A complete theory of human evolution of intelligence must consider stage changes. Behavioral and Brain Sciences, 25: 404–405. DOI: 10.1017/S0140525X02240078.
H. Epstein. 2002. Evolution of the reasoning hominid brain. Behavioral and Brain Sciences, 25: 408–409. DOI: 10.1017/S0140525X02270077.
B. Evans. 2014. Mobile is eating the world. Tech Summit, October 28, 2014. http://a16z.com/2014/10/28/mobile-is-eating-the-world/ (retrieved January 7, 2015).
R. Masters and J. Maxwell. 2002. Was early man caught knapping during the cognitive (r)evolution? Behavioral and Brain Sciences, 25: 413. DOI: 10.1017/S0140525X02320077.
S. Oviatt. 2013. The Design of Future Educational Interfaces. Routledge Press. DOI: 10.4324/9780203366202.
S. Oviatt and P. R. Cohen. 2000. Multimodal systems that process what comes naturally. Communications of the ACM, 43(3): 45–53. DOI: 10.1145/330534.330538.
S. Oviatt and P. R. Cohen. 2015. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces. Human-Centered Interfaces Synthesis series (ed. Jack Carroll). Morgan & Claypool Publishers, San Rafael, CA. DOI: 10.2200/S00636ED1V01Y201503HCI030.
S. Oviatt, J. Grafsgaard, L. Chen, and X. Ochoa. 2017. Multimodal learning analytics: Assessing learners’ mental state during the process of learning. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.
S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. 2017a. The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.
S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. 2017b. The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Multimodal Language Processing, Software, Tools, Commercial Applications, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA.
T. Wynn. 2002. Archaeology and cognitive evolution. Behavioral and Brain Sciences, 25: 389–438. DOI: 10.1017/S0140525X02000079.
J. Zhou, K. Yu, F. Chen, Y. Wang, and S. Arshad. 2017. Multimodal behavioral and physiological signals as indicators of cognitive load. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors. The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.
PART I