Sharon Oviatt

The Handbook of Multimodal-Multisensor Interfaces, Volume 1



studies have demonstrated that flexible multimodal interfaces are effective partly because they support students’ ability to self-manage their own working memory in a way that reduces cognitive load [Oviatt et al. 2004a]. For example, students prefer to interact unimodally when working on easy problems. However, they will upshift to interacting multimodally on harder problems in order to distribute processing, minimize cognitive load, and improve their performance [Oviatt et al. 2004a].

      One implication of these research findings is that flexible multimodal interfaces are especially well suited for applications like education, which typically involve higher levels of load associated with mastering new content. In fact, all applications that require extended thinking and reasoning potentially could be improved by implementing a flexible and expressively powerful multimodal interface. In addition, Working Memory theory has direct implications for designing well-integrated multimodal interfaces that can combine complementary modalities to process specific types of content with minimal interference effects.

      Activity Theory is a meta-theory with numerous branches. It makes the central claim that activity and consciousness are dynamically interrelated. Vygotskian Activity theory states that physical and communicative activity play a major role in mediating, guiding, and refining mental activities [Luria 1961, Vygotsky 1962, 1978, 1987].

      In Vygotsky’s view, the most powerful tools for semiotic mediation are symbolic representational ones such as language. Vygotsky was especially interested in speech, which he believed serves dual purposes: (1) social communication and (2) self-regulation during physical and mental activities. He described self-regulatory language, also known as “self talk” or “private speech,” as a think-aloud process in which individuals verbalize poorly understood aspects of difficult tasks to assist in guiding their thought [Berk 1994, Duncan and Cheyne 2002, Luria 1961]. In fact, during human-computer interaction the highest rates of self talk occur during more difficult tasks [Xiao et al. 2003]. For example, when using a multimodal interface during a map task, people typically have the most difficulty with relative directional information. They may subvocalize, “East, no, west of …” when thinking about where to place a landmark on a digital map. As a map task increases in difficulty, users’ self talk progressively increases, which has been shown to improve their performance [Xiao et al. 2003].

      Since Vygotsky’s original work on the role of speech in self-regulation, further research has confirmed that activity in all communication modalities mediates thought and plays a self-regulatory role in improving performance [Luria 1961, Vygotsky 1962, 1987]. As tasks become more difficult, speech, gesture, and writing all increase in frequency, reducing cognitive load and improving performance [Comblain 1994, Goldin-Meadow et al. 2001, Oviatt et al. 2007, Xiao et al. 2003]. For example, manual gesturing reduces cognitive load and improves memory during math tasks, with increased benefit on more difficult tasks [Goldin-Meadow et al. 2001]. When writing, students also diagram more as math problems become harder, which can improve correct solutions by 30–40% [Oviatt 2006, 2007]. In summary, research across modalities is compatible with Vygotsky’s theoretical view that communicative activity mediates thought and improves performance [Luria 1961, Vygotsky 1962].

      Activity theory is well supported by neuroscience results on activity- and experience-dependent neural plasticity. Activity-dependent plasticity adapts the brain according to the frequency of an activity. Activities have a profound impact on human brain structure and processing, including changes in the number and strength of synapses, dendritic branching, myelination, the presence of neurotransmitters, and changes in cell responsivity, which are associated with learning and memory [Markham and Greenough 2004, Sale et al. 2009]. Recent neuroscience data indicate that physical activity can generate change within minutes in neocortical dendritic spine growth, and the extent of dendritic spine remodeling correlates with success of learning [Yang et al. 2009]. Other research has shown that the experience of using a tool can change the properties of the multisensory neurons involved in its control [Ishibashi et al. 2004].

      A major theme uncovered by neuroscience research related to Activity theory is the following:

      • Neural adaptations are most responsive to direct physical activity, rather than passive viewing or vicarious experience [Ferchmin and Bennett 1975].

      One major implication of this finding is that the design of computer input tools is particularly consequential for eliciting actions that directly stimulate cognition. This contrasts with the predominant engineering focus on developing system output capabilities.

      In addition, neuroscience findings emphasize the following:

      • Physical activity that involves novel or complex actions is most effective at stimulating synaptogenesis, or neural adaptations compatible with learning and memory. In contrast, familiar and simple actions do not have the same impact [Black et al. 1990, Kleim et al. 1997].

      A further theme revealed by neuroscience research, which focuses specifically on Activity theory and multimodality, is:

      • Multisensory and multimodal activity involve more total neural activity across a range of modalities, more intense bursts of neural activity, more widely distributed activity across the brain’s neurological substrates, and longer distance connections.

      Since multimodal interfaces elicit more extensive neural activity across many dimensions, compared with unimodal interfaces, they can have a greater impact on stimulating cognition. In particular, they produce deeper and more elaborated learning, improve long-term memory, and result in higher performance levels during human-computer interaction [Oviatt 2013].

      Embodied Cognition theory, which is related to Activity theory and Situated Cognition theory, asserts that thought is directly shaped by actions in context as part of an action-perception loop [Beilock et al. 2008, Shapiro 2014, Varela et al. 1991]. For example, specific gestures or hand movements during problem solving can facilitate an understanding of proportional equivalence and other mathematical concepts [Goldin-Meadow and Beilock 2010, Howison et al. 2011]. Representations and meaning are created and interpreted within activity, rather than being stored as past knowledge structures. More specifically, representation involves activating neural processes that recreate a related action-perception experiential loop. Two key findings in the embodied cognition literature are that:

      • The action-perception loop is based on multisensory perceptual and multimodal motor neural circuits in the brain [Nakamura et al. 2012].

      • Complex multisensory or multimodal actions, compared with unimodal or simpler actions, can have a substantial and broad facilitatory effect on cognition [James 2010, Kersey and James 2013, Oviatt 2013].

      As an example, writing complex letter shapes creates a long-term sensory-motor memory, which is part of an integrated multisensory-multimodal “reading neural circuit” [Nakamura et al. 2012]. The multisensory experience of writing includes a combination of haptic, auditory, and visual feedback. In both children and adults, actively writing letters has been shown in fMRI studies to increase brain activation to a greater extent than passively viewing, naming, or typing them [James 2010, James and Engelhardt 2012, Kersey and James 2013, Longcamp et al. 2005, 2008]. Compared with simple tapping on keys during typing, constructing letter shapes also improves the accuracy of subsequent letter recognition, a prerequisite for successful comprehension and reading. Writing letters leads to a more elaborated and durable ability to recognize letter shapes over time. Research by Berninger and colleagues [Berninger et al. 2009, Hayes and Berninger 2010] has further documented that the multisensory-multimodal experience of writing letter shapes, compared with typing, facilitates spelling, written composition, and the content of ideas expressed in a composition. This extensive body of neuroscience and behavioral findings has direct implications for the broad cognitive advantages of pen-based and multimodal interfaces.

      Figure 1.3 Embodied cognition view of the perception-action loop during multisensory integration, which utilizes the Maximum Likelihood Estimation