Thematic Area C: Crossmodal learning in human-machine interaction
Whereas Area A focuses on the dynamics of crossmodal learning, and Area B focuses on crossmodal prediction and generalization, the projects in Area C will investigate crossmodal learning from the perspective of human-machine interaction, addressing issues that specifically relate to the shared multimodal signals that are perceived by both human and machine, particularly during their interaction. The projects in this thematic area will explore issues such as how crossmodal signals are integrated and learned for speech perception and language understanding (C1, 4–5); how multiple sensory modalities are combined for interpreting signals of social communication, such as hand gestures, facial expressions, and vocal utterances (C3–4); and how motor control (such as eye movements and speech articulation) can provide the information needed to disambiguate rich multisensory information (such as vision and audition) to support a clearer understanding of both spoken and written language (C1–2, 5). What unites these research issues is their potential for improving human-machine interaction: by transferring to machines the knowledge we gain from these projects, we will provide artificial systems with greater common ground upon which interaction with humans can become more natural.
Project C1 (Hong, Nolte) will investigate the crossmodal mechanisms underlying human speech perception and recognition. The experiments will collect high-quality, high-resolution data using ECoG (direct monitoring of brain activation). The goal is to understand the crossmodal interactions (the neural oscillations and long-range coupling between temporal auditory regions and centro-frontal sensorimotor regions of the brain) that occur during speech recognition. The work will shed light on the theoretical foundations of crossmodal learning (contributing to II-T) and will help improve artificial speech-understanding systems, contributing to II-M and II-R.
The focus of Project C2 (Xue, Engel) is the crossmodal learning of phonology (the mapping from visual symbols to auditory sounds). The project will use fMRI and EEG to characterize the neural representation of phonology learning, which is a crossmodal learning process involving primary visual, auditory, and multisensory temporoparietal cortex (Giraud & Poeppel, 2012). The project will contribute primarily to II-T.
Project C4 (Weber, Wermter, Liu) is a computational study starting from the assumption that language can only be learned by an embodied agent—one that experiences its multisensory world through action and perception (Tani et al., 2014). The ambition is to build an embodied model that processes audio, visual, and proprioceptive information and learns language grounded in these crossmodal perceptions. The heart of the learning mechanism is a multiple-timescale recurrent neural network (MTRNN), a deep neural architecture for modelling temporal dependencies at multiple timescales, with recurrent connectivity designed to reflect that of the cortex in a conceptually simpler structure. This embodied language-learning model is important for integration initiative II-M.
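The multiple-timescale principle behind the MTRNN can be illustrated with a minimal sketch: two layers of leaky-integrator units whose different time constants make one layer track fast input structure and the other retain slow context. Layer sizes, time constants, and the random (untrained) weights below are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np

class MTRNNSketch:
    """Minimal two-timescale leaky-integrator RNN (illustrative only).

    Fast units (small tau) follow short-range input dynamics; slow units
    (large tau) integrate context over longer spans. In a real MTRNN the
    weights would be trained (e.g. by backpropagation through time).
    """

    def __init__(self, n_in=4, n_fast=20, n_slow=10,
                 tau_fast=2.0, tau_slow=30.0, seed=0):
        rng = np.random.default_rng(seed)
        self.tau_fast, self.tau_slow = tau_fast, tau_slow
        self.W_fi = rng.normal(0, 0.1, (n_fast, n_in))    # input -> fast
        self.W_ff = rng.normal(0, 0.1, (n_fast, n_fast))  # fast -> fast
        self.W_fs = rng.normal(0, 0.1, (n_fast, n_slow))  # slow -> fast
        self.W_sf = rng.normal(0, 0.1, (n_slow, n_fast))  # fast -> slow
        self.W_ss = rng.normal(0, 0.1, (n_slow, n_slow))  # slow -> slow
        self.u_fast = np.zeros(n_fast)  # membrane potentials
        self.u_slow = np.zeros(n_slow)

    def step(self, x):
        """One leaky-integration update: u += (-u + net_input) / tau."""
        h_fast, h_slow = np.tanh(self.u_fast), np.tanh(self.u_slow)
        net_fast = self.W_fi @ x + self.W_ff @ h_fast + self.W_fs @ h_slow
        net_slow = self.W_sf @ h_fast + self.W_ss @ h_slow
        self.u_fast += (-self.u_fast + net_fast) / self.tau_fast
        self.u_slow += (-self.u_slow + net_slow) / self.tau_slow
        return np.tanh(self.u_fast), np.tanh(self.u_slow)


# Drive the network with a sinusoidal input and record both layers.
net = MTRNNSketch()
inputs = np.sin(np.linspace(0, 6, 50))[:, None] * np.ones(4)
fast_trace, slow_trace = [], []
for x in inputs:
    f, s = net.step(x)
    fast_trace.append(f.copy())
    slow_trace.append(s.copy())

# The slow layer's activations change far more gradually step-to-step,
# which is the property that lets it carry long-range (e.g. sentence-level)
# context while the fast layer tracks moment-to-moment input.
fast_var = np.var(np.diff(fast_trace, axis=0))
slow_var = np.var(np.diff(slow_trace, axis=0))
```

The same separation of timescales is what allows a trained MTRNN to bind slowly varying linguistic context to rapidly varying sensorimotor streams.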
Project C5 (Li, Menzel, Qu) uses eye tracking to investigate how human adults integrate visual cues from the environment while they simultaneously hear speech that refers to that environment. The goal is to use insights from this investigation not only to improve existing computational models of visually informed language comprehension, but also to increase the effectiveness of crossmodal learning in humans (Crocker et al., 2010). The project will participate in all three integration initiatives, contributing insights on the theory of crossmodal learning to II-T, contributing a model of attention and eye movements to II-M, and contributing mechanisms for control of the robot demonstrator to II-R.
A unique perspective on crossmodal learning in human-machine interaction arises from tele-operated robotics, to be studied by Project C6 (Steinicke, Fang, Chen). This project will explore human crossmodal learning through the development of an interface that allows a human to operate a robot remotely (Tachi et al., 2012); in particular, the human will be able to control a remote robot arm while receiving real-time visual and tactile feedback through a haptic input device that maps control information from the human to the robot, and sensory feedback from the robot to the human. The goal is for the human operator to feel a sense of agency (SoA)—to accept the remote robot as an extension of their own body while interacting with objects near the remote arm. C6 will contribute significantly to II-R, where the interface will be used for guiding the robot's activities, thus accelerating the learning process.