The long-term goal of our research is to develop a rigorous theoretical framework describing the neural, cognitive and computational mechanisms of crossmodal learning. This framework will allow us to pursue the following primary sub-goals of the research programme: (1) to enrich our current understanding of the multisensory processes underlying the human mind and brain, (2) to build detailed formal models that describe crossmodal learning in both humans and machines, and (3) to improve the performance of artificial systems by enlisting this enriched understanding and applying these new models to tasks requiring a human-like conception of the world.
This TRR contains fifteen separate scientific research projects plus three administrative projects that work together towards achieving these goals. Each project examines crossmodal learning between at least two of the following modalities: visual, auditory, somatosensory (tactile, haptic, proprioceptive, etc.), and artificial (with information sources such as sonar, range finders, RGBD, brain signals, text, etc.). All but the last are human sensory modalities with counterparts in robotics. However, robots have a much greater range of potential sensory mechanisms than humans, including not just traditional artificial sensor systems but any information-carrying channel, such as text or other compact representations of knowledge, that can serve to enhance other sensory modalities. In fact, human sensory modalities can also be enhanced by artificial mechanisms, such as computer interfaces and brain-computer interfaces.
In the TRR research programme, the projects are divided into three thematic areas (A, B, C); each project will work towards at least one of six objectives (O1–O6); and each project will take part in one or more of three integration initiatives (II-T, II-M, II-R). The thematic areas, the objectives and the integration initiatives are each treated in great detail below; however, briefly stated, the thematic areas organize the projects by their primary focus, which is either (A) learning dynamics, (B) generalization and prediction, or (C) human interaction. To achieve the goals of the TRR, each project proposes to do ground-breaking research on crossmodal learning towards at least one of these six TRR objectives: (1) top-down control, (2) representation, (3) decision-making, (4) communication, (5) action execution, and (6) efficiency and robustness. Finally, every project will be involved in at least one of the three centre-wide integration initiatives: II-T, which focuses on the theory of crossmodal learning; II-M, which focuses on building models of crossmodal learning; and II-R, which focuses on demonstrating the centre’s crossmodal learning results on a robotic platform. We will discuss each of these structural components—objectives, thematic areas, and integration initiatives—in this order, in detail, beginning with the objectives.
The Six Centre Objectives
The TRR will address its long-term goals by directing its research efforts towards the following six objectives:
- O1. Top-down control: Discover how top-down processes can influence learning across multiple modalities simultaneously. Such top-down control may involve feedback mechanisms in which low-level, crossmodal sensory components are affected by higher-level processes, which in turn may be influenced by prior experience, semantics, and mechanisms related to attention and executive function. For example, low-level multimodal conflicts can activate top-down control, which then resolves these conflicts, leading to crossmodal learning and improved performance.
- O2. Crossmodal representations: Identify both the representations that emerge during crossmodal learning and the functions that are optimized through these representations. While it is generally straightforward to represent unimodal data, it is less clear how to develop effective mechanisms for representing and storing crossmodal information. Given that the brain uses distributed neural population codes for its representations, we will investigate how it differentially represents and processes multimodal vs. unimodal stimuli, and we will attempt to characterize the representations that facilitate learning as opposed to those that result from learning.
- O3. Decision-making: Elucidate how the brain areas involved in crossmodal learning support and enable decision-making. Whereas in artificial agents the decision-making process can be specified arbitrarily, no known algorithm matches the human brain’s ability to integrate crossmodal information dynamically and to make decisions adaptively. In the brain, decisions emerge as the collective result of a large number of interacting sources of information exciting and inhibiting each other. We hypothesize that the neural circuits responsible for this decision-making process may be closely related to the circuits integrating information from different sensory modalities, since such information integration is necessary for decision-making in our multimodal world.
- O4. Communication: Explain how crossmodal learning provides the foundation for robust, inter-agent communication. Human communication relies on a broad assortment of interrelated mental components cooperating in a highly complex manner. We will consider crossmodal learning in communication, such as how integration of multimodal information influences the way we communicate, and how crossmodal information can be reconciled with linguistic knowledge, verbal utterances, and textually imparted assertions about the environment.
- O5. Action execution: Construct an action-execution component for an artificial agent (a robot) that implements and demonstrates the crossmodal learning methods and insights that we will discover and characterize in humans. Intelligent agents generally choose actions based on a complex interrelationship of multiple sensory modalities, including proprioceptive feedback. Since action choices can affect both the amount and the quality of information available to the agent, agents frequently execute actions simply to acquire additional sensory information. Consequently, we will examine methods for improving both action selection and action execution through crossmodal learning.
- O6. Efficiency and Robustness: Build artificial crossmodal learning systems whose efficiency and robustness approach those of humans and animals. For example, we will develop a model of how humans exploit crossmodal information to increase robustness and efficiency in prediction, and then we will transfer that knowledge to an artificial system. We will also investigate deep hierarchical neural networks to see whether we can improve their efficiency and robustness during crossmodal learning by first tailoring learning to the individual modalities and then, building on this foundation, exploiting the similarities across modalities.
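The staged strategy named in O6 (first handle each modality separately, then exploit structure shared across modalities) can be illustrated with a deliberately minimal sketch. This is not a model from the TRR itself; the encoders, dimensions, and modality names below are purely illustrative stand-ins, with untrained random weights in place of the modality-specific learning stage:

```python
import random

random.seed(0)

def linear_layer(in_dim, out_dim):
    """A toy linear map with fixed random weights, standing in for a
    trained network stage (no learning is performed in this sketch)."""
    w = [[random.uniform(-1.0, 1.0) for _ in range(in_dim)]
         for _ in range(out_dim)]
    def apply(x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return apply

# Stage 1: modality-specific encoders, each tailored to one input modality.
visual_encoder = linear_layer(in_dim=8, out_dim=4)  # e.g. image features
audio_encoder = linear_layer(in_dim=6, out_dim=4)   # e.g. spectral features

# Stage 2: a shared fusion stage over the concatenated embeddings,
# where similarities across modalities could be exploited.
fusion = linear_layer(in_dim=8, out_dim=3)

def crossmodal_forward(visual_input, audio_input):
    v = visual_encoder(visual_input)
    a = audio_encoder(audio_input)
    return fusion(v + a)  # list concatenation -> joint crossmodal code

joint = crossmodal_forward([0.5] * 8, [0.2] * 6)
print(len(joint))  # 3-dimensional joint representation
```

In a real system, each stage would be a deep, trained subnetwork, and the point of the staged curriculum is that the fusion stage starts from embeddings that are already well-formed within each modality.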
The Three Thematic Areas of the Planned TRR
The projects of the planned TRR fall into three thematic areas: A) dynamics of crossmodal adaptation, B) efficient crossmodal generalization and prediction, and C) crossmodal learning in human-machine interaction. Each of these represents a key aspect of crossmodal learning, serving as a common rubric for the closely related projects within each area—as described next.