Projects - Area Z

Three Integration Initiatives: II-T, II-M, II-R

Integration of research approaches: Integration initiatives and demonstrator

Due to the highly interdisciplinary nature of our research, our planned centre will place strong emphasis on collaboration and interdisciplinary cooperation. Every project is a proposal for collaborative research involving at least two PIs with different, complementary backgrounds. This emphasis on integration is woven deeply into both our overall research strategy and the focus of the research itself: each project combines multiple modalities, multiple computational or neurocognitive disciplines, and multiple approaches across a range of perspectives in different international research groups from both Hamburg and Beijing. Integration is therefore an intrinsic part of the centre, supported by the six common research objectives and the three integration initiatives described below.

Because the focus of our investigation, crossmodal learning, is not yet a well-established discipline, we will introduce novel, interdisciplinary metrics of success. Thus, to focus, evaluate, and demonstrate the progress of the centre’s research, we will have three Integration Initiatives (IIs) in the first four-year phase of this TRR. While each project has its own specific goals, the Integration Initiatives will organize the collaborative efforts of the participants, advance the state of the art, make progress towards our six objectives (see above), and produce the robotics demonstrator. In all, there will be three initiatives:

II-T: Theory,
II-M: Models,
II-R: Robotics,

where the third of these will produce a robot demonstrator, with the goal of illustrating many of the scientific advances of the centre on a common robotics platform.

Integration Initiative II-T: Construction of a theoretical framework for crossmodal learning

The purpose of II-T is to organize a set of integrative activities and events within the centre that aim at discussing, comparing and elaborating conceptual and theoretical perspectives on crossmodal learning. A particular emphasis will be placed on comparing the different theoretical approaches to crossmodal learning that will be pursued by the individual projects. In addition, II-T will focus on establishing links between concepts employed in research on natural systems and technical systems.

A key role of II-T is to provide a backbone for the training activities in the Integrated Research Training Group. Therefore, II-T is integrated with the central project Z2 and supported by resources allocated to this project. However, all projects will contribute to II-T by participating in the training activities and events and contributing to the publications that will reflect the results of the discussion process.

Work on II-T will have three components:

(1) Conceptual integration: we will aim at a comparative analysis and synthesis of different views on mechanisms of crossmodal learning; this will not necessarily result in a single overarching concept or theory, but will lead to a framework for comparing and reconciling the nature and explanatory potential of different theoretical approaches; importantly, work in II-T will also be informed by and serve as a heuristic for work in II-M and II-R.
(2) Training of conceptual background: work on conceptual foundations will provide a scaffold for structuring the graduate training activities of the TRR, which is reflected in the qualification plan delineated in project Z2.
(3) Dissemination of novel integrated views: we will generate joint reviews and opinion articles that aim at an overview and a synthesis of the relevant concepts in multisensory processing, including the adaptation of the underlying representations (i.e., crossmodal learning) and the use of these representations for predictive processing.

Furthermore, we will organize a set of events that encourage discussion and dissemination of theories on crossmodal learning.

Integration Initiative II-M: Computational models of crossmodal learning

II-M will develop, integrate and evaluate computational models for the purpose of improving the performance of crossmodal learning and communication in multimodal AI systems. II-M will integrate subprojects contributing computational models of crossmodal learning. These computational models will be inspired and informed by the theoretical framework of II-T, and their integration and evaluation will be instrumental for refining the II-T framework as well as for providing methods to be used in the robotic demonstrator of II-R. While the focus of II-M is the computational modelling of robotic communication, the focus of II-R is the robotic demonstrator, including dextrous manipulation.

To facilitate integration of the software models produced by the subprojects, we will concentrate our efforts on a specific, long-standing challenge as a test bed: grounded, crossmodal language learning, by which we mean the acquisition of language skills through the active, multimodal perception of the world. Grounded language learning is meant to address certain deficits of purely text-based natural-language processing (NLP) systems (such as text translation or summarization systems), which do not commonly incorporate the kinds of crossmodal skills required for nuanced, interactive communication with humans. Such skills include the recognition and understanding of objects, actions, human gestures, and the surrounding environment, all of which are integral aspects of human communication. Humans can perform these skills robustly and with ease, largely due to human efficiency in combining multimodal sensory input and learning from multisensory experience. Therefore, we will use grounded language learning as a test bed for combining the methods and models advanced by the individual subprojects and for evaluating their abilities in crossmodal integration and learning.

II-M has three interrelated components: (1) development of an evaluation platform and task; (2) integration of the software models; and (3) evaluation of the integrated models in the testing environment.

Integration Initiative II-R: Crossmodal human-robot collaboration

The third integration initiative will prepare a robotic demonstrator. While II-M focuses on models and robots in a controlled virtual environment, II-R focuses on a demonstration of crossmodal learning in human-robot collaboration. Drawing on the theoretical framework of II-T and the software models of II-M, the demonstrator will highlight the advancements of the centre, providing a platform for evaluating and proving our framework and models in a robotics application. The finished system will demonstrate the value of crossmodal learning specifically in the realm of human-robot collaboration, an area of robotics requiring complex integration of multiple sensory modalities.

The demonstrator will show that through a deep and nuanced understanding of the theory and processes underlying crossmodal learning in humans and machines, we can produce sophisticated, high-level collaboration between humans and robots: robots whose abilities are robust to noise and capable of adapting over time. That sophistication will be demonstrated in the form of nuanced crossmodal communications taking place throughout the collaboration, based on gestures, facial expression, and audible utterances, as produced and interpreted by both human and robot. The human will (a) produce the kinds of communicative signals the robot of II-M has learned to comprehend and (b) interpret both the actions and communicative signals of the robot. The goal of the demonstrator is purely scientific: to provide a platform for illustrating the scientific achievements of the centre in a real-world robot; the demonstrator is not intended to solve a specific open problem in robotics.

The focus of the demonstrator, including the task(s) demonstrated, will be to showcase the following technical advancements developed in the centre:

(1) the ability to interpret subtle forms of communication that rely on multiple modalities;
(2) the ability to focus attention so as to enhance an integrated multimodal signal;
(3) the ability to choose complex actions in response to an integrated crossmodal percept;
(4) the ability to adapt and improve over time to a variety of crossmodal stimuli.

Project Z3 publications (phase 1)