Develop the communicative and interaction capabilities of the robot using both the gestural and spoken dialogue modalities. The design, development and evaluation of these capabilities will be grounded in child-robot interaction scenarios that constitute the first use case of BabyRobot. An important goal is to build the robot's joint attention module and a common grounding module that employs conceptual networks. These modules, along with the audio-visual processing and behavioral informatics modules and core robotic functionality, will be integrated into a multimodal spoken dialogue human-robot interaction system that also supports interaction via the gesture modality. An important innovation is the ability to handle multiparty human-robot interaction, which will be fully validated in the collaborative use case 3. System integration and use case 1 evaluation of each module will also be performed.
Description of Work
- System Architecture, Use Case 1 Specification and Data Collection Protocols: We will first define the architecture and functionality of the communication and interaction modules of the robotic platform. We will then determine the interaction scenarios for use case 1, recruiting the help of our team of ASD specialists and educators. Last but not least, we will define the data collection protocols for this use case.
- Joint Attention Modeling and Interaction: Using the data collected under the protocols specified in the previous task, we will develop data-driven models for multiparty turn-taking and joint attention in child-robot interactions. We will then use various machine learning methods to automatically infer the focus of attention and turn-taking dynamics. We will also develop methods for assessing the shared attention mechanism and dynamic turn-taking strategies of each individual child participating in the scenario, including methods for long-term two-way entrainment.
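To make the focus-of-attention inference concrete, the following is a minimal sketch of one plausible geometric baseline: assign the observed head-pose direction to the nearest candidate gaze target. The target names, positions, and angular threshold are illustrative assumptions, not the project's actual models, which would be trained on the collected data.

```python
import math

# Illustrative scene layout: candidate gaze targets and their (x, y)
# positions in metres. These names and coordinates are assumptions.
TARGETS = {
    "robot":   (1.0, 0.0),
    "tablet":  (0.5, -0.8),
    "partner": (-0.2, 1.0),
}

def bearing(observer_xy, target_xy):
    """Angle (radians) from the observer to a target."""
    dx = target_xy[0] - observer_xy[0]
    dy = target_xy[1] - observer_xy[1]
    return math.atan2(dy, dx)

def _wrap(angle):
    """Wrap an angle difference into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def focus_of_attention(observer_xy, head_yaw, targets=TARGETS, max_err=0.5):
    """Return the target whose bearing best matches the observed head yaw,
    or None if no target lies within max_err radians (threshold assumed)."""
    best, best_err = None, max_err
    for name, pos in targets.items():
        err = abs(_wrap(bearing(observer_xy, pos) - head_yaw))
        if err < best_err:
            best, best_err = name, err
    return best
```

A data-driven version would replace the fixed threshold and layout with a classifier trained on annotated child-robot sessions.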
- Conceptual Models and Grounding: We will construct multimodal concept networks that model the associative/spatial/semantic relations of co-occurring concepts (images/objects, words), people, actions, socio-affective states, and intentions. Understanding, learning and negotiating semantics, as well as sharing intentions, will be modeled on top of this network. The end result will be a common ground model used by the robot during communication and learning.
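One simple way to picture such a concept network is as a weighted graph whose edges are strengthened whenever concepts co-occur, with association queries over the edge weights. The sketch below is a hypothetical minimal data structure in that spirit; the project's actual networks would be richer (typed relations, multimodal nodes).

```python
from collections import defaultdict

class ConceptNetwork:
    """Toy multimodal concept network: nodes are concepts (words, objects,
    actions, affective states); edge weights count co-occurrences."""

    def __init__(self):
        # edges[a][b] = co-occurrence weight between concepts a and b
        self.edges = defaultdict(lambda: defaultdict(float))

    def observe(self, concepts):
        """Strengthen pairwise associations for concepts seen together."""
        for a in concepts:
            for b in concepts:
                if a != b:
                    self.edges[a][b] += 1.0

    def associate(self, concept, top_k=3):
        """Return the concepts most strongly associated with `concept`."""
        neighbours = self.edges.get(concept, {})
        return sorted(neighbours, key=neighbours.get, reverse=True)[:top_k]
```

Grounding a new word could then amount to linking it into this graph and letting repeated interaction reinforce the correct associations.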
- Gesture Interaction and Synthesis: In this task we will set up a gesture generation and synthesis module to be integrated into the overall system architecture. Gesture specifications will be synthesized by a behavior realizer that enables the flexible production of synthetic speech and co-verbal hand and arm gestures on the robotic platform at run-time, without being limited to a predefined repertoire of motor actions.
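The core scheduling problem a behavior realizer solves is aligning a gesture's stroke with a point in the spoken utterance. The sketch below illustrates that idea under a deliberately crude assumption (a fixed per-word duration); a real realizer would take timing from the speech synthesizer.

```python
# Assumed constant for the sketch only; real timing would come from TTS.
WORD_DURATION = 0.4  # seconds per word

def realize(utterance, gesture, stroke_word):
    """Return a schedule placing the gesture stroke on `stroke_word`.
    Hypothetical interface; not the project's actual realizer API."""
    words = utterance.split()
    if stroke_word not in words:
        raise ValueError("stroke word not in utterance")
    stroke_time = words.index(stroke_word) * WORD_DURATION
    return {
        "speech": {"text": utterance, "start": 0.0},
        "gesture": {"type": gesture, "stroke": stroke_time},
    }
```

Because the gesture is parameterized by the utterance rather than drawn from a fixed lookup table, new speech-gesture pairings can be produced at run-time.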
- Multiparty Multimodal Spoken Dialogue Interaction: The KTH dialogue framework for real-time multimodal multiparty interaction processing (IrisTK) will be enhanced to encompass human-robot interaction. This includes extending its situational modelling module with the input/output handling needed for situated multiparty child-robot interaction. We will also develop dialogue authoring tools that enable developers to define the structure of the dialogue state, while collected interaction data will be used to train the multimodal language understanding components.
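To indicate the kind of structure such authoring tools would expose, here is a hypothetical sketch of a multiparty dialogue state tracking participants and the current turn-holder. This is not the IrisTK API; it only illustrates the bookkeeping a situated multiparty state must support.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class DialogueState:
    """Minimal multiparty dialogue state (illustrative, not IrisTK's)."""
    participants: Set[str] = field(default_factory=set)
    current_speaker: Optional[str] = None
    topic: Optional[str] = None

    def join(self, who: str) -> None:
        """A participant enters the interaction."""
        self.participants.add(who)

    def leave(self, who: str) -> None:
        """A participant leaves; release the turn if they held it."""
        self.participants.discard(who)
        if self.current_speaker == who:
            self.current_speaker = None

    def take_turn(self, who: str) -> None:
        """Assign the turn to a known participant."""
        if who not in self.participants:
            raise ValueError(f"{who} is not in the interaction")
        self.current_speaker = who
```

An authoring tool would let developers declare such state fields and the transitions between them, while the learned components fill the state from multimodal input.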
- Integration and Evaluation for Use Case 1: The main integration task here is to extend the IrisTK framework to serve as the communication center of the Furhat and ZENO robots. We will then add input modules such as gesture recognition, child speech recognition, affective state tracking, and action and intent recognition, as well as robot control interfaces and environment modelling (including object tracking). In use case 1, the dialogue experts will develop and evaluate the interactional exercises in collaboration with the educators.
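The integration pattern described, with several input modules feeding one communication center, can be pictured as an event bus: each module publishes typed events, and the dialogue manager dispatches them to registered handlers. The sketch below is a generic illustration of that pattern; module names and the event format are assumptions, not the project's interfaces.

```python
class DialogueManager:
    """Toy communication center: input modules publish events, handlers
    registered per event type consume them (illustrative pattern only)."""

    def __init__(self):
        self.handlers = {}
        self.log = []  # record of all events, e.g. for later analysis

    def register(self, event_type, handler):
        """Attach a handler for one event type."""
        self.handlers[event_type] = handler

    def publish(self, event_type, payload):
        """Log the event and dispatch it to its handler, if any."""
        self.log.append((event_type, payload))
        handler = self.handlers.get(event_type)
        if handler:
            return handler(payload)

# Example wiring with two hypothetical input modules.
dm = DialogueManager()
dm.register("speech", lambda p: f"heard: {p['text']}")
dm.register("gesture", lambda p: f"saw: {p['kind']}")
```

New modules (affect tracking, object tracking) would plug in by registering further event types, keeping the dialogue manager itself unchanged.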