A long-term goal of robotics is designing robots intelligent enough to enter a person’s home and perform daily chores for them. This requires the robot to learn specific behaviors and semantic information that can only be acquired after entering the home and interacting with the humans living there. For example, there may be a trinket that the robot has never encountered before, and the owner might want to instruct the robot on how to handle the item (i.e., object manipulation information), as well as directly specify where the item should be kept (i.e., navigation information). To approach this problem, one must consider two sub-problems: a) the agent’s representation of object manipulation actions and semantic information about the environment, and b) the method with which an agent can learn this knowledge from a teacher.
Semantic maps provide a representation sufficient for navigating an environment, but map information alone is insufficient for enabling object manipulation. Conversely, there are knowledge bases that store requisite object manipulation information, but do not help with navigation or grasping in novel orientations. Previous studies have shown that Mixed Reality (MR) interfaces are effective for specifying navigation commands and programming egocentric robot behaviors. However, none of these works have demonstrated the use of MR interfaces for teaching high-level object manipulation actions, and semantic information of the environment.
Our contribution is a system that enables humans to teach robots both object manipulation actions—in a local object frame of reference—and b) semantic information about objects in a global map. We use a Mixed Reality Head Mounted Display (MR-HMD) to enable humans to teach a robot a plannable representation of their environment. By plannable, we mean structured representations that are searchable with AI planning tools. By teach, we mean having the human explicitly provide information necessary for instantiating our representation. Our representation, the Action-Oriented Semantic Map (AOSM), enables robots to perform complex object manipulation tasks that require navigation around an environment.
To test our system for building AOSMs, we created a test environment where a mobile manipulator was tasked with learning how to perform household chores with the help of a human trainer. Three novice humans used our MR interface to teach a robot an AOSM, allowing the robot to autonomously plan to navigate to a bottle, pick it up, and throw it out. In addition, two expert users taught a robot to autonomously plan to flip a light switch off and manipulate a sink faucet to the closed position, both in under 2 minutes.
You can read more in our paper!