POMDPs and Mixed Reality for Resolving References to Objects

Communicating human knowledge and intent to robots is essential for successful human-robot interaction (HRI). Failures in communication, and thus collaboration, occur when there is a mismatch between two agents’ mental states. Question-asking allows a robot to acquire information that targets its uncertainty, facilitating recovery from failure states. However, every question-asking modality (natural language, eye gaze, gestures, visual interfaces) has tradeoffs, making the choice of modality an important and context-dependent decision. For example, for robots with “real” eyes or pan/tilt screens, looking requires moving fewer joints over shorter distances than pointing, reducing the time needed for the referential action. However, eye gaze, or “pointing with the eyes,” is inherently harder to interpret, since pointing gestures can physically reduce the distance between the referrer and the referenced item while eye gaze cannot. Another approach to question-asking is to use visualizations, making the communication modality independent of the robot. While physical actions like eye gaze and pointing gestures rely on the robot's embodiment, visualization methods like mixed reality (MR) enable the robot to communicate by visually depicting its mental state in the real environment. Related work has investigated the effects of MR visualizations on reducing mental workload in HRI, but there remains a gap in research on how MR compares to physical actions like pointing and eye gaze for reducing robot uncertainty.

This work investigates how physical and visualization-based question-asking can be used to reduce robot uncertainty under varying levels of ambiguity. To do this, we first model our problem as a POMDP, termed the Physio-Virtual Deixis POMDP (PVD-POMDP), which observes a human’s speech, gestures, and eye gaze, and decides when to ask questions (to increase accuracy) and when to commit to an item (to decrease interaction time). The PVD-POMDP enables the robot to ask questions either through physical modalities (like eye gaze and gesture) or through virtual modalities (like mixed reality). To evaluate our model, we conducted a between-subjects user study in which 80 participants interacted with a robot in an item-fetching task. Each participant experienced one of three conditions of our PVD-POMDP: a no-feedback control condition, a physical feedback condition, or a mixed reality feedback condition. Our results show that our mixed reality model successfully chooses the correct item 93% of the time, with each interaction lasting about 5 seconds, outperforming the physical and no-feedback models on both quantitative and qualitative metrics (best usability, task load, and trust scores).

You can read more in our paper!
