As drones become increasingly autonomous, it is imperative that designers create intuitive and flexible ways for untrained users to interact with these systems. A natural language interface is immediately accessible to non-technical users and does not require the user to use a touchscreen or radio control (RC). Current natural language interfaces require a predefined
model of the environment including landmarks, which is difficult for a drone to obtain. For example, given the instruction “Fly around the wall to the chair and take a picture,” the drone must already have a model of the wall and the chair to infer a policy.
We address these interaction problems by using natural language within a mixed reality headset (MR) to provide an intuitive high-level interface for controlling a drone using goal-based planning. By using MR, a user can annotate landmarks with natural language in the drone’s frame of reference. The process starts with the MR interface displaying the virtual environment of a room that can be adjusted to overlay on physical reality with gestures. Afterward, the user can annotate landmarks with colored boxes using the MR interface. This process allows the user to specify landmarks that the drone does not have the capability to detect on its own. Then, when the user sends a command, we translate this natural language text to a reward function.
Further, we conducted an exploratory user study to compare the low-level 2D interface with two MR interfaces. Overall, we found that our MR system offers a user-friendly approach to control a drone with both low-level and high-level instructions. The language interface does not require direct, continuous commands from the user. Overall, the MR system does not require the user to spend as much time controlling the drone as with the 2D interface.
You can read the paper here!