A person should be able to give a complex natural language command to a robot drone or self-driving car in a cityscale environment and have it be understood, such as “”walk along third street until the intersection with main street, then walk until you reach the charles river.” Existing approaches represent commands like these as expressions in Linear Temporal Logic, which can represent constraints such as “eventually X” and “avoid X”. This representation is powerful, but to use it requires a large dataset of language paired with LTL expressions for training the model. This paper represents the first ever framework for learning to map from English to LTL without requiring any LTL annotations at training time. We learn a semantic parsing model that does not require paired data of language and LTL logical forms, but instead learns from trajectories as a proxy. To collect trajectories on a large scale over a range
We release this data as well as the data collection procedure to simulate paths in large environments. We see the benefits of using a more expressive language such as LTL in instructions that require temporal ordering, and also see that the path taken with our approach more closely follows constraints specified in natural language. This dataset consists of 10 different environments, with up to 2,458 samples in each environment, giving us a total of 18,060 samples. To the best of our knowledge this is the largest dataset of temporal commands in existence.
You can read the paper here!