Advanced Autonomy on a Low-Cost Drone

We created a hardware and software framework for an autonomous aerial robot, in which all software for autonomy can run onboard the drone, implemented in Python. We present an Unscented Kalman Filter (UKF) for accurate state estimation. Next, we present an implementation of Monte Carlo (MC) Localization and FastSLAM for Simultaneous Localization and Mapping (SLAM). The performance of UKF, localization, and SLAM is tested and compared to ground truth, provided by a motion-capture system. Our autonomous educational framework runs quickly and accurately on a Raspberry Pi in Python. The base station machine only needs to have a web browser and does not need to have ROS installed, making it easy to use in school settings.

In partnership with the Duckietown Foundation, we are releasing our course materials for anyone to use. The Operations Manual contains a description of the drone and how to use it. The Learning Materials book contains all our projects and assignments. This work is being presented at IROS this week and has been nominated for a best paper award! Read the full paper here.

Writing a Technical Paper

This post describes my outline/structure for a technical paper. I have found a fairly specific structure works best for abstracts and introductions. For the other sections, I describe a process for writing them.

Abstracts and Introductions

The first sentence should be a general statement of the problem’s importance.  The second sentence should summarize the shortcomings of previous work. It’s very tempting to make this go on for more than one sentence, but don’t give in – just one sentence! You will have more space later.  The next few sentences should describe your technical solution to the problem.  The last two or three sentences describe the evaluation.  Usually one sentence about the quantitative evaluation, and one sentence about the robot demonstration.  Did it work? How do we know that it worked? This does not need to describe the whole evaluation, but should cover the most impressive thing the robot can do as a result of the paper.
I think this abstract is a good example of following this template:

For introductions, the pattern is similar.  The first paragraph should generally describe the problem, focusing on the problem, what it is, and why it is important.  The second paragraph should succinctly describe related approaches and why they do not solve the problem.  The next few paragraphs should describe our technical approach for solving the problem.  The last paragraph should describe the evaluation and how we show that we have solved it.   I like to have a figure on the first page of the paper (Figure 1) that summarizes or shows what the robot can do that it couldn’t do before.  This figure is often the first thing reviewers see, and should show the “impressive result” of the paper visually, often a storyboard of the video.

Related Work

The aim of a related work section is to situate your contribution with respect to other people’s work.  You want to give the reader an overview of papers that are solving similar problems or using similar methods.  You also want to tell the reader why your contribution is new, different, and better.  

Before starting, always always always use natbib.  Even if the conference template claims not to support natbib, you can engineer it in anyway, and you should.  Natbib contains \citet which is a “noun phrase” citation such as “Tellex et al. (2019) wrote that”.  It will do “et al.” with no period after et, and a period after al. It will always spell the author’s name right if your bibtex entry is right.   Natbib saves time, makes your paper have fewer bugs, and avoids annoying people by spelling their name wrong (as long as you are careful to get it right once in the bibtex entry). 
I always start by writing two sentences about each paper.  The first sentence is a one sentence summary of the paper.  Example:  “\citet{tellex11} showed a framework that learns to map between language and robot trajectories functions, enabling a robotic forklift to understand natural language commands about movement and manipulation.”  The second sentence is a statement of the differences between their paper and your paper.  For example, “Our approach is different because it requires less annotated data and maps to reward functions instead of trajectories, enabling the robot to interpret human goals.”    I start by writing a long list of these sentences, and collect them in the related work section of the paper.  It’s okay if they are not organized.  Every time someone tells me about a paper I should put in related work, I add one of these sentence pairs (or at least a note to go look up that paper).  Your goal is to not miss any areas of related work that a reviewer might be familiar with.  
As you build up longer lists of citations and sentences in this format, you can start to group them into areas.  This might already happen naturally as you follow chains of related citations.  At some point, you can explicitly start grouping the sentences together into paragraphs.  It will still be choppy, but at least related papers will be together.  Once you have paragraphs, you can start to write summary sentences.  “There are many papers that learn to map between human language and robot trajectories~\citep{tellex11, matuszekxx, …}.  Our approach is different because it requires less annotated data than these methods and maps to reward functions instead.”    At this point you have a viable related work section. 

Depending on your available space, you may find yourself needing more summary and analysis sections that group papers together.  For example, you may not have space to give a one sentence overview of each paper.  Then you can cut the detailed versions, and just keep the high-level summary.  However, I have found that the best way to create the summary version is to start with the detailed version and then generalize.  If you do find yourself cutting the detailed versions, make sure to comment it out, or copy it into the draft journal paper submission for later. 
Sometimes, there will be a very closely related paper that needs a whole paragraph on its own.  For example, in our OO-POMCP paper, we built closely on the POMCP algorithm, and needed to give a detailed overview of the whole algorithm.  Or there might be a particularly close piece of related work with a subtle distinction, and you need to give more detail to express this distinction.  Feel free to write long if this is the case.  Sometimes you can figure out a way to cut later, and sometimes the detail is necessary. 

Related work is crucial for reviewers, who are often already familiar with the field, and don’t want to have to read your whole paper in order to understand your contribution.  Writing related work often helps tease out the novel aspects of the paper that should be emphasized in the introduction and abstract.  Don’t save it until the end.

I like to put related work after the introduction, before technical approach. I believe the introduction should explain enough about the paper that you can then review related work, and that it’s helpful to situate the paper with respect to related work next because that’s part of what I read to decide whether to read the whole paper. However it’s also valid to put it after technical approach, before the conclusion. Then you can assume the reader has a deeper technical understanding of the work and give a more nuanced presentation of its differences to related work.

Technical Approach

It is important to not give any technical material with out first explaining the motivation for that material.   Technical/mathematical material is a solution to some problem, and it is extremely difficult to understand without first being clear about the problem it is solving.  Think of the paper as a tree of problems and solutions.   First is the overarching problem the paper is solving, which you give in the abstract and introduction.  Then in technical approach you break the problem down into sub-problems.  Each of these sub-problems has a solution, which itself can be a set of sub-sub problems paired with technical solutions.  At any point, the reader may decide to stop descending delving into the leaf nodes and go on to the next sub-problem in the tree, and the paper should be designed for this kind of approach.  Actually only the very rare reader will descend into all the full technical detail of the paper – it is only when implementing the paper that someone typically does this.  
Even when implementing and going all the way to the leaf nodes, it is crucial to explain the problem first, then the solution. This approach situates the technical material into the larger ark of the paper, and helps people attach it to a larger structure in their brain about how the paper is put together.  

A common failure mode, of what not to do, is to put “preliminary” or “background” material into an introductory section of the paper, that reads like a laundry list of random techniques.  When a background section makes sense, it should be situated as part of the tree of problems and solutions for the overall paper, but that this particular section is not novel to this paper.  It should still be connected to the rest of the paper as a tree of problems and solutions.   It should not be a laundry list of technical material that will come up later, because people won’t know what it’s for at this point in the paper, and won’t remember it. 


The first paragraph of the conclusion should summarize the contributions of the paper. This is different from the abstract in that you want to explain what you did in the paper: what results have been proved or what algorithms have been created, and what demonstrations have been done to show that it worked. The later paragraphs should cover future work. This part of the paper can recognize limitations of your paper and sketch out open problems that remain to be solved. It is okay if you do not plan to solve them; you are also laying out boundaries for what this paper does not solve, which helps people understand what it does solve. Moreover you are also describing possible follow-on work that may inspire someone to expand on this paper.


I am proud to announce that I have been promoted to Associate Professor with Tenure. This milestone would not have been possible without the help and support from my wonderful students and collaborators, as well as my friends and family. This acknowledgement is an accolade to them as much as it is to me. Looking forward to many more years of language and robots at Brown!

Read the full CS blog post!

Abstract Product MDPs

Natural language instructions issued to robots convey intent with multiple layers of spatial abstraction. These layers include high-level goals (“fly to the end of the street”), low-level specifications (“go ten meters south then ten meters east”), or a hybrid (“fly to the lamppost on Hex Street”). In addition, language can express constraints on reaching a goal, such as “go to the lamppost while avoiding the fire hydrant.” Current robot autonomy systems struggle to understand instructions above the lowest layer of spatial abstraction.

Our recent paper published at RSS 2019, titled Planning with State Abstractions for Non-Markovian Task Specifications, approaches this problem with Abstract Product Markov Decision Process (AP-MDP), a novel framework which enables the robot to understand higher-level spatial abstractions in natural language. This framework leverages a deep-learned model to translate natural language into Linear Temporal Logic (LTL), a symbolic form the robot can understand. From LTL, we create motion plans for the robot aligning with goals and constraints spanning different layers of abstraction.

While AP-MDPs were initially tested with a small indoor robot, we realized this framework generalizes to more complex robots and environments. We connected AP-MDPs to a Skydio R1 drone and gave natural language instructions to fly the drone around Brown’s campus! The success of this flight has led to exciting new verticals of research for the lab.

Here’s the video:

Object Oriented POMDPs

The human mind does not see a visual scene as pixels; rather it is able to pick out objects of interest and selectively reason about them. Although object-based reasoning represents a core feature of human reasoning, only recently has it been possible to study in robotics due to advances in computer vision for segmentation and classification of objects from camera images. Our recent paper published at ICRA 2019, titled Multi-Object Search in Object-Oriented POMDPs, takes a stab at the problem by solving a novel multi-object search (MOS) task: without knowing in advance where the objects are located, a robot must find them in an indoor, roomed environment. We formulate the MOS task within a new framework called an object-oriented partially observable Markov decision process (OO-POMDP). An OO-POMDP represents the state and observation spaces in terms of classes and objects.  The structure afforded by OO-POMDPs supports reasoning about each object independently while also providing a means for grounding language commands from a human on task onset. A human, for example, may issue an initial command such as “Find the mugs in the kitchen and books in the library,” where a robot can associate the locations to each object class so as to improve its search. We show that OO-POMCP with grounded language commands is sufficient for solving challenging MOS tasks both in simulation and on a physical mobile robot, which has applications for rescue- and home- robots. 

Interestingly, the OO-POMDP allows the robot to recover from lies or mistakes in what a human tells it. If the person tells the robot “Find the mug in the kitchen,” the robot will first look in the kitchen for the objects. But after failing to find them in the kitchen, it will then systematically search the rest of the environment. Our OO-POMCP inference algorithm allows the robot to quickly and efficiently use all the information it has to find the object.

Raspberry Pi Python Drone

Kevin Stacey and his team released a great article about progress in our drone project.  I especially like the video!  Thank you for a fantastic summer that included improved PID control, a working 3D Unscented Kalman Filter, and on-board, off-line SLAM, all on our little Raspberry Pi drone!

We also received the fantastic news that our IROS paper was accepted!   Stay tuned for the presentation at IROS 2018.  This paper describes the initial platform that we used for the course last fall.  We are now on Version 3 of the hardware and software stack, and we can’t wait to see what else is in store!

Understanding Language about Sequences of Actions

Many times, natural language commands issued to robots not only specify a particular target configuration or goal state but also outline constraints on how the robot goes about its execution. That is, the path taken to achieving some goal state is given equal importance to the goal state itself. One example of this could be instructing a wheeled robot to “go to the living room but avoid the kitchen,” in order to avoid scuffing the floor. This class of behaviors poses a serious obstacle to existing language understanding for robotics approaches that map to either action sequences or goal state representations. Due to the non-Markovian nature of the objective, approaches in the former category must map to potentially unbounded action sequences whereas approaches in the latter category would require folding the entirety of a robot’s trajectory into a (traditionally Markovian) state representation, resulting in an intractable decision-making problem. To resolve this challenge, we use a recently introduced probabilistic variant of Linear Temporal Logic (LTL) as a goal specification language for a Markov Decision Process (MDP). While demonstrating that standard neural sequence-to-sequence learning models can successfully ground language to this semantic representation, we also provide analysis that highlights generalization to novel, unseen logical forms as an open problem for this class of model. We evaluate our system within two simulated robot domains as well as on a physical robot, demonstrating accurate language grounding alongside a significant expansion in the space of interpretable robot behaviors.

Our paper was recently published at RSS 2018 and you can read it here!

See a video!

Ode to AIBO

We are lucky enough to have a pack of AIBOs in our lab, leftover from Robocup days of old. My student John Oberlin resurrected them, and we use them as one of our outreach platform.  Another student, Eric Rosen, just had a tutorial session teaching students how to use them for outreach.  My AIBO, Ella, is my main outreach platform that I use when I go into classrooms of very young children.  I just finished four hours of robot activities at my son’s kindergarten classroom with Ella.

AIBO was invented by Sony.  The first consumer model retailed in 1999 and retailed for $3,000.  Too much for the consumer market to really take off.  New models were released every year until 2006, when they were discontinued.  Now they are robot antiques, and new ones are on ebay for $6,000, although you can get a used one for $500.   Our AIBOs are equipped with a rich sensor package, including a camera on the snout, a microphone, two IR sensors, and joint encoders.  They are fully quadrupedal and each leg has three degrees of freedom.  Additionally you can actuate the ears, the tail, and the mouth, and the robot has a speaker for playing sounds such as barks.  You can program the robot to sit, to lie down, to walk forward or backward, to turn left and right, wag its tail, or bark, even opening its mouth.  It was added to the CMU Robot Hall of Fame because it “represents the most sophisticated product ever offered in the consumer robot marketplace.”

Sony shipped around 150,000 AIBOs before discontinuing production.  We are super excited to see the new reboot.

I realized a long time ago that the best age range for me to target for outreach activities is whatever age my son currently is.   (He’s five now!)  So when he got old enough for preschool, I started visiting preschool classrooms of him and his cousins, and the AIBO was a natural choice to take.  The first time I took AIBO to a classroom, it was clear it was a hit.I did the demo during circle time at Jay’s preschool.   The silvery color, the deco rubber hollowed out ears, the wiggly tail, the buttons just capture the essence of robot to a three year old.  As soon as I started unwrapping the robot, the kids crowd around, vying for who gets to turn her on.   We do a programming activity where kids “program” Ella by laying out cards, like “Sit”, “Stand”, “Bark” or “Lie down.”  Then I type in whatever program they did, and then she executes it!  My favorite part is asking the kids to be the robot.  I say “Kindergarten Execute!” and they all pretend to be puppy dogs.