Action Oriented Semantic Maps

A long-term goal of robotics is designing robots intelligent enough to enter a person’s home and perform daily chores for them. This requires the robot to learn specific behaviors and semantic information that can only be acquired after entering the home and interacting with the humans living there. For example, there may be a trinket that the robot has never encountered before, and the owner might want to instruct the robot on how to handle the item (i.e., object manipulation information), as well as directly specify where the item should be kept (i.e.,  navigation information). To approach this problem, one must consider two sub-problems: a) the agent’s representation of object manipulation actions and semantic information about the environment, and b) the method with which an agent can learn this knowledge from a teacher.

Semantic maps provide a representation sufficient for navigating an environment, but map information alone is insufficient for enabling object manipulation. Conversely, there are knowledge bases that store requisite object manipulation information, but do not help with navigation or grasping in novel orientations. Previous studies have shown that Mixed Reality (MR) interfaces are effective for specifying navigation commands and programming egocentric robot behaviors. However, none of these works have demonstrated the use of MR interfaces for teaching high-level object manipulation actions, and semantic information of the environment.

Our contribution is a system that enables humans to teach robots both object manipulation actions—in a local object frame of reference—and b) semantic information about objects in a global map. We use a Mixed Reality Head Mounted Display (MR-HMD) to enable humans to teach a robot a plannable representation of their environment. By plannable, we mean structured representations that are searchable with AI planning tools. By teach, we mean having the human explicitly provide information necessary for instantiating our representation. Our representation, the Action-Oriented Semantic Map (AOSM), enables robots to perform complex object manipulation tasks that require navigation around an environment.

To test our system for building AOSMs, we created a test environment where a mobile manipulator was tasked with learning how to perform household chores with the help of a human trainer. Three novice humans used our MR interface to teach a robot an AOSM, allowing the robot to autonomously plan to navigate to a bottle, pick it up, and throw it out. In addition, two expert users taught a robot to autonomously plan to flip a light switch off and manipulate a sink faucet to the closed position, both in under 2 minutes.

You can read more in our paper!

Natural Language, Mixed Reality, and Drones

As drones become increasingly autonomous, it is imperative that designers create intuitive and flexible ways for untrained users to interact with these systems. A natural language interface is immediately accessible to non-technical users and does not require the user to use a touchscreen or radio control (RC). Current natural language interfaces require a predefined
model of the environment including landmarks, which is difficult for a drone to obtain. For example, given the instruction “Fly around the wall to the chair and take a picture,” the drone must already have a model of the wall and the chair to infer a policy.

We address these interaction problems by using natural language within a mixed reality headset (MR) to provide an intuitive high-level interface for controlling a drone using goal-based planning. By using MR, a user can annotate landmarks with natural language in the drone’s frame of reference. The process starts with the MR interface displaying the virtual environment of a room that can be adjusted to overlay on physical reality with gestures. Afterward, the user can annotate landmarks with colored boxes using the MR interface. This process allows the user to specify landmarks that the drone does not have the capability to detect on its own. Then, when the user sends a command, we translate this natural language text to a reward function.

Further, we conducted an exploratory user study to compare the low-level 2D interface with two MR interfaces. Overall, we found that our MR system offers a user-friendly approach to control a drone with both low-level and high-level instructions. The language interface does not require direct, continuous commands from the user. Overall, the MR system does not require the user to spend as much time controlling the drone as with the 2D interface.

You can read the paper here!

Finding Objects with Verbs Instead of Nouns

A key bottleneck in widespread deployment of robots in human-centric environments is the ability for non-expert users to communicate with robots. Natural language is one of the most popular communication modalities due to the familiarity and comfort it affords a majority of users. However, training a robot to understand open-ended natural language commands is challenging since humans will inevitably produce words that were never seen in the robot’s training data. These unknown words can come from paraphrasing such as using “saucer” instead of “plate,” or from novel object classes in the robot’s environments, for example a kitchen with a “rolling pin” when the robot has never seen a rolling pin before. We aim to develop a model that can handle open-ended commands with unknown words and object classes. As a first step in solving this challenging problem, we focus on the natural language object retrieval task — selecting the correct object based on an indirect natural language command with constraints on the functionality of the object.

More specifically, our work focuses on fulfilling commands requesting an object for a task specified by a verb such as “Hand me a box cutter to cut.” Being able to handle these types of commands is highly useful for a robot agent in human-centric environments, as people usually ask for an object with a specific usage in mind. The robot would be able to correctly fetch the desired object for the given task, such as cut, without needing to have seen the object, a box cutter for example, or the word representing the object, such as the noun “box cutter.” In addition, the robot has the freedom to substitute objects as long as the selected object satisfies the specified task. This is particularly useful in cases where the robot cannot locate the specific object the human asked for but found another object that can satisfy the given task, such as a knife instead of a box cutter to cut.

Our work demonstrates that an object’s appearance provides sufficient signals to predict whether the object is suitable for a specific task, without needing to explicitly classify the object class and visual attributes. Our model takes in RGB images of objects and a natural language command containing a verb, generates embeddings of the input language command and images, and selects the image most similar to the given command in embedding space. The selected image should represent the object that best satisfies the task specified by the verb in the command. We train our model on natural language command-RGB image pairs. The evaluation task for the model is to retrieve the correct object from a set of five images, given a natural language command. We use ILSVRC2012 images and language commands generated from verb-object pairs extracted from Wikipedia for training and evaluation of our model. Our model achieves an average retrieval accuracy of 62.3% on a held-out test set of unseen ILSVRC2012 object classes and 53.0% on unseen object classes and unknown nouns. Our model also achieves an average accuracy of 54.7% on unseen YCB object classes. We also demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language commands, and present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes.

You can read more in our paper and our code and data!

Robots that Use Language

With my wonderful collaborators Cynthia Matuszek, Hadas Kress-Gazit and Nakul Gopalan, we published a review paper about language and robots. This article surveys the use of natural language in robotics from a robotics point of view. To use human language, robots must map words to aspects of the physical world, mediated by the robot’s sensors and actuators. This problem differs from other natural language processing domains due to the need to ground the language to noisy percepts and physical actions. We describe central aspects of language use by robots, including understanding natural language requests, using language to drive learning about the physical world, and engaging in collaborative dialogue with a human partner. We describe common approaches, roughly divided into learning methods, logic-based methods, and methods that focus on questions of human–robot inter-action. Finally, we describe several application domains for language-using robots.

It was a wonderful opportunity to reflect on the past 10 years since my Ph.D. thesis in 2010 and survey the amazing progress we have seen in robots and language.

Remote Teaching on a Time Budget

Many of us are preparing to virtualize our classes in response to the COVID-19 outbreak that is spreading in the United States. I wanted to share my experiences teaching my Introduction to Robotics class with video lectures last semester. My motivation for doing online lectures last fall had nothing to do with the COVID-19 outbreak. First, I felt that the time I spent preparing for lecture each class had to be spent again each year. Although I reused slides, I still needed to refresh myself on the slides before each lecture and prepare mentally for what I was going to say and when I was going to say it. Second, it seemed that blasting out a lecture at a group of students who are mostly passively was not the best way to use my face time with students. I would of course ask the group questions, and mix in pair-wise exercises where the students talk with a partner. However most of the time was me blasting technical material at them, and I wanted more informal small group interactions in the class period. Third, as we expand our drone program, it seemed useful to expand access to the lecture materials beyond the students who are on Brown’s campus.

My starting point was a curriculum, syllabus, and set of lecture slides that I developed for the course over the past two years. I previously helped out a bit with the Udacity Flying Car Nanondegree, and one of the key lessons from that experience was to focus on small blocks of content. Based on the Sheriden center’s advice, I started out each lecture by defining learning objectives. The verbs in this page were a helpful starting point. I decided to use the platform to host the online lecture content because they take care of the hosting and it has a framework for hosting videos as well as asking auto-graded questions, and Youtube to host the videos themselves. Defining learning objectives already put a lot more structure into my slides than I had before. (Yes I know I should have been doing this all along!)

Next I divided each slide deck into small chunks that I could cover in two minutes or less. The lectures I created consisted of a short two minute video followed by a few multiple choice questions. A crucial benefit of the online modality is the ability to ask every student questions after each block. Each question can be automatically graded, allowing students to get immediate feedback on what they understand and don’t understand. At first I worried that I wouldn’t be able to ask substantive questions, and it is true the format does not lend itself to extended discussions. However a large lecture itself is hard to have substantive discussions; when it happens, it’s often a few people talking and most people passively listening. Instead I tried to ask multiple choice questions that were tied to the learning objectives for each section.

We experimented with more sophisticated types of questions, such as creating a drag-and-drop question for the different blocks of a PID controller, but ultimately stuck with multiple choice questions. More complicated questions ended up being more work for incremental benefit. We also considered having coding questions for example in some kind of Jupyter notebook. However we ultimately decided that the best way to assess coding was to use our existing projects and assignments, where students program on their own drone, on their own time through structured assignments and projects. This simplified the creation of online lectures since they only needed to assess content knowledge rather than the ability to write programs or derive equations.

I used a screen capture program to record myself talking over the slides. If I made a mistake, I stopped recording and restarted from the beginning. Since each planned video was only two minutes, this process was easier and faster than editing videos later. Merely the act of recording improved things because I would find bugs in my slides, stop and fix them, and then record a clean version. The ability to stop and rerecord when something came out wrong meant that the version the students eventually saw was easier to understand, and more information-dense. As a result I was able to cover the material I wanted to cover in each lecture in far fewer minutes of recorded content than the 80 minutes I had planned for lecture.

Once I had a process set up it took me two to three hours to record content for the 80 minute lecture from my previously existing slides. The first block of time I would use to define learning objectives (which I probably should have been doing before, but wasn’t), as well as create questions. The second block of time I used to record. I then uploaded the videos to Youtube. I used the built-in editing tool to crop the beginning and the end when I was starting and stopping the screen grabber. Then I put them in EdX and entered the questions after each video.

During class, students could choose to build their drone, work on assignments, or watch the lectures. Students would do some of each. I used the class time to answer individual questions, help students debug their drones, and perform checkoffs. I felt that I got to know the individual students better with this in-class style than when I lectured to them, and they got more individual interactions from me. Effectively I used the class time as extended office hours. Of course this sort of interaction is hard to make happen now, remotely, but hopefully this experience will help those who are transitioning their classes online!

Following Commands in Cities

When a person visits a strange city, they can follow directions by finding landmarks in a map. For example, the can follow instructions like “Go to the Wegmans supermarket” by finding the supermarket in their map of the city.

Autonomous systems on robots need the ability to understand natural language instructions referring to landmarks as well . Crucially, the robot’s model for understanding the user’s instructions should be flexible to new environments. If the robot has never seen “the Wegmans supermarket” before but it is provided with a map of the environment that marks these locations, it should be able to follow these sorts of instructions. Unfortunately, existing approaches need not just a map of the environment, but also a corpus of natural language commands collected in that environment.

Our recent paper accepted to ICRA 2020, Grounding Language to Landmarks in Arbitrary Outdoor Environments, addresses this problem by enabling robots to use maps to interpret natural language commands. Our approach uses a novel domain-flexible language and planning model that enables untrained users to give landmark-based natural language instructions to a robot. We use Open Street Maps (OSM) to identify landmarks in the robot’s environment, and leverage semantic information encoded in OSM’s representation of landmarks (such as type of business and/or street address) to assess similarity between the user’s natural language expression (for example, “medicine store”) and landmarks in the robot’s environment (CVS, medical research lab). You can download the collected natural language data on the h2r GitHub repository: 

We tested our model with a Skydio R1 drone in simulation, then outdoors. Here’s the video!

How to Read a Paper

The only way to really understand a paper is to implement that paper. Obviously we don’t have time or resources to do that for most of the papers we read. So most of the papers we read, we won’t really understand. But that’s okay!

When reading a technical paper, it is important to first pay attention to the problem the paper is solving. What are the inputs, and what are the outputs? What can the robot do that it couldn’t do before? You do not need to understand HOW the paper is solving the problem in order to understand WHAT problem it is solving. Hopefully the authors of the paper made this clear in the introduction and abstract. When it’s not clear, sometimes I flip straight to the evaluation, not to look at the numbers, but to understand at the end of the day, what the system is doing.

Once you understand what problem the paper is solving, it is time to decide if it is worth digging deeper. Why is this paper important to your research? There are several common use cases. If the paper does not match any of these, it is okay to decide not to read further! You can’t possibly even skim every paper, much less implement them all so you are always trying to decide how much resources to invest in trying to understand them.

First, you could be interested in using this paper as a subcomponent of your own research system. For example, if your research is about making robots understand natural language commands, and this paper is about making robots pick up objects, you may be interested in using the ideas, algorithms and/or code presented in the paper as part of the language understanding system. If this is the case, you want to understand the inputs and outputs to the system and how it will connect to your planned system. You may also want to understand how well the system works; if you are aiming to produce a user study or collect a dataset, you need more reliability than if you are aiming to evaluate in simulation and only use this system for the video.

A second reason is that the paper may be solving a similar problem to the one you are solving, or even the same problem. This situation occurs frequently in NLP when many people work on the same corpus and report results even using the same training/testing split. For example, the GeoQuery dataset has been used by many papers to evaluate semantic parsing on a common task, asking questions about a geographic database. If this is why you are reading the paper, then it is important to understand at a technical level how their approach to solving the problem differs from yours. It is also important to characterize the performance of your work relative to the other paper. Do you outperform them because you are using a more advanced algorithm? Or leveraging more training data? Maybe you perform similarly to them, but you are using less training data or less annotation, or perform better on an interesting subset of the dataset. Ultimately you want to make a determination of whether this paper can be cited in related work as solving a different problem, or if it is solving the same problem and therefore needs to be compared as a baseline to your approach.

A third reason to read the paper is that it may be solving a different problem that is only distantly related to your problem, but using a particular technical tool or approach that you would like to apply to your problem. For example, you might wish to apply a technique that was translating from English to French to translate between English and the symbolic language your robot is using. Then you want to pay attention to the similarities and differences between your problem and the problem this paper is solving. What changes will you need to make to your input to use the technique described in this paper? Do you expect it to work as well? Better? Worse? How long will it take to train or implement?

A final reason to read the paper is to understand open problems in the field, and figure out what problem you would like to work on. This is often the situation when you are starting out in research. There is sort of a chicken-and-egg problem here in that you don’t know what problem to work on so you read the paper, but you will be more effective at reading papers when you know what problem you are working on. This feeling is okay and normal. In this situation, the most important thing to pay attention to is the problem the paper is solving. Ask yourself if it is an interesting problem. What are the important problems in your field? Is this paper describing one of them? Why or why not?

Any of these reasons may lead to your decision to implement the paper (even if an implementation already exists!). If you are using the paper as a subcomponent of your system, you may wish to own the implementation to make sure it works the way you expect and manage dependencies, although more frequently in this situation you may find yourself using the author’s implementation (when it exists!) If you are comparing to this paper as a baseline, sometimes you will implement the baseline algorithm in order to make sure you understand it. It is important, when implementing, to first reproduce the published numbers on the published dataset before trying it on your new dataset. If you are using an established dataset, many times people will not even run the algorithm on that dataset but use the previously published numbers. If you are using the method as a subcomponent of yours, you will frequently find yourself implementing it yourself in order to deeply understand the method and integrate it with your system. If you are not sure what to work on, implementing an existing seminal paper is a great way to get started. It helps you understand the core tools if your field, and often leads to a new paper of your own when you fix a problem, identify a new use case, or define a new, related problem.

Robot Writing and Drawing

Writing is a form of human linguistic expression, and Atsu Kotani has investigate this by creating a system to enable a robot to write. Atsu’s work introduces an approach
which enables manipulator robots to write handwrit ten characters or line drawings. Given an image of just-drawn handwritten characters, the robot infers a plan to replicate the image with a writing utensil, and then reproduces the image. Our approach draws each target stroke in one continuous drawing motion and does not rely on handcrafted rules or on predefined paths of characters. Instead, it learns to write from a dataset of demonstrations. We evaluate our approach in both simulation and on two real robots. Our model can draw handwritten characters in a variety of languages which are disjoint from the training set, such as Greek, Tamil, or Hindi, and also reproduce any stroke-based drawing from an image of the drawing. This work was featured in Wired,and you can learn more from the paper, which was published at ICRA last summer

Advanced Autonomy on a Low-Cost Drone

We created a hardware and software framework for an autonomous aerial robot, in which all software for autonomy can run onboard the drone, implemented in Python. We present an Unscented Kalman Filter (UKF) for accurate state estimation. Next, we present an implementation of Monte Carlo (MC) Localization and FastSLAM for Simultaneous Localization and Mapping (SLAM). The performance of UKF, localization, and SLAM is tested and compared to ground truth, provided by a motion-capture system. Our autonomous educational framework runs quickly and accurately on a Raspberry Pi in Python. The base station machine only needs to have a web browser and does not need to have ROS installed, making it easy to use in school settings.

In partnership with the Duckietown Foundation, we are releasing our course materials for anyone to use. The Operations Manual contains a description of the drone and how to use it. The Learning Materials book contains all our projects and assignments. This work is being presented at IROS this week and has been nominated for a best paper award! Read the full paper here.

Writing a Technical Paper

This post describes my outline/structure for a technical paper. I have found a fairly specific structure works best for abstracts and introductions. For the other sections, I describe a process for writing them.

Abstracts and Introductions

The first sentence should be a general statement of the problem’s importance.  The second sentence should summarize the shortcomings of previous work. It’s very tempting to make this go on for more than one sentence, but don’t give in – just one sentence! You will have more space later.  The next few sentences should describe your technical solution to the problem.  The last two or three sentences describe the evaluation.  Usually one sentence about the quantitative evaluation, and one sentence about the robot demonstration.  Did it work? How do we know that it worked? This does not need to describe the whole evaluation, but should cover the most impressive thing the robot can do as a result of the paper.
I think this abstract is a good example of following this template:

For introductions, the pattern is similar.  The first paragraph should generally describe the problem, focusing on the problem, what it is, and why it is important.  The second paragraph should succinctly describe related approaches and why they do not solve the problem.  The next few paragraphs should describe our technical approach for solving the problem.  The last paragraph should describe the evaluation and how we show that we have solved it.   I like to have a figure on the first page of the paper (Figure 1) that summarizes or shows what the robot can do that it couldn’t do before.  This figure is often the first thing reviewers see, and should show the “impressive result” of the paper visually, often a storyboard of the video.

Related Work

The aim of a related work section is to situate your contribution with respect to other people’s work.  You want to give the reader an overview of papers that are solving similar problems or using similar methods.  You also want to tell the reader why your contribution is new, different, and better.  

Before starting, always always always use natbib.  Even if the conference template claims not to support natbib, you can engineer it in anyway, and you should.  Natbib contains \citet which is a “noun phrase” citation such as “Tellex et al. (2019) wrote that”.  It will do “et al.” with no period after et, and a period after al. It will always spell the author’s name right if your bibtex entry is right.   Natbib saves time, makes your paper have fewer bugs, and avoids annoying people by spelling their name wrong (as long as you are careful to get it right once in the bibtex entry). 
I always start by writing two sentences about each paper.  The first sentence is a one sentence summary of the paper.  Example:  “\citet{tellex11} showed a framework that learns to map between language and robot trajectories functions, enabling a robotic forklift to understand natural language commands about movement and manipulation.”  The second sentence is a statement of the differences between their paper and your paper.  For example, “Our approach is different because it requires less annotated data and maps to reward functions instead of trajectories, enabling the robot to interpret human goals.”    I start by writing a long list of these sentences, and collect them in the related work section of the paper.  It’s okay if they are not organized.  Every time someone tells me about a paper I should put in related work, I add one of these sentence pairs (or at least a note to go look up that paper).  Your goal is to not miss any areas of related work that a reviewer might be familiar with.  
As you build up longer lists of citations and sentences in this format, you can start to group them into areas.  This might already happen naturally as you follow chains of related citations.  At some point, you can explicitly start grouping the sentences together into paragraphs.  It will still be choppy, but at least related papers will be together.  Once you have paragraphs, you can start to write summary sentences.  “There are many papers that learn to map between human language and robot trajectories~\citep{tellex11, matuszekxx, …}.  Our approach is different because it requires less annotated data than these methods and maps to reward functions instead.”    At this point you have a viable related work section. 

Depending on your available space, you may find yourself needing more summary and analysis sections that group papers together.  For example, you may not have space to give a one sentence overview of each paper.  Then you can cut the detailed versions, and just keep the high-level summary.  However, I have found that the best way to create the summary version is to start with the detailed version and then generalize.  If you do find yourself cutting the detailed versions, make sure to comment it out, or copy it into the draft journal paper submission for later. 
Sometimes, there will be a very closely related paper that needs a whole paragraph on its own.  For example, in our OO-POMCP paper, we built closely on the POMCP algorithm, and needed to give a detailed overview of the whole algorithm.  Or there might be a particularly close piece of related work with a subtle distinction, and you need to give more detail to express this distinction.  Feel free to write long if this is the case.  Sometimes you can figure out a way to cut later, and sometimes the detail is necessary. 

Related work is crucial for reviewers, who are often already familiar with the field, and don’t want to have to read your whole paper in order to understand your contribution.  Writing related work often helps tease out the novel aspects of the paper that should be emphasized in the introduction and abstract.  Don’t save it until the end.

I like to put related work after the introduction, before technical approach. I believe the introduction should explain enough about the paper that you can then review related work, and that it’s helpful to situate the paper with respect to related work next because that’s part of what I read to decide whether to read the whole paper. However it’s also valid to put it after technical approach, before the conclusion. Then you can assume the reader has a deeper technical understanding of the work and give a more nuanced presentation of its differences to related work.

Technical Approach

Think of the paper as a tree of problems and solutions.  It is important to not give any technical material with out first explaining the motivation for that material.   Technical/mathematical material is a solution to some problem, and it is extremely difficult to understand without first being clear about the problem it is solving.   First is the overarching problem the paper is solving, which you give in the abstract and introduction.  Then in technical approach you break the problem down into sub-problems.  Each of these sub-problems has a solution, which itself can be a set of sub-sub problems paired with technical solutions.  At any point, the reader may decide to stop descending delving into the leaf nodes and go on to the next sub-problem in the tree, and the paper should be designed for this kind of approach.  Actually only the very rare reader will descend into all the full technical detail of the paper – it is only when implementing the paper that someone typically does this.  
Even when implementing and going all the way to the leaf nodes, it is crucial to explain the problem first, then the solution. This approach situates the technical material into the larger ark of the paper, and helps people attach it to a larger structure in their brain about how the paper is put together.  

I like sentences like “In order to make the robot do X, we need Y and Z.” Then explain that bit. Then “In order to do Y we need A and B.” The whole paper should fit into a tree of these sentences, where the root of the tree is the overall motivation/main technical contribution for the paper.

A common failure mode, of what not to do, is to put “preliminary” or “background” material into an introductory section of the paper, that reads like a laundry list of random techniques.  When a background section makes sense, it should be situated as part of the tree of problems and solutions for the overall paper, but that this particular section is not novel to this paper.  It should still be connected to the rest of the paper as a tree of problems and solutions.   It should not be a laundry list of technical material that will come up later, because people won’t know what it’s for at this point in the paper, and won’t remember it. 


For the results section, I like to start with a sentence describing the aim of the evaluation. This sentence should tie things back to the motivation set out in the introduction and the abstract. The results section should deliver on the promises made in the introduction. What did the robot actually do, that it couldn’t do before? How do we know that it did it? When reading a paper, often I will skip to the results section, especially if i am confused about the technical approach to find out what the robot actually did. This paper has a good example of this approach. For each subsection or set of experiments, open it with the sub-hypothesis that that part of the evaluation is showing. A common pattern is to have some results in simulation, and a demonstration on the real robot; explain how each piece of the evaluation demonstrates the robot’s new capabilities. Make sure that the metrics you are using are as easy to understand as possible. For example you could report “reward” but that requires the person understands and agrees with your reward function; it’s often better to report “accuracy” whether the robot did the right thing or not over many trials.

Avoid writing a description of one experiment, then a description of another experiment, then the results of the first experiment and then the results of the second experiment. After understanding what the first experiment is, give the results right away while it is still in the reader’s head. There is no reason to wait to give those results, when they will forget the details of the experiment before they get to the results. Moreover hopefully your separate experiments each tell a different part of the story about how the system works.


The first paragraph of the conclusion should summarize the contributions of the paper. This is different from the abstract in that you want to explain what you did in the paper: what results have been proved or what algorithms have been created, and what demonstrations have been done to show that it worked. The later paragraphs should cover future work. This part of the paper can recognize limitations of your paper and sketch out open problems that remain to be solved. It is okay if you do not plan to solve them; you are also laying out boundaries for what this paper does not solve, which helps people understand what it does solve. Moreover you are also describing possible follow-on work that may inspire someone to expand on this paper.