Understanding Language about Sequences of Actions

Many times, natural language commands issued to robots not only specify a particular target configuration or goal state but also outline constraints on how the robot goes about its execution. That is, the path taken to achieving some goal state is given equal importance to the goal state itself. One example of this could be instructing a wheeled robot to “go to the living room but avoid the kitchen,” in order to avoid scuffing the floor. This class of behaviors poses a serious obstacle to existing language understanding for robotics approaches that map to either action sequences or goal state representations. Due to the non-Markovian nature of the objective, approaches in the former category must map to potentially unbounded action sequences whereas approaches in the latter category would require folding the entirety of a robot’s trajectory into a (traditionally Markovian) state representation, resulting in an intractable decision-making problem. To resolve this challenge, we use a recently introduced probabilistic variant of Linear Temporal Logic (LTL) as a goal specification language for a Markov Decision Process (MDP). While demonstrating that standard neural sequence-to-sequence learning models can successfully ground language to this semantic representation, we also provide analysis that highlights generalization to novel, unseen logical forms as an open problem for this class of model. We evaluate our system within two simulated robot domains as well as on a physical robot, demonstrating accurate language grounding alongside a significant expansion in the space of interpretable robot behaviors.

Our paper was recently published at RSS 2018 and you can read it here!

See a video!

Ode to AIBO

We are lucky enough to have a pack of AIBOs in our lab, leftover from Robocup days of old. My student John Oberlin resurrected them, and we use them as one of our outreach platform.  Another student, Eric Rosen, just had a tutorial session teaching students how to use them for outreach.  My AIBO, Ella, is my main outreach platform that I use when I go into classrooms of very young children.  I just finished four hours of robot activities at my son’s kindergarten classroom with Ella.

AIBO was invented by Sony.  The first consumer model retailed in 1999 and retailed for $3,000.  Too much for the consumer market to really take off.  New models were released every year until 2006, when they were discontinued.  Now they are robot antiques, and new ones are on ebay for $6,000, although you can get a used one for $500.   Our AIBOs are equipped with a rich sensor package, including a camera on the snout, a microphone, two IR sensors, and joint encoders.  They are fully quadrupedal and each leg has three degrees of freedom.  Additionally you can actuate the ears, the tail, and the mouth, and the robot has a speaker for playing sounds such as barks.  You can program the robot to sit, to lie down, to walk forward or backward, to turn left and right, wag its tail, or bark, even opening its mouth.  It was added to the CMU Robot Hall of Fame because it “represents the most sophisticated product ever offered in the consumer robot marketplace.”

Sony shipped around 150,000 AIBOs before discontinuing production.  We are super excited to see the new reboot.

I realized a long time ago that the best age range for me to target for outreach activities is whatever age my son currently is.   (He’s five now!)  So when he got old enough for preschool, I started visiting preschool classrooms of him and his cousins, and the AIBO was a natural choice to take.  The first time I took AIBO to a classroom, it was clear it was a hit.I did the demo during circle time at Jay’s preschool.   The silvery color, the deco rubber hollowed out ears, the wiggly tail, the buttons just capture the essence of robot to a three year old.  As soon as I started unwrapping the robot, the kids crowd around, vying for who gets to turn her on.   We do a programming activity where kids “program” Ella by laying out cards, like “Sit”, “Stand”, “Bark” or “Lie down.”  Then I type in whatever program they did, and then she executes it!  My favorite part is asking the kids to be the robot.  I say “Kindergarten Execute!” and they all pretend to be puppy dogs.

Picking a Fork from Water

This video demonstrates Baxter picking a metal fork from a sink filled with running water.  The light field technology uses image averaging to mitigate the reflection of the light in the surface of the water – but the right kind of image averaging that exploits the robot’s ability to move its camera to “refocus” the image to see through the water to the fork at the bottom.   For more information, check out our RSS paper!

https://youtu.be/YCjrLfYepOQ

 

Screwing a Nut onto a Bolt with Vision

Using light field methods, we can use Baxter’s monocular camera to localize the metal 0.24′ nut and corresponding bolt.  The localization is precise enough to allow the robot to use the estimated poses to then perform an open-loop pick, place, and screw to put the nut on the bolt.  The precise pose estimation enables complex routines to be quickly encoded, because once the robot knows where the parts are, it can perform accurate grasps and placement actions.  You can read more about how it works in our RSS 2017 paper.

Picking Petals From a Flower

Rebecca Pankow and John Oberlin programmed Baxter to pick petals off of a daisy during my graduate seminar last semester, Topics in Grounded Language for Robotics.   The robot localizes the petal on the daisy using synthetic photography based on light fields, then plucks each petal off of the daisy.  It looks for the largest open space when selecting the next petal to pick.  It keeps track of the parity of the petals picked so it can either nod and smile (if the answer is, “he loves me”) or frown (if the answer is, “he loves me not.”)  This project was recently featured in the New Yorker!