Teaching Kids to Program Without Screens

Teach your very young children to program without screens, and without literacy!I created a game called “Ready Robots.” Any kid can play as long as they are old enough to do “if you are happy and you know it clap your hands.” They don’t need to be able to read. You can make the game yourself by drawing your own cards, or you can download and print my cards.

To play, ask kids to lay out the cards in a particular order, then say “ready robots? Execute!” Then you and the kids should do each action in the sequence in order. Take turns making different programs for different people. Kids love making a program for their grownups to execute. See a video of the game in action!

I’ve played this game with kids for years. Even very young children can do it by imitating you and looking at the pictures on the cards. It can even build literacy skills by pairing words on the cards with actions. When I add a real robot, like an Aibo robot dog or Sphero or Baxter, I have the kids make the program with cards, then quickly type it into an Ein command line, to make the physical robot execute it. When using an Aibo robot dog, I first ask them to run the program (Sit! Wag your tail! Bark!) and then we have the robot do the same program.

For older children, you can add “if this then that.” Make little programs like “if I stomp my feet then you beep your nose.” Put the cards up on a white board or wall and add more and more rules. It is kind of like Simon Says but more confusing because you might have 4 or 5 different rules.

Most Robots Can’t Pick up Most Objects Most of the Time

Most robots can’t pick up most objects most of the time. This is for a variety of reasons. The robot is turned off. The object is out of reach. The object won’t fit in the gripper. The robot doesn’t know where the object is, because it doesn’t have a sensor at all, or because the sensor isn’t aimed at the object, or because if it is aimed at the object, the object is transparent or reflective. Or if it is aimed at the object, and it is easy to see, it might not have a detector or pose estimator for that object that can localize and predict grasps accurately enough to pick things up.

Existing work by Robb Platt, Ken Goldberg, Sergey Levine, and Chelsea Finn is working towards more robust grasping. But doing things on real robots in the real world is hard, and this won’t change any time soon. What is needed is an integrated perception/planning/motion stack with POMDPs and learned object models that can run on a robust robot with a capable, flexible general-purpose manipulator.

Computer Science Activities for the Pandemic Parent

I am a computer science professor and also a mom and an aunt. I have taken the opportunity of the COVID pandemic to take a deep dive into computer science with my son and his cousins, ages 7-11. I wanted to share some of the things we’ve tried and what worked and didn’t work.

Scratch Club

One of the first activities we started, right after schools closed, was Scratch Club. We met several times per week for one hour. The first 10-30 minutes, the kids take turns showing each other their programs. They screen-share to demo their program and explain how it works. They get feedback and ideas from the other kids. We talk about giving constructive criticism, calling attention to neat features of their programs, and getting them to say what they plan to make their program do next. The rest of the time, they program on their own, on whatever program they want. I have them split into two separate calls with a more advanced and less advanced group. Then I check in which each student individually and try to help them with their programs. Sometimes I do individual instruction/teaching. Sometimes I help them find an online tutorial to do. (We really like Warfame and Griffpatch!) If they get stuck in the tutorial, I help them find bugs. I’ve observed some really beautiful peer-teaching, where one of them helps another with a problem. This peer teaching is really good for both, because they are practicing teaching/communication skills, as well as programming.

Unfortunately the Scratch programming framework makes several big technical blunders. First they do not provide functions. It is not possible to write a function that returns a value; there are only blocks that take actions. You can hack around this by defining variables to hold the return value, but is this really the sort of programming we want to model? Second, they do not let you have lists of lists. For example, we wanted to do tic-tac-toe and the natural representation is a list of lists to represent a board, but Scratch does not allow this.

Snap is a similar framework that addresses all these problems. But what Scratch gets really really right is the social angle. All our friends use Scratch and not Snap. All the best tutorials use Scratch and not Snap. Scratch has a large user base and the coolest games are really really cool. (Check out Griffpatch’s Cloud Platform Multiplayer game!) And you can remix all those cool games and make your own variations. So together these features tied us to Scratch, even though we poked around at Snap a bit.

What surprised me was that in many cases these limitations in Scratch turned into opportunities to discuss CS concepts we would not have gotten to otherwise. Scratch doesn’t let you store strings or lists as cloud variables. So the kids learned to encode and decode strings into integers, which was super cool to see them doing.

Turing Tumble

Turing Tumble is a toy for programming with marbles. We played with it a lot when he was at a younger age, (and I had to help a lot.) But then I got asked “how do we program the computer to do addition? if we don’t already have addition?” Turing Tumble is a great way to answer this question, because it goes step by step how to build an adder using a marble computer. The curriculum is very well designed so that each puzzle builds on the next. This activity is also nice because it does not require a screen.


Unity is a widely used framework for making 2D and 3D games, so there are tons of tutorials and resources available on the web. It can be used to make phone and tablet games. I really like the idea of teaching kids the tools that are used to program games they already know and like. It turns them from a passive consumer of games to a producer, which is great not only because they can make games, but because they can look at the games they already play with a critical and creative eye. It also exposes kids to a compiled language, and the idea of rigid body physics. The bad: it is really complicated. It requires a heavyweight machine – no Chromebooks. And it’s easy for mysterious things to go wrong that mysterious check boxes fix. My son is having fun but it takes a pretty big help from me to get things going and fix things when he is stuck. There are lots of fun Youtube videos about how to make Unity games, although following the step-by-step tutorials is somewhat tricky as the Unity versions keep changing. Don’t go this route unless you are willing to do heavy lifting to help when things get stuck.

Khan Academy

This was quite good as far as it went, but it has a ceiling. The Khan Academy Javascript lessons require no software installed and run right in your web browser. It was completely internationalized so my son was doing the lessons in Polish, which was cool to see. The graphics angle was a good hook, and my son enjoyed the visual effects he could create. It was also a good introduction to a text-based language. However the ceiling is rather low; the coolest games in the Khan Academy ecosystem were not as cool as the coolest games in Scratch. They also don’t have a sharing/social aspect, or substantial built-in graphics. These features in Scratch made it possible to make a cool game faster than in Khan Academy.

Learning to Follow Directions With Less Supervision

A person should be able to give a complex natural language command to a robot drone or self-driving car in a cityscale environment and have it be understood, such as “”walk along third street until the intersection with main street, then walk until you reach the charles river.” Existing approaches represent commands like these as expressions in Linear Temporal Logic, which can represent constraints such as “eventually X” and “avoid X”. This representation is powerful, but to use it requires a large dataset of language paired with LTL expressions for training the model. This paper represents the first ever framework for learning to map from English to LTL without requiring any LTL annotations at training time. We learn a semantic parsing model that does not require paired data of language and LTL logical forms, but instead learns from trajectories as a proxy. To collect trajectories on a large scale over a range

We release this data as well as the data collection procedure to simulate paths in large environments. We see the benefits of using a more expressive language such as LTL in instructions that require temporal ordering, and also see that the path taken with our approach more closely follows constraints specified in natural language. This dataset consists of 10 different environments, with up to 2,458 samples in each environment, giving us a total of 18,060 samples. To the best of our knowledge this is the largest dataset of temporal commands in existence.

You can read the paper here!

Learning Collaborative Pushing and Grasping Policies

Imagine a robot trying to clean up a messy dinner table. Two main manipulation skills are required: grasping that enables the robot to pick up objects, and planar pushing that allows the robot to isolate objects in the dense clutter to find a good grasp pose. It is necessary to identify grasps within the full 6D space because top-down grasping is insufficient for objects with diverse shapes, e.g. a plate or a filled cup. Pushing operations are also essential because in real-world scenarios, the robot’s workspace can contain many objects and a collision-free direct grasp may not exist. Pushing operations can singulate objects in clutter, enabling future grasping of these isolated objects. We explore learning joint planar pushing and 6-degree-of-freedom (6-DoF) grasping policies in a cluttered environment.

In a Q-learning framework, we jointly train two separate neural networks with reinforcement learning to maximize a reward function. The reward function is defined as only encouraging successful grasps; we do not directly reward pushing actions, because such intermediate rewards often lead to undesired behavior. We tackle the problem of limited top-down grasping
action space by integrating a 6-DoF grasping pose sampler rather than using dense pixel-wise sampling from visual inputs and only considering hard-coded top-down grasping candidates.

We evaluate our approach by task completion rate, action efficiency, and grasp accuracy in simulation and demonstrate performance on a real robot implementation. Our system shows 10% higher action efficiency and 20% higher grasp success rate than VPG, the current state-of-the-art, indicating significantly better performance in terms of both higher
prediction accuracy and quality of grasp pose selection.

This work was published at ICRA 2021. The code is here, and you can see a video here!

Understanding Spatial Language to Find Objects Faster

Humans use spatial language to describe object locations and their relations. Consider the following scenario. A tourist is looking for an ice cream truck in an amusement park. She asks a passer-by and gets the reply “the ice cream truck is behind the ticket booth.” The tourist looks at the amusement park map and locates the ticket booth. Then, she is able to infer a region corresponding to that statement and go there to search for the ice cream truck, even though the spatial preposition “behind” is inherently ambiguous.

If robots can understand spatial language, they can leverage prior knowledge possessed by humans to search for objects more efficiently, and interface with humans more naturally. This can be useful for applications such as autonomous delivery and search-and-rescue, where the customer or people at the scene communicate with the robot via natural language.

Unfortunately, humans produce diverse spatial language phrases based on their observation of the environment and knowledge of target locations, yet none of these factors are available to the robot. In addition, the robot may operate in a different area than where it was trained. The robot must generalize its ability to understand spatial language across environments.

Prior works on spatial language understanding assume referenced objects already exist in the robot’s world model or within the robot’s field of view. Works that consider partial observability do not handle ambiguous spatial prepositions or assume a robot-centric frame of reference, limiting the ability to understand diverse spatial relations that provide critical disambiguating information, such as behind the ticket booth.

We present Spatial Language Object-Oriented POMDP (SLOOP), which extends OO-POMDP (a recent work in our lab) by considering spatial language as an additional perceptual modality. To interpret ambiguous, context-dependent prepositions (e.g. behind), we design a simple convolutional neural network that predicts the language provider’s latent frame of reference (FoR) given the environment context.

We apply SLOOP to object search in city-scale environments given a spatial language description of target locations. Search strategies are computed via an online POMDP planner based on Monte Carlo Tree Search.

Evaluation based on crowdsourced language data, collected over areas of five cities in OpenStreetMap, shows that our approach achieves faster search and higher success rate compared to baselines, with a wider margin as the spatial language becomes more complex.

Finally, we demonstrate the proposed method in AirSim, a realistic simulator where a drone is tasked to find cars in a neighborhood environment.

For future work, we plan to investigate compositionality in spatial language for partially observable domains.

See our video!

And download the paper!

Object Search in 3D

Robots operating in human spaces must find objects such as glasses, books, or cleaning supplies that could be on the floor, shelves, or tables. This search space is naturally 3D.

When multiple objects must be searched for, such as a cup and a mobile phone, an intuitive strategy is to first hypothesize likely search regions for each target object based on semantic knowledge or past experience, then search carefully within those regions by moving the robot’s camera around the 3D environment. To be successful, it is essential for the robot to produce an efficient search policy within a designated search region under limited field of view (FOV), where target objects could be partially or completely blocked by other objects. In this work, we consider the problem setting where a robot must search for multiple objects in a search region by actively moving its camera, with as few steps as possible.

Searching for objects in a large search region requires acting over long horizons under various sources of uncertainty in a partially observable environment. For this reason, previous works have used Partially Observable Markov Decision Process (POMDP) as a principled decision-theoretic framework for object search. However, to ensure the POMDP is manageable to solve, previous works reduce the search space or robot mobility to 2D, although objects exist in rich 3D environments. The key challenges lie in the intractability of maintaining exact belief due to large state space, and the high branching factor for planning due to large observation space.

In this paper, we present a POMDP formulation for multi-object search in a 3D region with a frustum-shaped field-of-view. To efficiently solve this POMDP, we propose a multi-resolution planning algorithm based on online Monte-Carlo tree search. In this approach, we design a novel octree-based belief representation to capture uncertainty of the target objects at different resolution levels, then derive abstract POMDPs at lower resolutions with dramatically smaller state and observation spaces.

Evaluation in a simulated 3D domain shows that our approach finds objects more efficiently and successfully compared to a set of baselines without resolution hierarchy in larger instances under the same computational requirement.

Finally, we demonstrate our approach on a torso-actuated mobile robot in a lab environment. The robot finds 3 out of 6 objects placed at different heights in two 10m2 x 2m2 regions in around 15 minutes.

You can download the paper here! The code is available at https://zkytony.github.io/3D-MOS/.

This demonstrates that such challenging POMDPs can be solved online efficiently and scalably with practicality for a real robot by extending existing general POMDP solvers with domain-specific structure and belief representation. This paper won RoboCup Best Paper Award!

Checklist for Paper Submissions

We all want our papers to be awesome. Here is the checklist I use before paper submission.

  • Make sure there are no latex errors. Especially if you are using Overleaf, it will make a pdf even if your latex has errors, but it may not render the way you expect. If you are building on a PC, sometimes latex errors show up far up in the lag, so check for undefined references or citations or anything like that.
  • Spellcheck!
  • Read the bibliography and make sure there are no typos/errors/missing fields there.
  • Make sure the file size is 1MB or less. If it is larger, it is probably because of included images. Make sure 1) the resolution of the images is not too large (e.g., max 800 on a side), and 2) that if it is a photograph you are using jpg (which is compressed) and not png (which is losslessly compressed). However if it’s a diagram with lots of white space, use png, or even pdf or svg which will be imported into the generated pdf as a vector graphics image rather than pixles, and be small but look good at any resolution. Taking these steps will not change how it looks, but will dramatically reduce file size, making it take less time for you to upload, for reviewers to download, and for everyone else to download the paper for the rest of its lifetime.
  • Make sure all authors know they are authors and are okay with submitting.
  • Read the conference guidelines for authors and make sure you follow them all, especially as related to space. baseline stretch is now getting explicitly disallowed so be careful!
  • If double blind, make sure 1) you are citing all your own previous work, but you are citing it anonymously (as if someone else wrote the paper) 2) there are no other bits that deanonymize you.
  • Check the acknowledgements and make sure you acknowledge all funding sources (if not double blind).

Scanning the Internet for ROS

Security is particularly important in robotics. A robot can sense the physical world using sensors, or directly change it with its actuators. Thus, it can leak sensitive information about its environment, or even cause physical harm if accessed by an unauthorized party. Existing work has assessed the state of industrial robot security and found a number of vulnerabilities. However, we are unaware of any studies that gauge the state of security in robotics research.

To address this problem we conducted several scans of the whole IPv4 address space, in order to identify unprotected hosts using the Robot Operating System (ROS), which is widely used in robotics research. Like many research platforms, the ROS designers made a conscious decision to exclude security mechanisms because they did not have a clear model of security threats and were not security experts themselves. The ROS master node trusts all nodes that connect to it, and thus should not be exposed to the public Internet. Nonetheless, our scans identified over 100 publicly-accessible hosts running a ROS master. Of those we found, a number of them are connected to simulators, such as Gazebo, while others appear to be real robots capable of being remotely moved in ways dangerous both to the robot and those around it.

As a qualitative case study, we also present a proof-of-concept “takeover” of one of the robots (with the consent of its owner), to demonstrate that an open ROS master indicates a robot whose sensors can be remotely accessed, and whose actuators can be remotely controlled.
This scan was eye-opening for us, too—we found two of our own robots as part of the scan, one Baxter [5] robot and one drone. Neither was intentionally made available on the public Internet, and both have the potential to cause physical harm if used inappropriately. Read more in our 2019 ICRA paper!

How to Fix a Broken Robot

This post describes steps for fixing a robot that is broken. Since robots are computers, many of the steps also apply to fixing a broken phone, tablet or PC, although the details differ.


The first step is to determine if the hardware itself is working. The hardware subsystems that most commonly fail are 1) the power subsystem 2) the compute subsystem (RAM/CPU/motherboard 3) the long-term storage subsystem (hard drive or SD card) . You want to therefore systematically check each subsystem (in that order) to verify that it is working correctly. Sometimes it is necessary to take the robot apart to check these subsystems. While it is rarely the case, it is possible that dust build-up inside your robot is causing ventilation issues. If you are pulling your robot apart and see a lot of dust, take a second to clear it out! Also, always make sure that when fixing your robot and the power is on, that you are near the e-stop. Be careful when taking the robot apart. Pieces can be very fragile and delicate and aren’t made for outside conditions. static charge can build on you, which is dangerous when getting near power systems to fix them. You should always wear an antistatic wrist strap when fixing power systems to ensure you don’t discharge any static into the robot. When doing this, be sure to keep track of all the pieces you unscrew, and it’s always good to have backups in case. Be careful with unscrewing, you don’t want to strip the screws.

Power Subsystem. The most common symptom of a power problem is that the robot or device just won’t turn on. This could be because the power supply failed, because the battery died, or maybe your wall outlet doesn’t have power. (Batteries have a limited number of charging cycles; they wear out and need to be replaced!) Most robots or devices have some kind of LED that indicates that power is being transmitted to the device, and the location, color, and meanings of these LEDs is often documented in the device spec sheet or on the board itself. For example, the Raspberry Pi Model B+ has two LEDs, labeled ACT (activity) and PWD (Power), which you can see in this picture. If you can’t verify power is coming to the device, get a multimeter and check that the voltage coming from the battery or power supply is what is specified in the datasheet. You might have to take the device apart to access the power supply; you might have to do some digging to find the data sheet. For example, here is the information on the voltages required to power the Raspberry Pi. Don’t forget to check the wall! Maybe the wall outlet isn’t receiving power, and your robot is fine. It’s also possible a short-circuit has occurred, and something is fried. In this case, you’ll most likely need to replace a fuse, wire, etc. One useful way to quickly check for this is the “smell test”: if the inside of your robot smells slightly smokey or burned, you most likely had a short and should look for something that looks burned.

Compute Subystem. The compute subsystem fails less frequently than the other two. However it is next in the debugging process because it is necessary for checking if your long-term storage system is working. Once a computer is powered on, it conducts a power-on self-test (POST). This POST occurs immediately after powering on, before you boot into your operating system (which requires access to long-term storage). It verifies that each of the hardware components of the machine are working. A PC that fails its post will make strange BIOS beep codes, and you need to figure out what the beep codes mean. This requires figuring out exactly which BIOS you have by looking up the specifications for that PC, or opening it up to look at the motherboard. Then the documentation for the BIOS will indicate what the beep codes mean. The Raspberry PI doesn’t have beep codes; instead it has LED flash codes to indicate different errors.

Of course, the best way to get more information about what’s going on with the POST is to plug in a monitor and keyboard into the computer. This may already be true if you are working with a PC, but if you are working with a robot, it may not have a monitor plugged in by default. The BIOS beep codes and LED flashes indicate problems without needing a monitor, but plugging in a monitor will show exactly what’s going on. Most robots support this somehow; for example our MOVO robot has an HDMI port on the bag to plug into. The Raspberry PI Model B+ also has an HDMI port, and many times my students are surprised to realize that you can plug in a monitor and keyboard and suddenly their quadrotor drone is a PC! Once you’ve verified the POST ran, you can be fairly confident the CPU, motherboard, and RAM are working. Or if not, it will indicate what’s wrong and you can try replacing those components.

Long-term Storage Subystem. If the POST tests succeed, the next step is for the computer to boot into its operating system, which is held on the long-term storage device. Hard drives and SD cards are another common failure point. Both have a limited number of read/writes before they will fail. Hard drives can have bad sectors, parts of the disk that are permanently bad, and can fail entirely. You want to verify that your robot is able to read the long-term storage and boot into its operating system. One of the simplest ways to do this is to plug it into a keyboard and mouse and see if it boots up! This allows you to watch the whole boot process and see if it completes successfully and enables checking other problems too, like networking and software issues. Fortunately the hard drive or SD card is relatively easy to replace, if you backed up your data. In many cases there may exist a standard disk image for the system you are working with; for example the SD card in our Kuri robot failed, but we were able to restore it using an image we could download from their website.


The next major goal is to connect to your robot over the network. You don’t want to start working on this until you are reasonably sure the robot is passing its POST test and booting into its operating system. The robot’s wifi can be configured in either Master mode or Managed mode. In Master mode, it will act as its own wifi hot spot. You will see an SSID on your base station laptop or desktop that corresponds to the robot’s network. In this mode if you connect to the robot (or PC), it will give your base station an IP address. You can find the robot’s IP address by using a tool like nmap to scan, or route to look at the gateway machine, or look at your base station’s logs to find the IP address of the DHCP server. Second, it can be configured in Managed mode, where the robot is looking for a wireless network. Typically in this case it is configured to look for a network with a particular SSID, and this configuration can be stored on the hard drive. Sometimes for commercial robots, there is a process where this is configured via an app. For example, I recently installed the Mysa smart thermostat. First my phone asked which network I wanted the Mysa to connect to. Then, it connected to the Mysa wireless network, where the Mysa itself was the AP master, and neither my phone nor Mysa had internet access. Then the phone tells Mysa what SSID and password to use, and then both devices connect to that SSID (the “house” internet), and can talk to each other. The Kuri robot works the same way. Third, the robot can be configured to use a fixed, static IP address. Many home routers work this way; in this case the static IP should hopefully be documented in the robot’s or device’s documentation. This is the cause of many issues: setting a static IP but the IP of the robot gets dynamically changed via DHCP server. A simple test is to make sure you can ping your robot, and that your robot can ping you. Sometimes the networking problem comes from firewalls. Make sure the firewalls aren’t stopping networking traffic your robot needs, but don’t just take them completely down; a full discussion of this is beyond the scope of this article, but note that we scanned the internet for robots and found a lot! You can configure your base station to use a different static IP address in the same subnet, and they should be able to connect. You can also snoop by using the ARP cache or network snooping tools, but that is beyond the scope of this article. If your network setup uses ethernet cables, make sure there are no issue with the wires. They can break quite easily without it being obvious. You should also be aware of your networking bandwidth/latency. If your robot is jittering around or sending incomplete sensor data, you may need to throttle things or swap from wifi to a wired connection.


Next you want to make sure the software on your robot is working. The details of this are also beyond the scope of this article, but at a high level, you want to make sure that 1) the software to make the robot move started up without errors and 2) that it can connect to whatever it needs to connect to to do its job. Typically this means using tools like ps to show running processes, checking log files to see if there are errors on startup, and writing and running client programs to see if each of the programs is running. The key is to be systematic; check each sensor or actuator subsystem on the robot to verify it started and is working, because you may find other hardware issues in this process. Typically each sensor comes with its own drivers and minipackages for checking if they work, and these drivers are used within a larger system like ROS to connect things together. Always make sure that the drivers for your sensors work on your robot platform first. For example, a common problem on our drones is being unable to connect to the flight controller so the drone can’t send a command to arm or spin its motors, because the USB socket has failed. ROS does not make this debugging process easy, because people often configure their robot to use a single roslaunch file to start the entire robot stack, making it easy to miss errors in one subsystem or another, and hard to audit a substack. For example, to debug our MOVO robot, we manually started the MoveIt stack from the command line to see the error logs and try different configurations to fix problems.

Software Dependencies.

Calibration. A very common source of errors is miscalibration of some kind. For example, our MOVO robot has a Kinect 2 sensor mounted on its head. Even though we carefully calibrated the sensor about one year ago, it got knocked or moved in that time, and when we checked it yesterday it was off by quite a bit. If your robot does not know where its sensors are relative to its base frame, then it cannot effectively use the sensor data. To check this, use a tool such as RVis to visualize the sensor data overlayed on the robot’s model, or use some kind of fiducial such as AprilTags to check the calibration or recalibrate.