We made a video of Baxter interpreting multimodal referring expressions using our multimodal Bayes filter. Our system interprets referring expressions in real time, outputting a distribution over objects at 14Hz.
https://www.youtube.com/watch?v=ZFTUcHKc1YM