Together we build Service Robots
Since 2006, RoboCup @Home has been the largest international annual competition for autonomous service robots, held as part of the RoboCup initiative. The challenge consists of a set of benchmark tests that evaluate the robots' abilities and performance in a realistic, non-standardized home environment. It has greatly fostered artificial intelligence development in domains including human-robot interaction, navigation and mapping in dynamic environments, computer vision, object recognition and manipulation, and many other areas of robot intelligence.
However, we observe that new RoboCup @Home teams face a very steep initial development curve. The amount of technical knowledge and resources (both manpower and cost) required to start a new team has effectively restricted the event to established research organizations. For this reason, our team initiated the development of an open source robot platform for RoboCup @Home in 2013. The goal of the project is to develop a basic robot platform that helps new teams start participating in RoboCup @Home. It is built on open source hardware and software solutions to keep costs low and to draw on large community support, easing the startup of novice teams.
The technical challenge of this work is to reduce the complexity and standardize the requirements of the robot system without compromising too much of the technical challenge intended in RoboCup @Home. We believe this work will significantly promote participation in the RoboCup @Home league and thereby foster service robot development.
The open robot platform currently comprises a basic robot hardware configuration as the fundamental platform, plus add-on modular component systems for customized applications. For example, a manipulator system (with top vision) and an extended top vision system were added to the hardware configuration during RoboCup Japan Open 2015 and RoboCup 2015 Hefei for the Restaurant and Follow Me tasks.
TurtleBot as the basic robot hardware platform. TurtleBot is a low-cost (the basic kit is approximately USD 1,000) personal robot kit with close integration with the popular open source framework ROS (Robot Operating System). This open source robot kit is adopted as the basic mobile platform for this development. The vertical range of mobile manipulation can be adjusted with an arm elevated by a linear motor, a secondary vision system is paired with the robotic arm for object recognition in manipulation tasks, and 3D printed parts are used for the component systems. An interactive interface with speech and facial expressions is in development for human-robot interaction. A general laptop PC (currently being migrated to a single board computer system) with speakers and a microphone serves as the main robot controller.
ROS as the robot software framework. ROS (Robot Operating System) is an open source robot software framework with a large community that provides an extensive collection of robotic tools and libraries. With ROS as the fundamental software framework, this work adapts and assembles ROS packages and stacks to realize the navigation, manipulation, vision and speech functions of the robot in order to perform the tasks in RoboCup @Home.
Cloud-connected. The robot system is controlled by an onboard computer as the main robot controller to ensure stable low-level control. Furthermore, the computer can be connected to cloud systems for extra computing (e.g. image processing), knowledge databases (e.g. a dialogue engine) and online resources (e.g. wearable data).
With the Kobuki base and an RGB-D sensor as the mobile base hardware configuration, the TurtleBot navigation package is used for robot navigation: maps are built with gmapping and localization is performed with amcl while running the ROS navigation stack. With the prebuilt map and predefined waypoint locations, we can then instruct the robot to travel to a specific goal location with path planning, sending goals through actionlib.
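As an illustration, the sketch below sends one such waypoint goal to the ROS navigation stack through actionlib. It assumes the standard move_base action server and map frame provided by the TurtleBot navigation setup; the waypoint coordinates are placeholders.

#!/usr/bin/env python
# Minimal sketch: send a predefined waypoint to move_base via actionlib.
# Assumes the TurtleBot navigation stack (amcl + move_base) is already running
# and publishing the "map" frame; the waypoint coordinates are placeholders.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def go_to_waypoint(x, y, yaw_w=1.0):
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = yaw_w  # identity orientation by default

    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == '__main__':
    rospy.init_node('waypoint_navigation')
    # Hypothetical waypoint, e.g. a predefined table location on the prebuilt map.
    go_to_waypoint(1.5, 0.8)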
Navigation in known and unknown environments (Help-me-carry). With the secondary top vision system configuration, we have developed a navigation system for known and unknown environments for the Help-me-carry and Restaurant tasks. Building on the TurtleBot navigation package, we combined it with a people tracking package to update the map online while following the operator through the unknown environment.
SLAM Map Building
Navigation in Known and Unknown Environments (Help-Me-Carry)
For human speech interaction, we use CMU Pocket Sphinx as our robot speech recognizer. It is a lightweight speech recognizer with a support library called Sphinxbase; we build our application with the latest version, sphinxbase-5prealpha. We use gstreamer to automatically split the incoming audio into utterances to be recognized, and we provide services to start and stop recognition. The recognizer requires a language model and dictionary file, which can be built automatically from a corpus of sentences. For text-to-speech (TTS), we use the CMU Festival system together with the ROS sound_play package.
To improve speech recognition efficiency, we use a strategy in which the robot first listens for an activation keyword; once the keyword is recognized, it switches to n-gram search to recognize the actual command. Once the command has been recognized, the robot switches to grammar search to recognize the confirmation, and then returns to keyword listening mode to wait for another command.
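The listing below is a minimal sketch of this search switching with the PocketSphinx Python decoder. The model paths, the wake keyphrase and the grammar file names are placeholders, not our exact configuration.

# Sketch of switching PocketSphinx search modes: keyword spotting -> n-gram
# command recognition -> grammar-based confirmation. All file paths and the
# wake keyphrase are placeholders.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'acoustic_model_dir')   # acoustic model (noise-adapted)
config.set_string('-dict', 'commands.dic')        # pronunciation dictionary
config.set_string('-lm', 'commands.lm')           # gives the decoder a valid initial search
decoder = Decoder(config)

# Register named searches once, then switch between them at runtime.
decoder.set_keyphrase('wakeup', 'hello robot')    # activation keyword (placeholder phrase)
decoder.set_lm_file('commands', 'commands.lm')    # n-gram model built from the command corpus
decoder.set_jsgf_file('confirm', 'confirm.gram')  # yes/no confirmation grammar
decoder.set_search('wakeup')                      # start in keyword-spotting mode

def recognize(audio_chunks, search_name):
    """Run one utterance through the given search and return the hypothesis."""
    decoder.set_search(search_name)
    decoder.start_utt()
    for chunk in audio_chunks:          # 16-bit, 16 kHz raw audio split by gstreamer
        decoder.process_raw(chunk, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else None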
To improve speech recognition accuracy in a noisy environment, we use the SphinxTrain tool in CMUSphinx to train on recordings of sentences made in the noisy environment. SphinxTrain extracts the acoustic characteristics of the noisy environment from a large database of these recordings. We then replace the original acoustic model parameters with the obtained parameters for better speech recognition.
Sound source localization. Apart from human speech interaction, we have also tested sound source localization using HARK for possible people search when a person is speaking outside the robot's visual perception area.
Speech Interaction
Sound Source Localization by HARK
A secondary vision system with an RGB-D sensor is mounted on top of the robot for people/object detection and recognition.
Person recognition. We built a Convolutional Neural Network (CNN) with TensorFlow and Keras. When we need to add a new person to our database, we use a digital camera to take about 1000 pictures of the person, use Haar-like face detection to detect the face in each picture, and add the labeled face region of each picture to our database. We then train the CNN on a laptop with an NVIDIA GTX 1070 to obtain a model that recognizes the person.
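As an illustration, a small classifier of this kind can be defined in Keras as sketched below; the input size, layer widths and number of known persons are placeholder choices rather than the exact network we train.

# Illustrative small CNN for face classification with tf.keras.
# Input size, layer widths and the number of known persons are placeholders;
# the training data are the Haar-cropped face images described above.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_PERSONS = 5            # number of people in the database (placeholder)
INPUT_SHAPE = (64, 64, 3)  # cropped face region resized to 64x64 (placeholder)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=INPUT_SHAPE),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(NUM_PERSONS, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(face_images, person_labels, epochs=..., validation_split=0.1)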
Gender recognition using an online API. During the competition, we capture a photo and upload it to the Baidu AI cloud server, which returns the gender recognition results labeled on the photo.
Object recognition system. We use YOLO (You Only Look Once) for object detection. In the Storing Groceries task, we use the RGB-D sensor for shelf, table and object detection. Before the competition, we take photos of the predefined objects and label them by adding annotation labels and bounding boxes to each image. We capture the images from different angles, under different lighting conditions and against different backgrounds to ensure our model generalizes well to the competition conditions.
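For illustration, a trained Darknet YOLO model can be run at detection time through OpenCV's DNN module, as sketched below. This shows one common route rather than our exact pipeline, and the configuration, weights and class-name file names are placeholders.

# Sketch: running a trained Darknet YOLO detector with OpenCV's DNN module.
# The .cfg/.weights/.names files are placeholders for the model trained on the
# labeled competition objects described above.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet('objects.cfg', 'objects.weights')
class_names = open('objects.names').read().splitlines()

def detect_objects(bgr_image, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(bgr_image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    h, w = bgr_image.shape[:2]
    detections = []
    for output in outputs:
        for row in output:               # row = [cx, cy, bw, bh, objectness, class scores...]
            scores = row[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                cx, cy, bw, bh = row[0] * w, row[1] * h, row[2] * w, row[3] * h
                detections.append((class_names[class_id], confidence,
                                   (cx - bw / 2, cy - bh / 2, bw, bh)))
    return detections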
Human gesture detection. In our human gesture detection system, we use CMU OpenPose as our skeleton detector. It is a real-time multi-person keypoint detection library for body, face, hand, and foot estimation. The OpenPose demo takes an RGB image and returns the number of people as well as their skeleton keypoint positions. To obtain the human pointing direction, the 3D coordinates of the wrist and elbow joints are required. We combine the OpenPose result with the Point Cloud Library (PCL) to obtain these positions in the head RGB-D sensor coordinate system, then use the TF transform to convert them to the map coordinate system. Finally, a space-vector method is used to calculate which point on the ground the human is pointing to.
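The space-vector calculation reduces to a ray-plane intersection: extend the elbow-to-wrist vector from the wrist until it meets the floor plane. The sketch below assumes both joints are already expressed in map coordinates with the floor at z = 0.

# Sketch of the space-vector pointing calculation: intersect the ray from the
# elbow through the wrist with the floor plane (z = 0 in the map frame).
# Assumes both joint positions are already transformed into map coordinates.
import numpy as np

def pointing_target_on_ground(elbow_xyz, wrist_xyz, floor_z=0.0):
    elbow = np.asarray(elbow_xyz, dtype=float)
    wrist = np.asarray(wrist_xyz, dtype=float)
    direction = wrist - elbow                  # forearm direction vector

    if abs(direction[2]) < 1e-6:               # arm parallel to the floor: no intersection
        return None
    t = (floor_z - wrist[2]) / direction[2]    # ray parameter measured from the wrist
    if t < 0:                                  # pointing upward, away from the floor
        return None
    return wrist + t * direction               # (x, y, floor_z) target point on the ground

# Example: elbow at 1.2 m and wrist at 1.0 m height, arm tilted downward.
# print(pointing_target_on_ground([0.0, 0.0, 1.2], [0.2, 0.1, 1.0]))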
Operator and Crowd Recognition
Real-Time Object Detection with YOLO
We use the TurtleBot Arm for object manipulation. It consists of 5 Dynamixel AX-12A servo motors controlled by a USB2Dynamixel. We use MoveIt! as the arm software framework and have integrated arm control with image-based object detection for manipulation. Once an object is recognized, we localize it in the 3D point cloud to obtain its position, compute the inverse kinematics, and move the arm to grasp the object. With MoveIt! we can also plan arm motions with obstacle avoidance to prevent collisions with surrounding objects.
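As a minimal sketch, a grasp pose can be commanded through moveit_commander as below; the planning group name "arm" and the target pose values are assumptions, with the pose normally coming from the point-cloud object localization described above.

#!/usr/bin/env python
# Sketch: moving the arm to a grasp pose with moveit_commander.
# The planning group name "arm" and the target pose are assumptions; the pose
# would normally come from the 3D object localization described above.
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node('grasp_object')
arm = moveit_commander.MoveGroupCommander('arm')

target = Pose()
target.position.x = 0.25      # object position from the point cloud (placeholder)
target.position.y = 0.0
target.position.z = 0.10
target.orientation.w = 1.0    # gripper orientation (placeholder)

arm.set_pose_target(target)   # MoveIt! solves the IK and plans around known obstacles
success = arm.go(wait=True)
arm.stop()
arm.clear_pose_targets()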
Object Detection and Manipulation
Robot Arm Object Manipulation