Project 11 - Path Planning

The path planning algorithm (trajectory - green dots, behaviour - terminal printout) successfully detects two slower moving cars and passes them.

Below is a great flow chart of the autonomous vehicle stack. I'll give a brief overview of all the modules, which puts not only this project into perspective, but the entire nanodegree up to this point.

Motion Control - This module is responsible for actuating the throttle, brakes, and steering wheel. As shown in the chart, it receives trajectories to follow from the Trajectory module. The last two projects of term 2 covered PID and MPC, two of the main control designs used to handle motion control.
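
To make that concrete, here is a minimal PID controller sketch, assuming a cross-track error input and a steering actuation output; the gains and the example numbers are illustrative placeholders, not values tuned for the actual project.

```python
# Minimal PID controller sketch. Gains and numbers below are illustrative
# placeholders, not the values tuned in the Term 2 project.

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.prev_error = 0.0
        self.integral = 0.0

    def update(self, cte, dt):
        """Return an actuation command from the current cross-track error."""
        self.integral += cte * dt
        derivative = (cte - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = cte
        return -(self.kp * cte + self.ki * self.integral + self.kd * derivative)

# Example: the car is 0.5 m right of the lane centre, control loop runs at 50 Hz.
controller = PID(kp=0.2, ki=0.004, kd=3.0)
steering = controller.update(cte=0.5, dt=0.02)
```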

Sensor Fusion - This module is responsible for pulling in the information from all the different sensors so that they can be used together. 

Localization - This module is responsible for determining the car's position in the world. It gets data from Sensor Fusion, and both the particle filter and Kalman filter techniques seen in the term 2 projects are used for localization.

Flow chart of the autonomous vehicle stack (av-stack.png).

Trajectory - This module generates trajectories that allow the car to follow the commands of the Behaviour module. The relationship between the trajectory (and control) modules and the behaviour module is similar to the relationship between a human driver and another person navigating from the passenger seat. The behaviour module (navigator) gives broad commands like "turn left here" or "move into the right lane" and the trajectory module (driver) generates trajectories to follow those commands.

Designing this module made up the bulk of the Path Planning project. As seen in the gif above, the green dots are the generated trajectory while the terminal printout shows the commands from the behaviour module. The trajectories are designed using a polynomial solver on a jerk (change in acceleration) minimization equation. The jerk minimization equation is a quintic polynomial that takes time as an input and outputs a location, and it is defined by 6 coefficients. When the polynomial solver is fed the 6 boundary conditions (the car's current location, velocity, and acceleration, and the desired end location, velocity, and acceleration), it determines the 6 coefficients. Feeding increasing time steps into the equation with the determined coefficients then generates a trajectory that achieves the boundary conditions with minimal jerk. This module also takes information from the Prediction module, as it needs to verify that the trajectory it generates won't collide with any of the predicted movements of other cars or pedestrians in the area.
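
The project itself is written in C++, but the core of the solve is a small linear system. Here is a minimal NumPy sketch of the idea, assuming start and end states are given as [position, velocity, acceleration]; the function name and example numbers are illustrative, not the project's code.

```python
import numpy as np

def jerk_minimizing_trajectory(start, end, T):
    """Solve for the 6 coefficients of the quintic polynomial
    s(t) = a0 + a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    given start = [s, s_dot, s_ddot], end = [s, s_dot, s_ddot], and duration T."""
    a0, a1, a2 = start[0], start[1], start[2] / 2.0

    # The remaining three coefficients come from the end-state boundary conditions.
    A = np.array([[T**3,     T**4,      T**5],
                  [3 * T**2, 4 * T**3,  5 * T**4],
                  [6 * T,    12 * T**2, 20 * T**3]])
    b = np.array([end[0] - (a0 + a1 * T + a2 * T**2),
                  end[1] - (a1 + 2 * a2 * T),
                  end[2] - 2 * a2])
    a3, a4, a5 = np.linalg.solve(A, b)
    return [a0, a1, a2, a3, a4, a5]

# Example: go from s = 0 m at 10 m/s to s = 55 m at 12 m/s over 5 seconds.
coeffs = jerk_minimizing_trajectory([0, 10, 0], [55, 12, 0], T=5.0)
points = [sum(c * t**i for i, c in enumerate(coeffs)) for t in np.arange(0, 5.0, 0.02)]
```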

Prediction - This module takes in data from both Localization and Sensor Fusion and uses it to try to predict what the next actions of all the other vehicles and pedestrians will be. For example, after measuring the location of another car multiple times, its speed can be determined and thus its movement can be predicted. This module can get very complex as it tries to predict how cars and pedestrians will react not only to each other but also to the movements of your own vehicle. This is why the prediction module also takes in data from the Behaviour module.
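
As a deliberately simplified illustration of that constant-velocity idea, the sketch below estimates another car's velocity from two timestamped positions and extrapolates it a short time ahead; the function name and numbers are made up for illustration.

```python
# Constant-velocity prediction sketch: estimate velocity from two measurements,
# then extrapolate forward. Real prediction modules are far more sophisticated.

def predict_position(p_prev, p_curr, dt_measured, dt_ahead):
    """p_prev and p_curr are (x, y) positions measured dt_measured seconds apart."""
    vx = (p_curr[0] - p_prev[0]) / dt_measured
    vy = (p_curr[1] - p_prev[1]) / dt_measured
    return (p_curr[0] + vx * dt_ahead, p_curr[1] + vy * dt_ahead)

# Example: a car moved 0.5 m along the road in 0.02 s (25 m/s); where is it in 1 s?
future_xy = predict_position((100.0, 6.0), (100.5, 6.0), dt_measured=0.02, dt_ahead=1.0)
```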

Behaviour - In the example of a human driver and a navigator in the passenger seat, this module is the navigator. The behaviour module takes information from both the Localization and Prediction modules and makes long term decisions about where the car should go and what high level actions it needs to perform to get there. For this project I used a finite state machine to switch between different behaviour modes. For example, if the road was clear, the car would be in the keep lane state. However, if a slower car was detected in the same lane, the car would move to the prepare lane change state. From this state the car would determine if it was safe to make a lane change. Once it was considered safe, the car would change to the lane change state, which would execute the lane change. After this it would revert to the keep lane state.
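
A minimal sketch of that state machine might look like the following; the helper flags (car_ahead_is_slower, lane_change_is_safe, lane_change_complete) are illustrative placeholders, not the actual checks from the project.

```python
# Sketch of the keep lane / prepare lane change / lane change state machine.
# Flag names are illustrative placeholders.

KEEP_LANE, PREPARE_LANE_CHANGE, LANE_CHANGE = "KL", "PLC", "LC"

def next_state(state, car_ahead_is_slower, lane_change_is_safe, lane_change_complete):
    if state == KEEP_LANE and car_ahead_is_slower:
        return PREPARE_LANE_CHANGE
    if state == PREPARE_LANE_CHANGE and lane_change_is_safe:
        return LANE_CHANGE
    if state == LANE_CHANGE and lane_change_complete:
        return KEEP_LANE
    return state  # otherwise stay in the current state
```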

Finite state machines are no longer used in real autonomous vehicles because they can quickly grow in complexity and become hard to maintain while still not covering every situation. Currently, autonomous vehicles try to parameterize all costs and behaviours as much as possible so that every action can be fine tuned. Waymo, for example, parameterizes all aspects of its cars' high level decision making and then runs parameter optimization techniques (such as gradient descent) while driving millions of miles in its simulator. This allows these decision parameters to become very finely tuned for maximum efficiency and safety.


Project 12 - Semantic Segmentation

Performance of the FCN determining which pixels belong to a drivable road surface.

A typical convolutional neural network consists of a series of convolution layers followed by fully connected layers and finally some type of activation function. This type of network architecture is good for classification problems, such as determining whether a specific item can be found in an image. However, if you want to determine where in an image that item can be found, you need to use something different.

Model structure of a FCN showing the increasing depth during the encoder layers and the decreasing depth during the decoder layers.

A Fully Convolutional Network (FCN) starts with a series of convolution layers, but instead of using fully connected layers, it uses transposed convolutions. Convolution layers reduce the height and width of an image but increase its depth. Transposed convolution layers decrease the depth while reconstructing the height and width of the image. This means that the output of the network is an image the same size as the input image. Generally the first half of the network is called the encoder, and the second half, made up of the transposed convolutions, is called the decoder. In the center, a 1x1 convolution layer is used, which retains all of the spatial information of the previous convolution, whereas a standard fully connected layer would lose this information.
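
As a rough illustration of that encoder / 1x1 convolution / decoder structure, here is a small sketch using tf.keras; the layer counts and filter sizes are illustrative only and are not necessarily the architecture used in the project.

```python
import tensorflow as tf

def build_fcn(height, width, num_classes=2):
    """Encoder (conv) -> 1x1 conv -> decoder (transposed conv), per-pixel output."""
    inputs = tf.keras.Input(shape=(height, width, 3))

    # Encoder: height/width shrink while depth grows.
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)

    # 1x1 convolution keeps the spatial information a fully connected layer would discard.
    x = tf.keras.layers.Conv2D(num_classes, 1, padding="same")(x)

    # Decoder: transposed convolutions restore the original height/width.
    x = tf.keras.layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(x)
    x = tf.keras.layers.Conv2DTranspose(num_classes, 4, strides=2, padding="same")(x)

    # Output: per-pixel class scores at the same resolution as the input image.
    return tf.keras.Model(inputs, x)
```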

For this project our goal was to build an FCN capable of determining which pixels belonged to the drivable road surface. The Kitti Road Dataset was used to train the network.

 


Final Project - System Integration

Blue circle is the particle filter guess, and the blue car is the true car position.

For the final project of the nanodegree, we grouped into teams and had to write Robot Operating System (ROS) nodes so that we could run code on a real car. The car would be operated on a test track and required to follow waypoints in a large circle. However, at one section of the track there is a traffic light. If the light is green, the car is required to continue driving around the circle. If the light is red, the car is required to stop and wait for the light to turn green.

The major tasks for this project are:

Perception - detect the traffic light state

Planning - load the waypoints and update waypoints based on car position and traffic light state

Control - actuate the throttle, brake, and steering wheel to follow the waypoints

Since this was a group project, I only had to work on one area. Because I have a master's degree in controls and the planning problem didn't seem that interesting, I decided to work on the perception problem. The approach I ended up deciding on was using a pre-trained region-based fully convolutional network (R-FCN) and fine tuning it for the traffic light that would be used on the test track. This would generate a bounding box that I could then use computer vision on to determine the state of the traffic light.

I didn't go for an end to end deep learning solution because the traffic light used on the test track was not standard and had odd colors for both red and yellow. The colors for those two light states were nearly identical, and when I tried an end to end approach it had trouble telling those two states apart. So, after using the network to generate a bounding box, I used OpenCV to find the location of the brightest pixels, and by determining where that location fell within the bounding box, I could consistently determine the traffic light state.
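
Sketched below is the kind of classical CV step described here, assuming the detector has already produced a bounding box: find the brightest spot inside the box and map its vertical position to a light state. The function name, blur kernel, and the even thirds split are illustrative choices, not the tuned values from my implementation.

```python
import cv2

def classify_light(image_bgr, box):
    """box = (x_min, y_min, x_max, y_max) around the traffic light, in pixels."""
    x_min, y_min, x_max, y_max = box
    crop = image_bgr[y_min:y_max, x_min:x_max]

    # Brightness from the V channel of HSV, smoothed so a single hot pixel can't dominate.
    value = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)[:, :, 2]
    _, _, _, max_loc = cv2.minMaxLoc(cv2.GaussianBlur(value, (11, 11), 0))

    # Vertical position of the brightest spot within the box (0 = top, 1 = bottom).
    rel_y = max_loc[1] / float(crop.shape[0])
    if rel_y < 1.0 / 3.0:
        return "RED"
    if rel_y < 2.0 / 3.0:
        return "YELLOW"
    return "GREEN"
```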

Since I was using the TensorFlow Object Detection API (which was only released in June of this year), I found it very difficult to find guides and documentation on how to use it. I found a few blog posts, but they were pretty sparse when it came to the actual details of using the API. Thus I created my own step by step object detection API guide, which can be found below.

Step by Step TensorFlow Object Detection API Tutorial

Part 1 - Selecting a Model

Part 2 - Converting Existing Dataset to TFRecord

Part 3 - Creating Your Own Dataset

Part 4 - Training the Model

Part 5 - Saving and Deploying a Model