Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Link to my project code
The given dataset has already been broken down into training, validation, and testing subsets.
- Total dataset: 51839
- Number of training examples = 34799 (67.13% of total)
- Number of validation examples = 4410 (8.51% of total)
- Number of testing examples = 12630 (24.36% of total)
- Image data shape = (32, 32, 3)
- Number of classes = 43
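These figures can be computed directly from the loaded arrays, for example (array names such as X_train and y_train are illustrative):

```python
import numpy as np

# Basic dataset statistics; assumes the pickled splits are already loaded.
n_total = len(X_train) + len(X_valid) + len(X_test)
print('Training: {} ({:.2%} of total)'.format(len(X_train), len(X_train) / n_total))
print('Validation: {} ({:.2%} of total)'.format(len(X_valid), len(X_valid) / n_total))
print('Testing: {} ({:.2%} of total)'.format(len(X_test), len(X_test) / n_total))
print('Image data shape:', X_train[0].shape)
print('Number of classes:', len(np.unique(y_train)))
```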
Signs:
The following bar chart shows data distribution in the training set:
The six most frequent categories in the training set:
Category # | Name | Count |
---|---|---|
2 | Speed limit (50km/h) | 2010 |
1 | Speed limit (30km/h) | 1980 |
13 | Yield | 1920 |
12 | Priority road | 1890 |
38 | Keep right | 1860 |
10 | No passing for vehicles over 3.5 metric tons | 1800 |
The six least frequent categories in the training set:
Category # | Name | Count |
---|---|---|
0 | Speed limit (20km/h) | 180 |
37 | Go straight or left | 180 |
19 | Dangerous curve to the left | 180 |
32 | End of all speed and passing limits | 210 |
27 | Pedestrians | 210 |
41 | End of no passing | 210 |
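Both tables can be reproduced from the label array, for example (assuming y_train holds the training labels):

```python
import numpy as np

# Count occurrences of each class and sort by frequency.
classes, counts = np.unique(y_train, return_counts=True)
order = np.argsort(counts)
print('Least common:', list(zip(classes[order[:6]], counts[order[:6]])))
print('Most common:', list(zip(classes[order[:-7:-1]], counts[order[:-7:-1]])))
```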
- Earlier, the YCbCr color space was used, on the assumption that it provides a more natural way of representing luminosity and image colors.
- Later, the RGB color space was used directly, and that proved to be much better than YCbCr.
Normalization: In order to reduce the effect of brightness on image classification, all of the images have been normalized before training.
import numpy as np

def normalize_rgb(data, mean=None, sigma=None):
    # Zero-center and scale the data; reuse training-set statistics when provided.
    print('Normalizing in RGB')
    if mean is None or sigma is None:
        print('Computing mean and standard deviation')
        mean = np.mean(data)
        sigma = np.std(data)
    print('Before normalizing data[0,0,0]:', data[0, 0, 0])
    data = data.astype(np.float32) - mean
    data /= (sigma + 1e-7)  # epsilon guards against division by zero
    print('After normalizing data[0,0,0]:', data[0, 0, 0])
    return data, mean, sigma
In order to normalize the validation and testing sets, the mean and standard deviation are not recomputed; the values computed from the training set are reused.
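For example (array names are illustrative):

```python
# Statistics come from the training set only and are reused for the other splits.
X_train, mean, sigma = normalize_rgb(X_train)
X_valid, _, _ = normalize_rgb(X_valid, mean, sigma)
X_test, _, _ = normalize_rgb(X_test, mean, sigma)
```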
Various executions of the model with different parameters, color spaces, and architectures all resulted in overfitting. Hence, more data was generated from the given images.
Also, since the distribution of data was not uniform (a minimum of 180 images of 'Speed limit (20km/h)' versus a maximum of 2010 images of 'Speed limit (50km/h)'), images were generated in such a way as to make this distribution more uniform.
The following functions were used for data generation (a code sketch follows the list):
- Scale the image to 36x36, increase brightness, then crop back to 32x32
- Scale the image to 36x36, decrease brightness, then crop back to 32x32
- Rotate the image randomly by 10 to 20 degrees
- Rotate the image randomly by -10 to -20 degrees
- Reduce the image to 28x28
- Add random noise to the image
- Shift the image 2 pixels to the left
- Shift the image 2 pixels to the right
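A minimal sketch of how a few of these operations could be implemented with OpenCV, assuming 32x32x3 uint8 numpy arrays (the function names are illustrative, not the notebook's actual helpers):

```python
import numpy as np
import cv2

def shift_left(img, pixels=2):
    # Translate the image left by `pixels`, replicating the border.
    m = np.float32([[1, 0, -pixels], [0, 1, 0]])
    return cv2.warpAffine(img, m, (img.shape[1], img.shape[0]),
                          borderMode=cv2.BORDER_REPLICATE)

def rotate_random(img, low=10, high=20):
    # Rotate about the image center by a random angle in [low, high] degrees.
    angle = np.random.uniform(low, high)
    center = (img.shape[1] // 2, img.shape[0] // 2)
    m = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(img, m, (img.shape[1], img.shape[0]),
                          borderMode=cv2.BORDER_REPLICATE)

def brighten_and_crop(img, factor=1.2):
    # Upscale to 36x36, scale pixel intensities, then center-crop back to 32x32.
    big = cv2.resize(img, (36, 36)).astype(np.float32) * factor
    return np.clip(big, 0, 255).astype(np.uint8)[2:34, 2:34]
```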
Some examples:
The difference between the original data set and the augmented data set is the following:
- Number of original training examples = 34799
- Extra generated: (49981, 32, 32, 3)
- Combined training set: (84780, 32, 32, 3)
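Combining the generated images with the originals is then a simple concatenation (array names illustrative):

```python
import numpy as np

X_train_aug = np.concatenate([X_train, X_generated])
y_train_aug = np.concatenate([y_train, y_generated])
print(X_train_aug.shape)  # expected: (84780, 32, 32, 3)
```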
The final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x3 RGB Normalized Image |
Convolution 5x5x3x6 | 1x1 stride, valid padding, outputs 28x28x6 |
RELU | |
Max Pooling | 2x2 stride, outputs 14x14x6 |
Convolution 5x5x6x16 | 1x1 stride, valid padding, outputs 10x10x16 |
RELU | |
Max Pooling | 2x2 stride, outputs 5x5x16 |
Flatten | Input: 5x5x16, Output: 400 |
Fully connected N1 | Input: 400, Output: 120 |
RELU | |
DROPOUT | Keep Probability: 0.5 |
Fully connected N2 | Input: 120, Output: 84 |
RELU | |
DROPOUT | Keep Probability: 0.5 |
Fully connected N3 | Input: 84, Output: 43 |
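A minimal TensorFlow 1.x sketch of this architecture, assuming the `weights` dictionary shown later and an analogous `biases` dictionary (both assumptions for illustration):

```python
import tensorflow as tf

def lenet(x, weights, biases, keep_prob):
    # Conv 5x5x3x6 -> 28x28x6, ReLU, 2x2 max pool -> 14x14x6
    c1 = tf.nn.conv2d(x, weights['wc1'], strides=[1, 1, 1, 1], padding='VALID') + biases['bc1']
    c1 = tf.nn.max_pool(tf.nn.relu(c1), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Conv 5x5x6x16 -> 10x10x16, ReLU, 2x2 max pool -> 5x5x16
    c2 = tf.nn.conv2d(c1, weights['wc2'], strides=[1, 1, 1, 1], padding='VALID') + biases['bc2']
    c2 = tf.nn.max_pool(tf.nn.relu(c2), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Flatten 5x5x16 -> 400, then fully connected 400 -> 120 -> 84 -> 43 with dropout
    flat = tf.reshape(c2, [-1, 400])
    n1 = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, weights['wn1']) + biases['bn1']), keep_prob)
    n2 = tf.nn.dropout(tf.nn.relu(tf.matmul(n1, weights['wn2']) + biases['bn2']), keep_prob)
    return tf.matmul(n2, weights['wn3']) + biases['bn3']
```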
To train the model, the following parameters were used:
Component | Description |
---|---|
Epochs | 400
Batch Size | 128 |
Learning Rate | 0.0008 |
Initial Weights | Xavier Initializer |
Dropout Keep Probability | 0.5 |
Optimizer | Adam Optimizer |
Cost Function | Softmax Cross Entropy |
Evaluation Mechanism | Best weights found in any epoch rather than the last
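In TensorFlow 1.x terms, the loss, optimizer, and best-weights bookkeeping would look roughly like this (a sketch; `one_hot_y`, `logits`, and the accuracy variables are assumed names):

```python
import tensorflow as tf

# Softmax cross-entropy over the 43 classes, optimized with Adam.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
training_op = tf.train.AdamOptimizer(learning_rate=0.0008).minimize(loss)

saver = tf.train.Saver()
# Inside the training loop: keep the best weights seen in any epoch,
# rather than whatever the final epoch produced.
if validation_accuracy > best_validation_accuracy:
    best_validation_accuracy = validation_accuracy
    saver.save(sess, './best_model')
```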
My final model results were:
- training set accuracy of 1.00
- validation set accuracy of 0.982
- test set accuracy of 0.968
An iterative approach was chosen to find the solution:
As a first step, the original LeNet was used, as it gives a good starting point. Only a maximum of about 0.90 accuracy on the validation set was achieved, but training set accuracy was about 0.99, implying that the model was overfitting.
Keeping the original LeNet, I played around with different mu, sigma, epochs, and batch sizes, but the validation accuracy did not go up.
It was interesting to find out that batch gradient algorithms do not work that well when the batch size is very large. In this particular case I tried batch sizes from 128 up to almost the entire input size; as the batch size grew, the validation accuracy kept going down instead of up.
Since YCbCr has a dedicated channel for luminosity, I tried using it instead of plain RGB, as it provides an easier way to average out the brightness: the network could work with luminosity in a separate channel, and the Y channel could be normalized separately from the CbCr color channels. But this did not prove to be a good idea, since it is clear from the following that some feature maps of the C1 layer were not doing anything at all:
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
After changing back to RGB, the same C1 layer looks as follows:
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
Instead of using normally distributed weights, Xavier weight initializers have been used:
weights = {
'wc1': tf.get_variable("wc1", shape=[5,5,3,6], initializer=tf.contrib.layers.xavier_initializer()),
'wc2': tf.get_variable("wc2", shape=[5,5,6,16], initializer=tf.contrib.layers.xavier_initializer()),
'wn1': tf.get_variable("wn1", shape=[400,120], initializer=tf.contrib.layers.xavier_initializer()),
'wn2': tf.get_variable("wn2", shape=[120,84], initializer=tf.contrib.layers.xavier_initializer()),
'wn3': tf.get_variable("wn3", shape=[84,n_classes], initializer=tf.contrib.layers.xavier_initializer()),
}
The model was still overfitting, hence more data was generated using the augmentation functions listed earlier.
Dropout layers were introduced in all intermediate layers, including the convolutional layers, but this caused severe underfitting.
Dropout layers were then kept in the fully connected layers only. This removed the underfitting caused by dropout in the convolutional layers, and the desired 0.93 validation accuracy was achieved.
Poor performance on the loss function was noticed when only one dropout layer was used:
With two dropouts the accuracy and loss were much better:
- training set accuracy of 1.000
- validation set accuracy of 0.982
- test set accuracy of 0.968
Since the model achieves > 93% accuracy on the test set, it seems to be performing quite reasonably.
To be honest, I don't think LeNet is the best solution for this task. Recent papers clearly indicate better performance from GoogLeNet or ResNet on ImageNet, so chances are they would be better than LeNet on traffic sign classification as well.
However, LeNet is sufficient for this assignment, as it is a much smaller network (computationally) compared to more recent methods and still provides quite good performance.
Instead of choosing images from the web, real-life images provided by a fellow Udacity student, Sonja Krause-Harder, were used. Five traffic signs were cropped out of these images:
The first image (30 Zone) might be difficult to classify because it has the word "Zone" written on the traffic sign, whereas the signs in the training set do not:
The third image (a pedestrian sign showing a woman and a child) will be nearly impossible to classify successfully, since nothing close to it was included in the training set:
It was included anyway, since it is a real-world sign found on the streets, to see what the system would classify it as.
Here are the results of the prediction:
Image | Prediction |
---|---|
30 Zone | Speed limit (50km/h) |
Priority | Priority Road |
Pedestrian | Keep Right |
Yield | Yield |
Road Work | Road Work |
The model was able to correctly guess 3 of the 5 traffic signs, which gives an accuracy of 60%.
This does not match the accuracy achieved on the test set, but that may be because the first sign (30 Zone) is not exactly the same kind of sign as those present in the training and test sets, and because the new pedestrian sign did not match anything close to the pedestrian sign we had in training.
The code for making predictions with my final model is located in the 22nd cell of the IPython notebook.
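The cell is not reproduced here, but the top-five softmax probabilities below can be obtained with something like the following (placeholder and tensor names are assumptions):

```python
# Run the softmax over the logits and keep the five most likely classes.
softmax = tf.nn.softmax(logits)
top_k = tf.nn.top_k(softmax, k=5)
probabilities, class_ids = sess.run(top_k, feed_dict={x: new_images, keep_prob: 1.0})
```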
For the first image, the model's prediction is completely wrong:
Sign | Probability |
---|---|
Speed limit (50km/h) | 1.00 |
Speed limit (30km/h) | 0.00 |
Wild animals crossing | 0.00 |
Speed limit (100km/h) | 0.00 |
Speed limit (80km/h) | 0.00 |
For the second image, the model correctly predicts it as a priority road:
Sign | Probability |
---|---|
Priority road | 1.00 |
Traffic signals | 0.00 |
End of no passing by vehicles over 3.5 metric tons | 0.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
For the third image, the model predicts incorrectly, which was expected, since this sign was never part of the training set; it is, however, found as a real-life sign on German roads:
Sign | Probability |
---|---|
Turn left ahead | 0.79 |
Go straight or right | 0.11 |
Ahead only | 0.10 |
Roundabout mandatory | 0.00 |
End of no passing | 0.00 |
For the fourth image, the model correctly predicts it as a Yield sign and it does that with a 1.0 probability:
Sign | Probability |
---|---|
Yield | 1.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
Speed limit (50km/h) | 0.00 |
Speed limit (60km/h) | 0.00 |
For the fifth image, the model correctly predicts it as a Road Work sign and it does that with a 1.0 probability:
Sign | Probability |
---|---|
Road work | 1.00 |
Bumpy road | 0.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
Speed limit (50km/h) | 0.00 |
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
- The real-world image of "30 Zone", which was a little different from the 30 km/h signs in the training set, should at least have had the correct category in its top 5, but it failed to even come close. More investigation is needed to see whether the bug is in the convolutional layers or in the way the image was downscaled to 32x32.
- The network comes close to converging but never fully converges; some variation remains even towards the end of the 400th epoch.
- How effective YCbCr is compared to plain RGB still needs to be looked into.