Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Link to my project code
The given dataset has already been broken down into training, validation, and testing subsets.
- Total dataset: 51839
- Number of training examples = 34799 (67.13% of total)
- Number of validation examples = 4410 (8.51% of total)
- Number of testing examples = 12630 (24.36% of total)
- Image data shape = (32, 32, 3)
- Number of classes = 43
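These figures can be computed directly from the loaded arrays, for example (array names such as X_train and y_train are illustrative):

```python
import numpy as np

# Basic dataset statistics; assumes the pickled splits are already loaded.
n_total = len(X_train) + len(X_valid) + len(X_test)
print('Training: {} ({:.2%} of total)'.format(len(X_train), len(X_train) / n_total))
print('Validation: {} ({:.2%} of total)'.format(len(X_valid), len(X_valid) / n_total))
print('Testing: {} ({:.2%} of total)'.format(len(X_test), len(X_test) / n_total))
print('Image data shape:', X_train[0].shape)
print('Number of classes:', len(np.unique(y_train)))
```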
Signs:
The following bar chart shows data distribution in the training set:
The six most frequent categories in the training set:
Category # | Name | Count |
---|---|---|
2 | Speed limit (50km/h) | 2010 |
1 | Speed limit (30km/h) | 1980 |
13 | Yield | 1920 |
12 | Priority road | 1890 |
38 | Keep right | 1860 |
10 | No passing for vehicles over 3.5 metric tons | 1800 |
The six least frequent categories in the training set:
Category # | Name | Count |
---|---|---|
0 | Speed limit (20km/h) | 180 |
37 | Go straight or left | 180 |
19 | Dangerous curve to the left | 180 |
32 | End of all speed and passing limits | 210 |
27 | Pedestrians | 210 |
41 | End of no passing | 210 |
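Both tables can be reproduced from the label array, for example (assuming y_train holds the training labels):

```python
import numpy as np

# Count occurrences of each class and sort by frequency.
classes, counts = np.unique(y_train, return_counts=True)
order = np.argsort(counts)
print('Least common:', list(zip(classes[order[:6]], counts[order[:6]])))
print('Most common:', list(zip(classes[order[:-7:-1]], counts[order[:-7:-1]])))
```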
- Earlier, the YCbCr color space was used, on the assumption that it provides a more natural way of representing luminosity and image colors.
- Later, the RGB color space was used directly, and that proved to be much better than YCbCr.
Normalization: In order to reduce the effect of brightness on image classification, all of the images have been normalized before training.
import numpy as np

def normalize_rgb(data, mean=None, sigma=None):
    # Zero-center and scale the data; reuse training-set statistics when provided.
    print('Normalizing in RGB')
    if mean is None or sigma is None:
        print('Computing mean and standard deviation')
        mean = np.mean(data)
        sigma = np.std(data)
    print('Before normalizing data[0,0,0]:', data[0, 0, 0])
    data = data.astype(np.float32) - mean
    data /= (sigma + 1e-7)  # epsilon guards against division by zero
    print('After normalizing data[0,0,0]:', data[0, 0, 0])
    return data, mean, sigma
In order to normalize the validation and testing sets, the mean and standard deviation are not recomputed; the values computed from the training set are reused.
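For example (array names are illustrative):

```python
# Statistics come from the training set only and are reused for the other splits.
X_train, mean, sigma = normalize_rgb(X_train)
X_valid, _, _ = normalize_rgb(X_valid, mean, sigma)
X_test, _, _ = normalize_rgb(X_test, mean, sigma)
```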
Various executions of the model with different parameters, color spaces, and architectures all resulted in overfitting. Hence, more data was generated from the given images.
Also, since the distribution of data was not uniform (a minimum of 180 images of 'Speed limit (20km/h)' versus a maximum of 2010 images of 'Speed limit (50km/h)'), images were generated in such a way as to make this distribution more uniform.
The following functions were used for data generation (a code sketch follows the list):
- Scale the image to 36x36, increase brightness, then crop back to 32x32
- Scale the image to 36x36, decrease brightness, then crop back to 32x32
- Rotate the image randomly by 10 to 20 degrees
- Rotate the image randomly by -10 to -20 degrees
- Reduce the image to 28x28
- Add random noise to the image
- Shift the image 2 pixels to the left
- Shift the image 2 pixels to the right
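A minimal sketch of how a few of these operations could be implemented with OpenCV, assuming 32x32x3 uint8 numpy arrays (the function names are illustrative, not the notebook's actual helpers):

```python
import numpy as np
import cv2

def shift_left(img, pixels=2):
    # Translate the image left by `pixels`, replicating the border.
    m = np.float32([[1, 0, -pixels], [0, 1, 0]])
    return cv2.warpAffine(img, m, (img.shape[1], img.shape[0]),
                          borderMode=cv2.BORDER_REPLICATE)

def rotate_random(img, low=10, high=20):
    # Rotate about the image center by a random angle in [low, high] degrees.
    angle = np.random.uniform(low, high)
    center = (img.shape[1] // 2, img.shape[0] // 2)
    m = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(img, m, (img.shape[1], img.shape[0]),
                          borderMode=cv2.BORDER_REPLICATE)

def brighten_and_crop(img, factor=1.2):
    # Upscale to 36x36, scale pixel intensities, then center-crop back to 32x32.
    big = cv2.resize(img, (36, 36)).astype(np.float32) * factor
    return np.clip(big, 0, 255).astype(np.uint8)[2:34, 2:34]
```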
Some examples:
The difference between the original data set and the augmented data set is the following:
- Number of original training examples = 34799
- Extra generated: (49981, 32, 32, 3)
- Combined training set: (84780, 32, 32, 3)
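Combining the generated images with the originals is then a simple concatenation (array names illustrative):

```python
import numpy as np

X_train_aug = np.concatenate([X_train, X_generated])
y_train_aug = np.concatenate([y_train, y_generated])
print(X_train_aug.shape)  # expected: (84780, 32, 32, 3)
```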
The final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x3 RGB Normalized Image |
Convolution 5x5x3x6 | 1x1 stride, valid padding, outputs 28x28x6 |
RELU | |
Max Pooling | 2x2 stride, outputs 14x14x6 |
Convolution 5x5x6x16 | 1x1 stride, valid padding, outputs 10x10x16 |
RELU | |
Max Pooling | 2x2 stride, outputs 5x5x16 |
Flatten | Input: 5x5x16, Output: 400 |
Fully connected N1 | Input: 400, Output: 120 |
RELU | |
DROPOUT | Keep Probability: 0.5 |
Fully connected N2 | Input: 120, Output: 84 |
RELU | |
DROPOUT | Keep Probability: 0.5 |
Fully connected N3 | Input: 84, Output: 43 |
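A minimal TensorFlow 1.x sketch of this architecture, assuming the `weights` dictionary shown later and an analogous `biases` dictionary (both assumptions for illustration):

```python
import tensorflow as tf

def lenet(x, weights, biases, keep_prob):
    # Conv 5x5x3x6 -> 28x28x6, ReLU, 2x2 max pool -> 14x14x6
    c1 = tf.nn.conv2d(x, weights['wc1'], strides=[1, 1, 1, 1], padding='VALID') + biases['bc1']
    c1 = tf.nn.max_pool(tf.nn.relu(c1), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Conv 5x5x6x16 -> 10x10x16, ReLU, 2x2 max pool -> 5x5x16
    c2 = tf.nn.conv2d(c1, weights['wc2'], strides=[1, 1, 1, 1], padding='VALID') + biases['bc2']
    c2 = tf.nn.max_pool(tf.nn.relu(c2), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Flatten 5x5x16 -> 400, then fully connected 400 -> 120 -> 84 -> 43 with dropout
    flat = tf.reshape(c2, [-1, 400])
    n1 = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, weights['wn1']) + biases['bn1']), keep_prob)
    n2 = tf.nn.dropout(tf.nn.relu(tf.matmul(n1, weights['wn2']) + biases['bn2']), keep_prob)
    return tf.matmul(n2, weights['wn3']) + biases['bn3']
```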
To train the model, the following parameters were used:
Component | Description |
---|---|
Epochs | 400
Batch Size | 128 |
Learning Rate | 0.0008 |
Initial Weights | Xavier Initializer |
Dropout Keep Probability | 0.5 |
Optimizer | Adam Optimizer |
Cost Function | Softmax Cross Entropy |
Evaluation Mechanism | Best weights found in any epoch rather than the last
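In TensorFlow 1.x terms, the loss, optimizer, and best-weights bookkeeping would look roughly like this (a sketch; `one_hot_y`, `logits`, and the accuracy variables are assumed names):

```python
import tensorflow as tf

# Softmax cross-entropy over the 43 classes, optimized with Adam.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss = tf.reduce_mean(cross_entropy)
training_op = tf.train.AdamOptimizer(learning_rate=0.0008).minimize(loss)

saver = tf.train.Saver()
# Inside the training loop: keep the best weights seen in any epoch,
# rather than whatever the final epoch produced.
if validation_accuracy > best_validation_accuracy:
    best_validation_accuracy = validation_accuracy
    saver.save(sess, './best_model')
```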
My final model results were:
- training set accuracy of 1.00
- validation set accuracy of 0.982
- test set accuracy of 0.968
An iterative approach was chosen to find the solution:
As a first step, the original LeNet was used, as it gives a good starting point. Only a maximum of about 0.90 accuracy on the validation set was achieved, but training set accuracy was about 0.99, implying that the model was overfitting.
Keeping the original LeNet, I played around with different mu, sigma, epochs, and batch sizes, but the validation accuracy did not go up.
It was interesting to find out that batch gradient algorithms do not work that well when the batch size is very large. In this particular case I tried batch sizes from 128 up to almost the entire input size; as the batch size grew, the validation accuracy kept going down instead of up.
Since YCbCr has a dedicated channel for luminosity, I tried using it instead of plain RGB, as it provides an easier way to average out the brightness: the network could work with luminosity in a separate channel, and the Y channel could be normalized separately from the CbCr color channels. But this did not prove to be a good idea, since it is clear from the following that some feature maps of the C1 layer were not doing anything at all:
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
After changing back to RGB, the same C1 layer looks as follows:
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
Instead of using normally distributed weights, Xavier weight initializers have been used:
weights = {
'wc1': tf.get_variable("wc1", shape=[5,5,3,6], initializer=tf.contrib.layers.xavier_initializer()),
'wc2': tf.get_variable("wc2", shape=[5,5,6,16], initializer=tf.contrib.layers.xavier_initializer()),
'wn1': tf.get_variable("wn1", shape=[400,120], initializer=tf.contrib.layers.xavier_initializer()),
'wn2': tf.get_variable("wn2", shape=[120,84], initializer=tf.contrib.layers.xavier_initializer()),
'wn3': tf.get_variable("wn3", shape=[84,n_classes], initializer=tf.contrib.layers.xavier_initializer()),
}
The model was still overfitting, hence more data was generated using the augmentation functions listed earlier.
Dropout layers were introduced in all intermediate layers, including the convolutional layers, but this caused severe underfitting.
Dropout layers were then kept in the fully connected layers only. This removed the underfitting caused by dropout in the convolutional layers, and the desired 0.93 validation accuracy was achieved.
Poor performance on the loss function was noticed when only one dropout layer was used:
With two dropouts the accuracy and loss were much better:
- training set accuracy of 1.000
- validation set accuracy of 0.982
- test set accuracy of 0.968
Since the model achieves > 93% accuracy on the test set, it seems to be performing quite reasonably.
To be honest, I don't think LeNet is the best solution for this task. Recent papers clearly indicate better performance from GoogLeNet or ResNet on ImageNet, so chances are they would be better than LeNet on traffic sign classification as well.
However, LeNet is sufficient for this assignment, as it is a much smaller network (computationally) compared to more recent methods and still provides quite good performance.
Instead of choosing images from the web, real-life images provided by a fellow Udacity student, Sonja Krause-Harder, were used. Five traffic signs were cropped out of these images:
The first image (30 Zone) might be difficult to classify because it has the word "Zone" written on the traffic sign, whereas the signs in the training set do not:
The third image (a pedestrian sign showing a woman and a child) will be nearly impossible to classify successfully, since nothing close to it was included in the training set:
It was included anyway, since it is a real-world sign found on the streets, to see what the system would classify it as.
Here are the results of the prediction:
Image | Prediction |
---|---|
30 Zone | Speed limit (50km/h) |
Priority | Priority Road |
Pedestrian | Keep Right |
Yield | Yield |
Road Work | Road Work |
The model was able to correctly guess 3 of the 5 traffic signs, which gives an accuracy of 60%.
This does not match the accuracy achieved on the test set, but that may be because the first sign (30 Zone) is not exactly the same kind of sign as those present in the training and test sets, and because the new pedestrian sign did not match anything close to the pedestrian sign we had in training.
The code for making predictions with my final model is located in the 22nd cell of the IPython notebook.
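The cell is not reproduced here, but the top-five softmax probabilities below can be obtained with something like the following (placeholder and tensor names are assumptions):

```python
# Run the softmax over the logits and keep the five most likely classes.
softmax = tf.nn.softmax(logits)
top_k = tf.nn.top_k(softmax, k=5)
probabilities, class_ids = sess.run(top_k, feed_dict={x: new_images, keep_prob: 1.0})
```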
For the first image, the model's prediction is completely wrong:
Sign | Probability |
---|---|
Speed limit (50km/h) | 1.00 |
Speed limit (30km/h) | 0.00 |
Wild animals crossing | 0.00 |
Speed limit (100km/h) | 0.00 |
Speed limit (80km/h) | 0.00 |
For the second image, the model correctly predicts it as a priority road:
Sign | Probability |
---|---|
Priority road | 1.00 |
Traffic signals | 0.00 |
End of no passing by vehicles over 3.5 metric tons | 0.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
For the third image, the model predicts incorrectly, which was expected, since this sign was never part of the training set; it is, however, found as a real-life sign on German roads:
Sign | Probability |
---|---|
Turn left ahead | 0.79 |
Go straight or right | 0.11 |
Ahead only | 0.10 |
Roundabout mandatory | 0.00 |
End of no passing | 0.00 |
For the fourth image, the model correctly predicts it as a Yield sign and it does that with a 1.0 probability:
Sign | Probability |
---|---|
Yield | 1.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
Speed limit (50km/h) | 0.00 |
Speed limit (60km/h) | 0.00 |
For the fifth image, the model correctly predicts it as a Road Work sign and it does that with a 1.0 probability:
Sign | Probability |
---|---|
Road work | 1.00 |
Bumpy road | 0.00 |
Speed limit (20km/h) | 0.00 |
Speed limit (30km/h) | 0.00 |
Speed limit (50km/h) | 0.00 |
Following is the first layer of conv net for "30 Speed Limit" real world image:
Following is the first layer of conv net for "Priority Road" real world image:
Following is the first layer of conv net for "Pedestrian" real world image:
Following is the first layer of conv net for "Yield" real world image:
Following is the first layer of conv net for "Road Worker" real world image:
- The real-world image of "30 Zone", which was a little different from the 30 km/h signs in the training set, should at least have had the correct category in its top 5, but it failed to even come close. More investigation is needed to see whether the bug is in the convolutional layers or in the way the image was downscaled to 32x32.
- The network comes close to converging but never fully converges; some variation remains even towards the end of the 400th epoch.
- How effective YCbCr is compared to plain RGB still needs to be looked into.