GTSRB

Classifying German traffic signs

Having reached something like a checkpoint in the cs231n lecture plan, I decided to train a CNN to classify German traffic signs before moving on to recurrent neural networks. I will use TensorFlow, and the network is to be deployed with TensorFlow.js.

I am going to use the German Traffic Sign Recognition Benchmark (GTSRB), whose leaderboard is currently led by a team from the University of Seville, where I spent an Erasmus semester.

Load the data

The first step (as always) is to load the data. Fortunately the dataset comes with a “getting started” script in Python, which I extended to scale each image to an ndarray of shape IMAGE_SIZE x IMAGE_SIZE x 3.

%matplotlib inline

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import csv
from skimage.transform import resize

# Cifar-10 was only 32x32 pixels. And it already took a vast amount
# of training. Therefore I am going to scale all images to:
IMAGE_SIZE = 32


# function for reading the images
# arguments: path to the traffic sign data, for example './GTSRB/Training'
# returns: list of images, list of corresponding labels 
def readTrafficSigns(rootpath, limit=None):
    '''Reads traffic sign data for German Traffic Sign Recognition Benchmark.

    Arguments: path to the traffic sign data, for example './GTSRB/Training'
    Returns:   numpy array of shape (number_images, IMAGE_SIZE*IMAGE_SIZE*3), numpy array of corresponding labels'''
    images = []
    labels = [] # corresponding labels
    # loop over all 43 classes
    for c in range(0,43):
        print('Class', c)
        prefix = rootpath + '/' + format(c, '05d') + '/' # subdirectory for class
        gtFile = open(prefix + 'GT-'+ format(c, '05d') + '.csv') # annotations file
        gtReader = csv.reader(gtFile, delimiter=';') # csv parser for annotations file
        next(gtReader) # skip header (gtReader.next() only exists in Python 2)
        # loop over all images in current annotations file
        # until limit
        class_limit = float('inf') if limit is None else float(limit)
        class_counter = -1
        for row in gtReader: 
            class_counter += 1
            if class_counter >= class_limit:
                break
                
            # the 1st column is the filename
            img_resized = resize(plt.imread(prefix + row[0]), (IMAGE_SIZE, IMAGE_SIZE), anti_aliasing=False)
            # flatten to one row so that np.vstack yields shape (number_images, IMAGE_SIZE*IMAGE_SIZE*3)
            img_flat = img_resized.reshape(1, -1)
            images.append(img_flat)
            labels.append(row[7]) # the 8th column is the label
        gtFile.close()
    return np.vstack(images), np.asarray(labels).astype(np.uint8)


trainImages, trainLabels = readTrafficSigns('GTSRB/Final_Training/Images', limit=None)
plt.imshow(trainImages[42].reshape((IMAGE_SIZE,IMAGE_SIZE,3)))
plt.show()

The next step is to declare some settings and convert the images to data types that are compatible with TensorFlow:

# convert the data types
trainImages = trainImages.astype(np.float16)
trainLabels = trainLabels.astype(np.int32)

# Basic settings
NUM_CLASSES = 43
BATCH_SIZE = 16
N_TRAINING_DATA = trainImages.shape[0]
EPOCHS = 2
# using python3 ;)
TRAINING_STEPS = N_TRAINING_DATA // BATCH_SIZE
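To make the relation between these settings concrete, here is a minimal sketch of how one epoch could be split into TRAINING_STEPS mini-batches. The helper name iterate_minibatches, the shuffling and the dropping of the last partial batch are assumptions for illustration; they are not taken from the notebook.

```python
import numpy as np


def iterate_minibatches(images, labels, batch_size, rng=None):
    """Yield shuffled (images, labels) mini-batches covering one epoch.

    Hypothetical helper: the trailing partial batch is dropped, so the
    number of batches equals len(images) // batch_size.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(images))  # new random order for this epoch
    n_steps = len(images) // batch_size
    for step in range(n_steps):
        idx = order[step * batch_size:(step + 1) * batch_size]
        yield images[idx], labels[idx]
```

Calling it once per epoch, for example `for x, y in iterate_minibatches(trainImages, trainLabels, BATCH_SIZE)`, would produce exactly TRAINING_STEPS batches.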

The following histogram shows the distribution of image sizes and suggests that scaling all images to 32x32 pixels is a reasonable choice.
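Such a histogram can be built from the annotation files alone, because each row of a GT-*.csv file lists the image width and height in its second and third columns. A minimal sketch, assuming the same directory layout as in readTrafficSigns; the function name collect_image_sizes is hypothetical:

```python
import csv
import os


def collect_image_sizes(rootpath, num_classes=43):
    """Return lists of widths and heights read from the GT-*.csv annotation files."""
    widths, heights = [], []
    for c in range(num_classes):
        class_dir = format(c, '05d')
        csv_path = os.path.join(rootpath, class_dir, 'GT-' + class_dir + '.csv')
        with open(csv_path) as gt_file:
            reader = csv.reader(gt_file, delimiter=';')
            next(reader)  # skip header
            for row in reader:
                widths.append(int(row[1]))   # 2nd column: width in pixels
                heights.append(int(row[2]))  # 3rd column: height in pixels
    return widths, heights
```

Passing the result to `plt.hist(widths, bins=50)` (and likewise for the heights) would then give the size distribution.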

The first model

My first model was my CIFAR-10 starter CNN. It consisted of two convolutional layers with max-pooling, a dense layer and a softmax at the end. The input images were of size 32x32x3 and SGD was used to minimize the loss.

Screenshot from 2018-07-23 09-53-44.png

After training for an hour or two the learning curve had flattened out, so I stopped the training process.

Screenshot from 2018-07-23 20-27-08.png

The second model

My first guess was that the model just was not capable enough, so I took a look at VGG net, reduced the size of the conv filters to 3x3 and made the model deeper (4 convolutional layers with more filters, image size 64x64, an additional dense layer with 2048 units).
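A quick back-of-the-envelope calculation hints at how much heavier such a model gets. The sketch below only counts the weights of the first dense layer; the number of pooling layers (2) and the number of channels in front of the dense layer (64) are assumptions for illustration, not the values of the actual model.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def flattened_size(image_size, n_pools, channels):
    """Size of the flattened feature map after n_pools 2x2 max-pool layers,
    assuming the convolutions themselves keep the spatial size."""
    side = image_size // (2 ** n_pools)
    return side * side * channels


# Hypothetical settings: 2 pooling layers, 64 channels, dense layer with 2048 units.
small = dense_params(flattened_size(32, n_pools=2, channels=64), 2048)
large = dense_params(flattened_size(64, n_pools=2, channels=64), 2048)
print(small, large)
```

Under these assumptions, doubling the image side from 32 to 64 pixels makes the dense layer about four times as large, before the extra convolutional layers are even counted.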

It felt like crypto mining on my laptop. Every step of that network took more than 20 seconds but it was late so I decided to stop for the day, dim the brightness and let the network train…

72 hours later! :D

So I came back, refreshed TensorBoard and realized that the model had not even reached the accuracy that the smaller model had after 1 hour of training.

Screenshot from 2018-07-26 23-10-12.png

I stopped the training… Although my hardware is really limited and I have to do something about this, my model has to be the primary problem. But how should I know what’s wrong if I can’t “see” into the model?

Adding more summaries to the Tensorboard

Adding summaries is not as easy as I thought, for a few reasons:

I am using TensorFlow’s higher-level tf.layers API, so it is not straightforward to get at the variables I am looking for.
Which variables should I plot? What is meaningful? Is it already a problem if the loss does not go down, or should I plot the L2 norm of the weights?
How do I visualize the convolutional filters?

I imagined there would be some easy-to-use built-in method or an option like “… tf.layers.conv2d(visualize=True…)”, but that is not the case.

My first conv layer ended up looking like this:

    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[3, 3],
      padding="valid",
      activation=tf.nn.relu,
      name='conv1'
    )

    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE) as scope_conv:
        weights = tf.get_variable('kernel', dtype=tf.float16)

        # scale weights to [0 255] and convert to uint8 (maybe change scaling?)
        x_min = tf.reduce_min(weights)
        x_max = tf.reduce_max(weights)
        weights_0_to_1 = (weights - x_min) / (x_max - x_min)
        weights_0_to_255_uint8 = tf.image.convert_image_dtype (weights_0_to_1, dtype=tf.uint8)

        # to tf.summary.image format [batch_size, height, width, channels]
        weights_transposed = tf.transpose (weights_0_to_255_uint8, [3, 0, 1, 2])

        # this will display up to 3 of the 32 filters in conv1
        tf.summary.image('conv1/filters', weights_transposed, max_outputs=3)
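The same normalization can be checked outside the graph with NumPy. A minimal sketch, assuming the kernel has the layout [height, width, in_channels, out_channels]; the function name kernel_to_images and the eps guard against a constant kernel are assumptions added for illustration:

```python
import numpy as np


def kernel_to_images(kernel, eps=1e-8):
    """Convert a conv kernel [H, W, C_in, C_out] to uint8 images [C_out, H, W, C_in]."""
    kernel = np.asarray(kernel, dtype=np.float32)
    k_min, k_max = kernel.min(), kernel.max()
    # min-max scale to [0, 1]; eps avoids 0/0 when all weights are equal
    scaled = (kernel - k_min) / (k_max - k_min + eps)
    as_uint8 = np.round(scaled * 255.0).astype(np.uint8)
    # move the output-channel axis to the front: one image per filter
    return np.transpose(as_uint8, (3, 0, 1, 2))
```

Note that the scaling uses the minimum and maximum over the whole kernel, so a single extreme weight compresses the contrast of every filter image, and a kernel whose values are all equal maps to all-black images.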

The third model

I reduced the complexity of the model and switched to Adam to minimize the cross-entropy loss. Let’s see whether that helps. If not, I now have some more summaries to look into…

Screenshot from 2018-07-27 01-18-57.png

The weight visualization looks suspicious to me. In the first run every filter is really colorful and after that everything is dead? This behaviour does not seem plausible.

Screenshot from 2018-07-27 01-31-20.png

The accuracy is 0.005 right now. I am going to wait and see whether anything happens here. Okay, I see that if every filter is black, the network cannot learn anything. Maybe Adam has to be initialized in a special way…

But for now I am switching to SGD with momentum and a large learning rate: optimizer = tf.train.MomentumOptimizer(0.01, 0.9)
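For reference, the update that SGD with momentum performs can be written out in a few lines of NumPy. This sketch uses the accumulate-then-step form (accumulation = momentum * accumulation + gradient, then variable -= learning_rate * accumulation). The function name momentum_step and the toy quadratic loss are assumptions for illustration only.

```python
import numpy as np


def momentum_step(var, accum, grad, learning_rate=0.01, momentum=0.9):
    """One SGD-with-momentum update; returns the new variable and accumulator."""
    accum = momentum * accum + grad
    var = var - learning_rate * accum
    return var, accum


# Toy example: minimize sum(w**2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    grad = 2.0 * w
    w, v = momentum_step(w, v, grad)
```

On this toy problem w shrinks towards zero; with a real network the gradient would come from backpropagation instead.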

…

After 4 hours of training the accuracy was at 65% and the curve still pointed up slightly.

Screenshot from 2018-07-27 11-10-23.png

Data set augmentation

I wanted to try data set augmentation. I happened to start with randomly flipping images and then realized that this is a bad idea for traffic signs: if you mirror some of them along a vertical axis, you also change their meaning. As the following image shows, traffic signs are not symmetric per se.

Screenshot from 2018-07-27 12-22-22.png
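One way around this is to restrict flipping to classes for which it is known to be harmless and to rely on label-preserving changes such as small shifts otherwise. A minimal NumPy sketch under these assumptions: the function name augment and the parameter flip_safe_classes are hypothetical, and which class ids are actually safe to mirror has to be decided by looking at the signs.

```python
import numpy as np


def augment(image, label, flip_safe_classes=frozenset(), max_shift=2, rng=None):
    """Return a randomly shifted copy of an image of shape (H, W, 3).

    The image is mirrored (with probability 0.5) only if its label is in
    flip_safe_classes, so signs whose meaning depends on direction stay intact.
    """
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # pad by replicating the border pixels, then crop a window offset by (dy, dx)
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad, mode='edge')
    h, w = image.shape[:2]
    top, left = max_shift + dy, max_shift + dx
    out = padded[top:top + h, left:left + w]
    if label in flip_safe_classes and rng.random() < 0.5:
        out = out[:, ::-1]  # mirror along the vertical axis
    return out
```

Since the images are stored flattened here, a call would look like `augment(trainImages[i].reshape(IMAGE_SIZE, IMAGE_SIZE, 3), trainLabels[i])`.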