Generating images of traffic with Restricted Boltzmann Machines
As part of a traffic object classification system, Restricted Boltzmann Machines (RBMs) can be used to generate images of traffic. This post isn’t meant to explain why it’s helpful to generate images, just how the generation procedure itself works.
A real car image
A generated image
A single RBM takes a binary input vector and produces a single activation probability. It can also be used in reverse, going from an activation state to an input probability. An extension for continuous input data (known as a cRBM) is necessary for RGB images as used in this example.
The activation probability p for a single RBM is calculated by taking the dot product of the input vector b and the RBM’s weight vector w, then applying the logistic function to the result:

$$p = \sigma(b \cdot w)$$

where σ is the logistic function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
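The calculation described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the code used for this post: the function names and the 4-element example vectors are made up, and no bias term is included because the description above does not mention one.

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def activation_probability(b, w):
    """Probability that the unit switches on for input vector b and weights w."""
    return logistic(np.dot(b, w))

# Hypothetical 4-element binary input and weight vector, for illustration only.
b = np.array([1, 0, 1, 1])
w = np.array([0.5, -1.0, 0.25, -0.75])

# The dot product is 0.5 + 0.25 - 0.75 = 0.0, and the logistic of 0 is 0.5.
p = activation_probability(b, w)
```

A weighted sum of zero gives a probability of exactly 0.5; positive sums push the probability towards 1 and negative sums towards 0.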
A single RBM is completely represented by its weight vector w. In practice, an array of RBMs is trained to reproduce some training set. Typically, far fewer RBMs are trained than there are training images. If the RBMs can learn an efficient representation of the input samples, an image can be reproduced by calculating the response of each RBM in the set and re-sampling the inputs based on the RBM activations. The state of the RBM activations in response to a single image can be viewed as a compression of the input image. With a set of 50 RBMs, for example, a 30×30 RGB image (a 2700-length vector) is converted to a hidden-layer activation vector of length 50.
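As a rough sketch of the compress-and-reconstruct round trip described above, the weight vectors of the 50 RBMs can be stacked as the columns of one matrix. Everything here is an assumption for illustration: the names `W`, `encode` and `decode` are hypothetical, the weights and the image are random placeholders rather than trained or real data, and the logistic function is used in both directions, so the continuous-input extension is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible = 30 * 30 * 3   # 2700 inputs for a 30x30 RGB image
n_hidden = 50             # one column of W per RBM

# Random placeholder weights; a trained model would supply these.
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(v, W):
    """Activation probabilities of all RBMs for a flattened image v."""
    return logistic(v @ W)

def decode(h, W):
    """Input probabilities obtained by running the same weights in reverse."""
    return logistic(h @ W.T)

image = rng.random((30, 30, 3))   # stand-in for a real image scaled to [0, 1]
v = image.reshape(-1)             # 2700-length input vector
h_prob = encode(v, W)             # length-50 compressed representation

# Sample binary activation states, then re-sample the inputs from them.
h_state = (rng.random(n_hidden) < h_prob).astype(float)
reconstruction = decode(h_state, W).reshape(30, 30, 3)
```

With random weights the reconstruction is meaningless noise; the point is only the shapes, 2700 values in, 50 values in the middle, 2700 values out.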
See here for an introduction to RBMs in general:
Introduction to Restricted Boltzmann Machines
And a more detailed guide here:
A Practical Guide to Training Restricted Boltzmann Machines
By setting an RBM’s activation state to ‘on’ and rendering its input probabilities as an image, it’s possible to visualize the RBM’s response.
Single RBM Weight Visualization
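One way to produce this kind of visualization, sketched under the assumption that the weights are stored as a 2700×50 NumPy matrix with one column per RBM: switch a single RBM on, compute the input probabilities, and reshape them into a 30×30 RGB image. The function name `unit_response_image` and the random placeholder weights are hypothetical.

```python
import numpy as np

def unit_response_image(W, unit, shape=(30, 30, 3)):
    """Input probabilities when only RBM `unit` is on, as an 8-bit RGB image."""
    h = np.zeros(W.shape[1])
    h[unit] = 1.0                                # set one activation state to 'on'
    probs = 1.0 / (1.0 + np.exp(-(h @ W.T)))     # logistic of that RBM's weights
    return (probs.reshape(shape) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
W = rng.normal(size=(2700, 50))   # placeholder for trained weights
img = unit_response_image(W, unit=0)
```

The resulting array can be shown with any image viewer that accepts height × width × 3 arrays, for example `matplotlib.pyplot.imshow(img)`.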
50 RBM Weight Visualizations
A set of 50 hidden-state RBMs was trained on 967 training images for 2 hours on an i5. Here is the full set of weight visualizations after training:
Here is a sample of the training images that were used to calculate the weights above:
And here is a sample of reconstructed input images:
Sometimes it’s helpful to see the RBM’s reconstruction of a single image. Here is a set of inputs with their corresponding reconstructions:
The reconstructions aren’t particularly accurate but they’re good enough for our purposes. Ultimately, the RBM array’s hidden activations will be used as inputs to a neural network for classification. The important thing is that the RBM reconstruction retains enough information to separate cars, background, trucks and so on. Rendering the RBM array weights and reconstructions mostly serves as a sanity check that the RBM’s hidden activations are a good enough compression of the input space to distinguish different object types.
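To make the last step concrete, here is a minimal sketch of how the hidden activations for a whole batch of images might be collected into a feature matrix for a downstream classifier. The arrays are random placeholders sized to match the numbers in this post (967 images, 2700 inputs, 50 RBMs); the variable names are hypothetical and the classifier itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

images = rng.random((967, 30, 30, 3))         # placeholder for the training images
W = rng.normal(scale=0.01, size=(2700, 50))   # placeholder for trained weights

X = images.reshape(len(images), -1)           # one 2700-length row per image
features = 1.0 / (1.0 + np.exp(-(X @ W)))     # one 50-length row of activations per image
```

Each row of `features` is the compressed description of one image that a classification network would receive in place of the raw pixels.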