Anomaly Detection via Convolutional Autoencoders
Anomaly detection is an important quality control method used in many industries. Two areas where this technique is used are in the manufacturing of new products and the inspection of existing goods. With recent advances in hardware and state-of-the-art deep learning techniques, much of this can be automated. Here we present a recent effort to detect anomalous objects from raw pixels in a pill capsule manufacturing setting.
We would like to automatically detect the presence of foreign objects in a manufacturing setting for capsules. The capsules move along a conveyor belt and are visible for inspection. A camera, positioned above the belt, points straight down and captures a number of images per second as the pills move along with the belt. Figure 1 shows three images from this scenario: pills on the belt with no anomaly (left), with a foreign object present (center), and with an anomaly present (right).
We would like to create a model which analyzes every frame taken by the camera, computes the location and likelihood of anomalies, and presents this information in a human-parsable format.
To capture these anomalies, we are going to train a convolutional autoencoder (CAE) to reproduce our training images. A CAE is a neural network consisting of two parts: an encoder, which transforms the original image into an encoding of lower dimensionality via a series of convolutional layers, and a decoder, which transforms the encoding back to a tensor with the same size as the original image via a series of transposed convolutional layers. A high-level representation of this type of network is shown in Figure 2.
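To make the encoder/decoder symmetry concrete, here is a minimal sketch of how the spatial size of an image changes through a stack of strided convolutions and back through the matching transposed convolutions. The 96-pixel input, the three layers, and the 3x3 kernels with stride 2 and padding 1 are assumptions chosen for illustration, not the settings of our network. They happen to produce a 12 x 12 feature map, which is one way to read the 20 x 12 x 12 encoding reported later in this post (20 filters at 12 x 12).

```python
def conv_out(size, kernel, stride, padding):
    """Spatial size after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding, output_padding=0):
    """Spatial size after a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical layer settings; none of these values come from the post.
KERNEL, STRIDE, PADDING = 3, 2, 1
INPUT_SIZE = 96   # assumed square input image
N_LAYERS = 3

# Encoder: each strided convolution halves the spatial size.
size = INPUT_SIZE
encoder_sizes = [size]
for _ in range(N_LAYERS):
    size = conv_out(size, KERNEL, STRIDE, PADDING)
    encoder_sizes.append(size)

# Decoder: each transposed convolution doubles it again.
decoder_sizes = [size]
for _ in range(N_LAYERS):
    size = deconv_out(size, KERNEL, STRIDE, PADDING, output_padding=1)
    decoder_sizes.append(size)

print("encoder:", encoder_sizes)  # encoder: [96, 48, 24, 12]
print("decoder:", decoder_sizes)  # decoder: [12, 24, 48, 96]
```

The output_padding of 1 in the decoder is what lets a stride-2 transposed convolution exactly undo the halving; without it the reconstruction would come out one pixel short per layer.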
CAEs are trained by feeding an input image into the network, calculating the error between the reconstructed image and the input image, and propagating this error backwards through the network to update the individual neuron weights. Because the error is defined as the difference between the entire input and output images, CAEs have the convenient advantage of not requiring labeled data.
However, when training a CAE to detect anomalies, some care must be taken when choosing training images. The general strategy for anomaly detection is to overtrain the CAE so that it reproduces commonly seen objects as accurately as possible. Any other objects that the network has not seen before (or has not seen many of) will then be reproduced poorly. Note that this is different from usual machine learning applications, where the goal is often to have the model generalize well to as-yet-unseen inputs. When an input image containing an anomalous object is compared with its reconstruction, the pixels in the vicinity of the object will have a larger error than the rest of the image.
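The idea of locating an anomaly from the reconstruction error can be sketched in a few lines of NumPy. This is an illustration on synthetic arrays, not our pipeline: the squared-error metric, the image size, and the position of the fake anomaly are all assumptions, and the "reconstruction" is simply the input plus a little noise with the anomalous patch painted over.

```python
import numpy as np


def error_map(original, reconstruction):
    """Per-pixel squared error, averaged over the color channels."""
    diff = original.astype(np.float64) - reconstruction.astype(np.float64)
    return (diff ** 2).mean(axis=-1)


# Synthetic stand-ins for a camera frame and the CAE output (assumed data).
rng = np.random.default_rng(0)
original = rng.random((64, 64, 3)) * 0.2 + 0.4           # dull background
reconstruction = original + rng.normal(0.0, 0.01, original.shape)

# A bright "foreign object" the network fails to reproduce.
original[20:30, 40:50] = 1.0
reconstruction[20:30, 40:50] = 0.5

errors = error_map(original, reconstruction)
row, col = np.unravel_index(np.argmax(errors), errors.shape)

print("total error:  ", errors.mean())
print("anomaly error:", errors[20:30, 40:50].mean())
print("largest error at pixel", (row, col))
```

The largest per-pixel error falls inside the patch the network could not reproduce, which is the signal the rest of the post-processing tries to isolate.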
The strategy for choosing training images is therefore to obtain a large number of images containing the usual objects without any anomalies (in effect, a low-variance training set). For this application, we collected a few thousand images of capsules on a conveyor belt and trained our CAE until the reconstruction error did not improve for 20 epochs in a row.
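The stopping rule described above is commonly called early stopping with a patience of 20 epochs. A minimal sketch of that loop follows; the function names, the max_epochs cap, and the fake error curve are hypothetical stand-ins for a real training epoch.

```python
def train_with_early_stopping(run_epoch, patience=20, max_epochs=1000):
    """Call run_epoch(epoch) until the error fails to improve `patience` times in a row.

    Returns the best error seen and the number of epochs actually run.
    """
    best_error = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        error = run_epoch(epoch)
        epochs_run += 1
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_error, epochs_run


def fake_epoch(epoch):
    """Stand-in for one training epoch: the error falls, then plateaus at 0.1."""
    return max(100 - 3 * epoch, 10) / 100


best, epochs_run = train_with_early_stopping(fake_epoch)
print(best, epochs_run)
```

In practice one would also save the weights whenever the best error improves, so that the model from the best epoch, rather than the last one, is kept.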
Another important factor is to choose the smallest encoding possible while still being able to accurately reproduce the commonly seen objects. Being able to accurately reproduce the common objects results in a low total image reconstruction error, and using a small encoding ensures that the network has a difficult time reproducing anomalies. We chose an encoding size of 20 x 12 x 12 not because it had the absolute lowest error, but because it had the largest anomaly error relative to the total error.
Figure 3 shows the optimization process we used to determine this size. The total reconstruction error of the image and the error of the image in the vicinity of the anomaly are plotted as functions of the number of convolutional layers in the network. As expected, both the total and anomaly errors decrease as the size of the encoding increases, since the network is able to store more information in a larger encoding. However, the maximum ratio of anomaly error to total error occurs for a distinct combination of network layers and filters. A higher ratio allows us to locate anomalies more accurately during post-processing, so we chose this combination of values.
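The selection rule itself is simple to express: evaluate each candidate architecture, then keep the one with the highest anomaly-to-total error ratio rather than the lowest total error. In the sketch below, the candidate names and every error value are placeholders invented for illustration; they are not the measurements plotted in Figure 3.

```python
# Placeholder values for illustration only; NOT measurements from this project.
candidates = [
    {"name": "small",  "total_error": 0.040, "anomaly_error": 0.20},
    {"name": "medium", "total_error": 0.020, "anomaly_error": 0.16},
    {"name": "large",  "total_error": 0.015, "anomaly_error": 0.06},
]


def anomaly_ratio(candidate):
    """Ratio of the error near the anomaly to the error over the whole image."""
    return candidate["anomaly_error"] / candidate["total_error"]


lowest_total = min(candidates, key=lambda c: c["total_error"])
best_ratio = max(candidates, key=anomaly_ratio)

for c in candidates:
    print(f'{c["name"]:>6}: ratio = {anomaly_ratio(c):.1f}')
print("lowest total error:", lowest_total["name"])
print("highest anomaly ratio:", best_ratio["name"])
```

With these made-up numbers the largest encoding has the lowest total error, but it also reproduces the anomaly well, so a mid-sized encoding wins on the ratio.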
Now that we have a trained model, we still need to display our results in a meaningful way. We also have to get rid of as much inconsequential error (noise) as we can. Post-processing through OpenCV can help us achieve these goals.
Figure 4 shows the input image, the reconstructed image, and the normalized error between the two.
We can see that the network did a reasonable job of recreating the pills. It managed to capture the location, orientation, size, and colors, but there are some expected small-scale errors. The network managed to recreate the color of the background but not the texture. Finally, the CAE failed entirely at recreating the anomalous ring object. It appears to have tried to recreate it as a series of pills, but due to the size and shape of the object this was not possible. This manifests itself as a rather obvious white object in the error plot.
We can remove small-scale errors from the background and the pills using a few techniques. First, we will compute the predicted error of the CAE by running the recreated image through a single-filter convolutional layer. In a sense, the output of this represents what the network knows that it got wrong. This is shown in the left-most plot of Figure 5.
Here, lighter pixels represent a higher predicted error. The center pane of Figure 5 shows the result of creating a mask from the predicted error. We set a conservative threshold so that only pixels with the highest predicted error will be removed. The result of applying this mask to the reconstruction error is shown in the right pane of Figure 5. Compared to Figure 4, we can see that quite a few of the small-scale errors have been masked out, especially near the edges of the image.
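The masking step can be sketched as follows, assuming both the reconstruction error and the predicted error are arrays of the same shape scaled to [0, 1]. The function name and the threshold of 0.9 are hypothetical; the post does not state the value we used.

```python
import numpy as np


def mask_predicted_error(reconstruction_error, predicted_error, threshold=0.9):
    """Zero out reconstruction error wherever the predicted error is very high.

    Returns the masked error and the boolean mask of pixels that were kept.
    """
    keep = predicted_error <= threshold
    return np.where(keep, reconstruction_error, 0.0), keep


# Tiny made-up example: two pixels have a predicted error above the threshold.
reconstruction_error = np.array([[0.2, 0.8],
                                 [0.5, 0.1]])
predicted_error = np.array([[0.95, 0.10],
                            [0.30, 0.99]])

masked, keep = mask_predicted_error(reconstruction_error, predicted_error)
print(masked)
```

A high threshold keeps the mask conservative: only pixels where the network is most confident it is wrong are discarded, so a genuine anomaly is unlikely to be masked out.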
One more technique to reduce the small-scale errors still visible is specific to this application. The errors from the pills tend to result from the CAE failing to fully reproduce the shape and the glare present on each. We can apply a simple median blur with a conservative kernel size of 7 pixels to remove many of these. The results are shown in the left pane in Figure 6.
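To show why a median blur removes small specks while leaving a large anomaly intact, here is a plain NumPy sketch of a 7 x 7 median filter applied to a synthetic error image. OpenCV offers the same operation as cv2.medianBlur(src, ksize); the NumPy version below is a stand-in so the example has no OpenCV dependency, and the speck and blob sizes are assumptions.

```python
import numpy as np


def median_blur(image, ksize=7):
    """Median filter on a 2-D array, replicating edge pixels at the borders."""
    if ksize % 2 != 1:
        raise ValueError("ksize must be odd")
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    # One (ksize x ksize) window per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (ksize, ksize))
    return np.median(windows, axis=(-2, -1))


# Synthetic error image: a 2x2 speck (e.g. glare) and a 15x15 anomaly blob.
error = np.zeros((40, 40))
error[5:7, 5:7] = 1.0
error[20:35, 20:35] = 1.0

blurred = median_blur(error, ksize=7)
print("speck after blur:", blurred[5:7, 5:7].max())
print("blob center after blur:", blurred[27, 27])
```

The speck covers only 4 of the 49 pixels in any window, so the median there is zero, while windows inside the blob are entirely bright and pass through unchanged.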
The blur removes most of the small errors but barely alters the error from the anomaly. We apply a simple threshold to this frame, and then convert the remaining error pixels to a blue-to-red color map, in which higher-valued pixels appear more red and lower-valued pixels more blue. This result is shown in the center frame of Figure 6, and the final superimposed frame is shown in the right frame.
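The thresholding, color mapping, and superimposing steps can be sketched together. In OpenCV these would typically map onto cv2.threshold, cv2.applyColorMap (for example with cv2.COLORMAP_JET), and cv2.addWeighted; the NumPy version below uses a simple linear blue-to-red ramp instead, and the threshold of 0.5 and blend weight of 0.6 are assumed values.

```python
import numpy as np


def error_overlay(frame, error, threshold=0.5, alpha=0.6):
    """Blend a blue-to-red heat map of above-threshold error onto an RGB frame.

    frame: H x W x 3 float array in [0, 1]; error: H x W float array in [0, 1].
    """
    hot = error > threshold
    # Rescale surviving errors so the threshold maps to 0 (blue) and 1.0 to 1 (red).
    t = np.clip((error - threshold) / (1.0 - threshold), 0.0, 1.0)
    heat = np.stack([t, np.zeros_like(t), 1.0 - t], axis=-1)  # RGB channels
    out = frame.astype(np.float64)  # astype returns a copy, so frame is untouched
    out[hot] = (1.0 - alpha) * out[hot] + alpha * heat[hot]
    return out


# Made-up gray frame with three error pixels of different strengths.
frame = np.full((4, 4, 3), 0.5)
error = np.zeros((4, 4))
error[0, 0] = 0.3   # below threshold: left untouched
error[1, 1] = 1.0   # strongest error: fully red
error[2, 2] = 0.6   # just above threshold: mostly blue

overlay = error_overlay(frame, error)
print(overlay[0, 0], overlay[1, 1], overlay[2, 2])
```

Pixels below the threshold keep the original frame colors, so the operator sees the untouched camera image everywhere except where an anomaly is suspected.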
Here is a video showing the result of anomaly detection on a moving conveyor belt with a handful of anomalies present:
We look forward to applying this technology to other anomaly detection applications.