A data loader for the MVTec dataset
In this post, we introduce a data loader for the MVTec dataset, a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
The problem
Anomaly detection is one of the most fundamental problems in Machine Learning. Although it comes in different ‘flavors’, and there exist different techniques to approach it, the underlying problem is the same: detecting rare occurrences or observations that deviate from common behavior and do not follow an established pattern. It has numerous applications in cyber-security, finance, computer vision and industrial inspection, among others.
Figure 1. MNIST digit '3' with '1' and '6' anomalies
The MVTec dataset
Paraphrasing from the company’s website: ‘MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories. Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects. Pixel-precise annotations of all anomalies are also provided’.
More precisely, the MVTec dataset is composed of 5354 high-resolution square images of different objects, comprising 15 categories with 3629 images for training and validation and 1725 images for testing. The training set contains only normal, i.e., defect-free, images (label 1), while the test set contains both images with various types of defects (label 0) and defect-free images. The anomalies manifest themselves as over 70 different types of defects, such as scratches, dents, contaminations, and various structural changes. Additionally, pixel-precise ground-truth regions for all anomalies are provided in the dataset.
Figure 2. Normal, defective and ground truth samples for 'Hazelnut' category
The main reference for the dataset is the paper MVTec AD - A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection, by Bergmann and collaborators. In this work, they introduce the dataset and conduct a detailed evaluation of different unsupervised anomaly detection methods based on deep architectures, such as convolutional autoencoders and generative adversarial networks, as well as classical computer vision methods. Table 1 in the paper gives a statistical overview for each category. Table 1 in this post is inspired by that table, with some additional relevant information, such as the number of channels for each category.
Table 1. Statistical overview of the dataset
| Group | Category | # Train | # Test (good) | # Test (defective) | Size (px) | # Channels |
|---|---|---|---|---|---|---|
| Textures | Carpet | 280 | 28 | 89 | 1024 | 3 |
| Textures | Grid | 264 | 21 | 57 | 1024 | 1 |
| Textures | Leather | 245 | 32 | 92 | 1024 | 3 |
| Textures | Tile | 230 | 33 | 84 | 840 | 3 |
| Textures | Wood | 247 | 19 | 60 | 1024 | 3 |
| Objects | Bottle | 209 | 20 | 63 | 900 | 3 |
| Objects | Cable | 224 | 58 | 92 | 1024 | 3 |
| Objects | Capsule | 219 | 23 | 109 | 1000 | 3 |
| Objects | Hazelnut | 391 | 40 | 70 | 1024 | 3 |
| Objects | Metal nut | 220 | 22 | 93 | 700 | 3 |
| Objects | Pill | 267 | 26 | 141 | 800 | 3 |
| Objects | Screw | 320 | 41 | 119 | 1024 | 1 |
| Objects | Toothbrush | 60 | 12 | 30 | 1024 | 3 |
| Objects | Transistor | 213 | 60 | 40 | 1024 | 3 |
| Objects | Zipper | 240 | 32 | 119 | 1024 | 1 |
| | Total | 3629 | 467 | 1258 | - | - |
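If you have the dataset on disk, the per-category counts in Table 1 can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the standard MVTec AD layout in which each category folder contains train/good, test/good and test/<defect_type> subfolders of PNG images:

from pathlib import Path

root = Path('./mvtec')  # assumed location of the extracted dataset
for category in sorted(p for p in root.iterdir() if p.is_dir()):
    n_train = len(list((category / 'train' / 'good').glob('*.png')))
    n_test_good = len(list((category / 'test' / 'good').glob('*.png')))
    # defective test images live in one subfolder per defect type
    n_test_defect = sum(len(list(d.glob('*.png')))
                        for d in (category / 'test').iterdir()
                        if d.is_dir() and d.name != 'good')
    print(f'{category.name:<12} {n_train:>5} {n_test_good:>5} {n_test_defect:>5}')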
The data loader
Popular datasets like MNIST or CIFAR10 have data loaders implemented in most frameworks, such as PyTorch and TensorFlow. However, due to its novelty, an MVTec data loader has not yet been implemented in most frameworks. Because of this, my collaborator and I decided to implement a data loader for this dataset, inspired by TorchVision’s CIFAR10 data loader. The implementation can be found in this repo, together with a Jupyter notebook that serves as a test and usage example of the data loader.
Main functionality
Using the data loader is as simple as importing it
import mvtec
and start to use it
from torchvision import transforms  # needed for transforms.Compose below

transform = transforms.Compose(
    [transforms.ToTensor()])
data = mvtec.MVTEC(root='./mvtec',
                   train=True,
                   transform=transform,
                   target_transform=None,
                   resize=64,
                   interpolation=3,
                   category='toothbrush')
The data loader has seven input arguments. Three of them are optional (transform, target_transform and resize), and all of them except ‘root’ have a default value. Table 2 gives a brief summary of the input arguments.
Table 2. Input arguments for data loader
| Argument | Optional? | Default value | Data type | Values |
|---|---|---|---|---|
| root | No | - | string | - |
| train | No | True | bool | [True, False] |
| transform | Yes | None | Callable | See the list below |
| target_transform | Yes | None | Callable | See the list below |
| resize | Yes | None | int | See the list below |
| interpolation | No | 2 | int | [0, 5] |
| category | No | ‘carpet’ | string | See the list below |
Specifically:
- ‘root’ is a string that specifies the root directory of the dataset, where the directories ‘bottle’, ‘cable’, etc., exist.
- ‘train’ is a boolean that specifies which data to load: training data (True) or test data (False).
- ‘transform’ is a function/transform that takes in a PIL image and returns a transformed version of the image, for instance ‘transforms.RandomCrop’. For more details, see the test notebook and the sketch after this list.
- ‘target_transform’ is a function/transform that takes in a label (1 or 0) and returns a transformed version of the label.
- ‘resize’ is an integer with the desired output image size (H = W = resize).
- ‘interpolation’ is an integer that specifies the interpolation method used to downsize the image. If ‘resize’ is not None, a value for ‘interpolation’ must be provided. See this link for more details.
- ‘category’ is a string specifying which category to load (bottle, cable, capsule, etc.).
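To make the optional arguments more concrete, here is a second, minimal sketch that loads the test split of the ‘cable’ category. The keyword arguments are the ones listed above; the lambda passed as ‘target_transform’ is only a hypothetical example of a label transform, and the interpolation codes are assumed to follow PIL’s convention (2 is bilinear, 3 is bicubic, as used later in this post).

from torchvision import transforms
import mvtec

test_data = mvtec.MVTEC(root='./mvtec',
                        train=False,                          # load the test images
                        transform=transforms.Compose([transforms.ToTensor()]),
                        target_transform=lambda y: float(y),  # hypothetical: cast the label to float
                        resize=128,                           # output images of size 128x128
                        interpolation=2,                      # bilinear (PIL code 2)
                        category='cable')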
Once the data loader has finished loading the required data, a message similar to the following will be displayed:
original data shape: (N, H, W, C) (60, 1024, 1024, 3)
where N is the number of images loaded, H and W are the original image size (before resizing), and C is the number of channels in the images.
The data loader has two main methods, __len__ and __getitem__:
- __len__ simply returns the number of images loaded (i.e., N).
- __getitem__(index) returns a tuple: the index-th image and its corresponding label, after resizing and applying the corresponding transformations.
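In other words, the object behaves like a standard map-style dataset. As a quick sanity check, continuing the ‘toothbrush’ example above (a sketch; the exact shapes depend on the chosen transform and resize values):

print(len(data))      # number of images loaded, i.e., N (60 for the toothbrush training set)
img, label = data[0]  # first image and its label
print(img.shape)      # torch.Size([3, 64, 64]) with ToTensor and resize=64
print(label)          # 1, since all training images are defect-free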
Figure 3 shows a screenshot from the test notebook, where the training data from the toothbrush category was loaded, transforming the images into PyTorch tensors of size 256x256x3 using the bicubic interpolation method (interpolation method number 3).
Figure 3. Loading toothbrush training images
Finally, the test notebook provides an example of how our data loader can be used in combination with PyTorch’s data loader. PyTorch’s DataLoader takes the data loaded with our implementation and creates a new loader with two main additional features (among others): it splits the dataset into batches of a chosen size and it allows us to shuffle the data. Figures 4 and 5 show a couple of screenshots from the test notebook, where the test data from the pill category was loaded, transforming the images into PyTorch tensors of size 128x128x3 using the bicubic interpolation method (interpolation method number 3). After that, a PyTorch data loader is created with this data, splitting the dataset into batches of size 20.
Figure 4. Loading pill test images
Figure 5. Using our data loader with PyTorch's data loader
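For reference, the flow shown in Figures 4 and 5 can be written roughly as follows (a minimal sketch based on the arguments described earlier; the exact calls in the notebook may differ):

from torch.utils.data import DataLoader
from torchvision import transforms
import mvtec

transform = transforms.Compose([transforms.ToTensor()])
pill_test = mvtec.MVTEC(root='./mvtec',
                        train=False,       # load the test split
                        transform=transform,
                        resize=128,        # 128x128 images
                        interpolation=3,   # bicubic
                        category='pill')

# Wrap our dataset with PyTorch's DataLoader to get batching and shuffling
pill_loader = DataLoader(pill_test, batch_size=20, shuffle=True)

images, labels = next(iter(pill_loader))
print(images.shape, labels.shape)  # e.g., torch.Size([20, 3, 128, 128]) and torch.Size([20])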
We hope that this implementation may be useful for the ML/AD/CV community, and we would like to hear your feedback.
Until our next post, colleagues,
B
References
- [1] Dataset
- [2] Short paper
- [3] Long paper
- [4] Papers with code link