# Reading CIFAR images

Download the Python version of the CIFAR-10 or CIFAR-100 dataset from:
http://www.cs.toronto.edu/~kriz/cifar.html

Note that these are low-resolution images (32 x 32 pixels), though most should still be recognizable.

Extract the archive; the images are stored in batch files. The code below demonstrates how to read in a batch and assumes that this notebook is in the same directory as the batch files. The file names used below (`data_batch_1`, `batches.meta`) are those of CIFAR-10.
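
If the download is still a `.tar.gz` archive, one way to unpack it from Python is the standard-library `tarfile` module. This is a minimal sketch: the helper name `extract_archive` and the archive file name in the usage comment are assumptions for illustration, so adjust them to match the file you actually downloaded. The archive may also unpack into a subdirectory, in which case the batch files need to be moved or the paths below adjusted.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)  # only extract archives from a source you trust
    return names


# Example (assumed file name):
# extract_archive("cifar-10-python.tar.gz")
```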

First, we define a function for reading in the data.

In [None]:
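import pickle

def unpickle(file):
    # encoding='latin-1' decodes the pickled byte strings under Python 3,
    # so the dictionary keys come back as ordinary str values
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='latin-1')  # avoid shadowing the built-in dict
    return batch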

Read in the first batch; note that batch information is stored as a dictionary:

In [None]:
b1 = unpickle('data_batch_1')
b1.keys()

In [None]:
b1['data'].shape

### Image structure

Each row of the data is an already *flattened* image, which suits machine learning methods that expect one feature vector per sample, although the pixel values (integers from 0 to 255) should also be scaled.
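
One common way to scale the pixel data is to convert the 8-bit values to floats in the range [0, 1]. The sketch below uses a small synthetic array in place of `b1['data']`; the helper name `scale_pixels` is hypothetical, and other schemes, such as standardizing each feature, are equally valid choices.

```python
import numpy as np


def scale_pixels(data):
    """Convert uint8 pixel values (0-255) to float32 values in [0, 1]."""
    return data.astype(np.float32) / 255.0


# Synthetic stand-in for b1['data']: 4 flattened 32 x 32 RGB images
rng = np.random.default_rng(0)
fake_data = rng.integers(0, 256, size=(4, 3072), dtype=np.uint8)

scaled = scale_pixels(fake_data)
print(scaled.dtype, scaled.min(), scaled.max())
```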

The data for each sample represents a color image that is 32 x 32 pixels. The total number of values stored is 3 x 32 x 32 = 3072, because 3 *channels* are stored (red, green, and blue). To display the image, we need to reshape the data and reorder its axes: the *imshow* method requires color images to be stored as 3-dimensional arrays with the last dimension corresponding to the color channel, whereas this dataset stores the channel first.

First, select an image index.

In [None]:
image_index = 4

Extract the image and format it for visualization:

In [None]:
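import numpy as np
img = b1['data'][image_index]
# Each row stores the red, green, and blue planes in turn, each row by row
img = img.reshape(3, 32, 32)    # (channel, row, column)
# Move the channel axis last, as imshow expects; no rotation is needed.
# (A bare transpose() followed by np.rot90(img, 3) would mirror the image.)
img = img.transpose(1, 2, 0)    # (row, column, channel)
img.shape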
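
The same idea extends to a whole batch at once. The sketch below reshapes an `(N, 3072)` array into `(N, 32, 32, 3)`, assuming the channel-first, row-major layout described on the CIFAR page (1024 red values, then 1024 green, then 1024 blue, each plane stored row by row). The helper name `to_images` is hypothetical, and a synthetic array stands in for `b1['data']`.

```python
import numpy as np


def to_images(flat):
    """Reshape (N, 3072) flattened rows into (N, 32, 32, 3) images."""
    # Split each row into (channel, row, column), then move the channel axis last
    return flat.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


# Synthetic stand-in for b1['data']: one image whose top row is pure red
flat = np.zeros((1, 3072), dtype=np.uint8)
flat[0, :32] = 255  # first 32 values = red channel, first image row

images = to_images(flat)
print(images.shape)  # (1, 32, 32, 3)
```

Any single image can then be passed to `plt.imshow` as `images[i]`.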

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(1, 1))  # small figure: the image is only 32 x 32 pixels
plt.imshow(img)
plt.axis('off')
plt.show()

### This is the target value of the selected image

In [None]:
b1['labels'][image_index]

### The *meta* data tell us the names corresponding to the target values

In [None]:
meta = unpickle('batches.meta')

In [None]:
meta.keys()

In [None]:
meta['label_names']
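
Putting the two pieces together, a label index can be translated into its class name by indexing into `meta['label_names']`. The sketch below uses small stand-in dictionaries shaped like the CIFAR-10 ones (CIFAR-100 files are organized differently, so treat the keys as an assumption there); the variable names `fake_meta` and `fake_batch` are hypothetical.

```python
# Stand-ins with the same structure as `meta` and `b1` for CIFAR-10
fake_meta = {"label_names": ["airplane", "automobile", "bird", "cat", "deer",
                             "dog", "frog", "horse", "ship", "truck"]}
fake_batch = {"labels": [6, 9, 9, 4, 1]}

image_index = 4
label = fake_batch["labels"][image_index]
print(label, fake_meta["label_names"][label])  # prints: 1 automobile
```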

### Combining multi-dimensional numpy arrays

Multi-dimensional *numpy* arrays can be stacked row-wise by passing a list of arrays to *np.vstack*. The code below creates a new dataset containing two copies of the feature data for the first batch (in practice, you would combine the first batch with the second batch, etc.).

In [None]:
import numpy as np
x1 = b1['data']
print('shape of x1 =', x1.shape)
x2 = b1['data']  # in practice, read from second batch, etc.
x_combined = np.vstack([x1, x2])
print('shape of combined data =', x_combined.shape)
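
In practice, the training batches can be read in a loop and stacked in one step. The sketch below assumes the CIFAR-10 file names `data_batch_1` through `data_batch_5` and the `data` and `labels` keys seen above; the helper name `load_batches` is hypothetical. It is self-contained, so it repeats a small version of the unpickling step.

```python
import pickle
from pathlib import Path

import numpy as np


def load_batches(directory=".", n_batches=5):
    """Stack the feature rows and labels of data_batch_1..n_batches."""
    features, labels = [], []
    for i in range(1, n_batches + 1):
        with open(Path(directory) / f"data_batch_{i}", "rb") as fo:
            batch = pickle.load(fo, encoding="latin-1")
        features.append(batch["data"])   # one (rows, 3072) array per batch
        labels.extend(batch["labels"])   # labels are stored as a plain list
    return np.vstack(features), np.array(labels)


# Example, with the batch files in the current directory:
# x_train, y_train = load_batches()
```

Keeping the labels in the same order as the stacked rows means that `y_train[i]` remains the target value for `x_train[i]`.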