This lesson is being piloted (Beta version)

Convolutional Neural Network for image classification: CIFAR10

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How do we train a CNN model with Keras?

Objectives
  • Master Keras

Convolutional Neural Network - CNN

image

Architecture of CNNs

image Source

Convolutional Layer (CNN or ConvNet)

Hyperparameter: Depth (L)

image

image

Hyperparameter: Kernel (K) and Filter

image

image

image

image

Hyperparameter: Stride (S):

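The stride is the step size with which the kernel moves across the input. A larger stride produces a smaller output, which is why it is often tuned to compress image and video data.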

image

Hyperparameter: Padding (P):

image
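
Kernel size, stride, and padding together determine the spatial size of a convolutional layer's output. As a minimal sketch (the helper name `conv_output_size` is hypothetical and not part of Keras), the standard relation O = floor((W - K + 2P) / S) + 1 can be checked in plain Python:

```python
def conv_output_size(width, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension.

    width   -- input size W (pixels)
    kernel  -- kernel size K
    stride  -- stride S
    padding -- number of zero pixels P added on each side
    """
    return (width - kernel + 2 * padding) // stride + 1


# A 32x32 input with a 3x3 kernel, stride 1 and no padding ("valid")
print(conv_output_size(32, 3, stride=1, padding=0))  # 30
# A padding of 1 keeps the size unchanged for a 3x3 kernel ("same")
print(conv_output_size(32, 3, stride=1, padding=1))  # 32
# A stride of 2 roughly halves the output
print(conv_output_size(32, 3, stride=2, padding=0))  # 15
```

The same formula applies to pooling layers, with the pool size in place of the kernel size.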

Pooling Layer

image

In practice, max pooling often performs better than average pooling.

image
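
To make the difference between the two pooling types concrete, here is a small sketch in plain Python. The function `pool2d` and the example feature map are made up for illustration; Keras does the equivalent with `MaxPooling2D` and `AveragePooling2D`.

```python
def pool2d(matrix, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    reduce_fn = max if mode == "max" else (lambda v: sum(v) / len(v))
    pooled = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            # Collect the values inside the current pooling window
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 9, 4],
    [1, 1, 3, 8],
]
print(pool2d(feature_map, mode="max"))      # [[7, 2], [2, 9]]
print(pool2d(feature_map, mode="average"))  # [[4.0, 1.0], [1.0, 6.0]]
```

Max pooling keeps the strongest response in each window, while average pooling smooths the responses.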

Flatten Layer

Batch Normalization Layer

for example:

model.add(Conv2D(75, (3, 3), strides=1, activation="relu", input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2))

Dropout Layer

For example, to randomly switch off 20% of the neurons during training:

model.add(Conv2D(75, (3, 3), strides=1, activation="relu", input_shape=(28, 28, 1)))
model.add(Dropout(0.2))

More information can be found here
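
As a rough illustration of what a dropout layer does during training, the sketch below zeroes each activation with probability `rate` and scales the survivors by 1 / (1 - rate), so the expected sum stays the same. The function name `dropout` and the input values are assumptions for this example; this is plain Python, not the Keras implementation.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, scale the rest."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale
            for a in activations]


rng = random.Random(0)          # fixed seed so the run is repeatable
activations = [1.0] * 10        # made-up activations
dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)                  # each entry is either 0.0 or 1.25
```

At prediction time dropout is switched off and the activations pass through unchanged.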

A sample of CNN model

image

LeNet-5 (1998) by Yann LeCun

image

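from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, AveragePooling2D, BatchNormalization,
                                     Flatten, Dense, Dropout)

model = Sequential()
# C1: 6 filters of 5x5 on a 32x32 grayscale input -> 28x28x6
model.add(Conv2D(6, (5, 5), strides=(1, 1), activation='tanh', padding='valid', input_shape=(32, 32, 1)))
model.add(BatchNormalization())
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2)))
# C3: 16 filters of 5x5 -> 10x10x16
model.add(Conv2D(16, (5, 5), strides=(1, 1), activation='tanh', padding='valid'))
model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2)))
# C5: 120 filters of 5x5 -> 1x1x120
model.add(Conv2D(120, (5, 5), strides=(1, 1), activation='tanh', padding='valid'))
model.add(Flatten())
model.add(Dense(84, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))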

AlexNet (2012) by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton

image

image

VGG16 (2014)

image

GoogLeNet (2014)

image

Application of CNN in image classification

The CIFAR10 database

In this chapter, we use the CIFAR10 database and build a model that adds a Conv2D layer in front of the Dense layers.

Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used for the accuracy/loss plots below
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

Import convolution, max pooling and flatten as mentioned above:

from tensorflow.keras.layers import Conv2D # convolutional layers to extract image features
from tensorflow.keras.layers import MaxPooling2D # max pooling layers to reduce image size
from tensorflow.keras.layers import AveragePooling2D # average pooling layers to reduce image size
from tensorflow.keras.layers import Flatten # flatten feature maps to a 1D vector for the Dense layer

Load CIFAR10

from tensorflow.keras.datasets import cifar10
# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# Normalize pixel values to the range [0, 1]:
X_train, X_test = X_train/X_train.max(), X_test/X_test.max()

Use one-hot encoding from Keras to convert the labels:

num_categories = 10 # class labels range from 0 to 9
input_shape = (32,32,3) # 32x32 pixels with 3 color channels (RGB)

y_train = tf.keras.utils.to_categorical(y_train,num_categories)
y_test = tf.keras.utils.to_categorical(y_test,num_categories)
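
For readers new to one-hot encoding, the following plain-Python sketch shows the transformation that `to_categorical` performs on integer labels. The helper `one_hot` is hypothetical and only meant to illustrate the idea on two labels.

```python
def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0   # mark the position of the true class
        encoded.append(row)
    return encoded


print(one_hot([3, 0], num_classes=5))
# [[0.0, 0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0]]
```

Each label becomes a vector with a single 1 at the index of its class, which matches the 10-unit softmax output layer used with `categorical_crossentropy`.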

Construct Convolutional Neural Network

model = Sequential()
model.add(Conv2D(10, (3, 3), strides=(1, 1), activation='relu', input_shape=input_shape))  # 10 filters, as in the summary below
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(100, activation='relu'))
# Output layer: one unit per class (10 classes, labels 0-9)
model.add(Dense(10, activation='softmax'))

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 30, 30, 10)        280       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 15, 15, 10)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 2250)              0         
                                                                 
 dense (Dense)               (None, 100)               225100    
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
=================================================================
Total params: 226,390
Trainable params: 226,390
Non-trainable params: 0
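
The "Param #" column in the summary above can be reproduced by hand. The sketch below assumes the first layer has 10 filters (as the output shape `(None, 30, 30, 10)` indicates); the helper names `conv2d_params` and `dense_params` are made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (inputs + 1) * units


conv = conv2d_params(3, 3, 3, 10)      # 3x3 kernel on RGB input, 10 filters
flat = 15 * 15 * 10                    # size after pooling and flattening
hidden = dense_params(flat, 100)
output = dense_params(100, 10)

print(conv, hidden, output)            # 280 225100 1010
print(conv + hidden + output)          # 226390
```

Almost all parameters sit in the first Dense layer, because every one of the 2250 flattened values connects to each of the 100 hidden units.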

Compile model

model.compile(optimizer='adam', loss='categorical_crossentropy',  metrics=['accuracy'])                            

Train model

Fit the model

# fit the model
history = model.fit(X_train, y_train, epochs=10, 
                    validation_data=(X_test, y_test))
Epoch 1/10
1563/1563 [==============================] - 6s 2ms/step - loss: 1.4979 - accuracy: 0.4661 - val_loss: 1.3041 - val_accuracy: 0.5401
Epoch 2/10
1563/1563 [==============================] - 3s 2ms/step - loss: 1.2271 - accuracy: 0.5668 - val_loss: 1.2155 - val_accuracy: 0.5687
Epoch 3/10
1563/1563 [==============================] - 3s 2ms/step - loss: 1.1153 - accuracy: 0.6101 - val_loss: 1.2015 - val_accuracy: 0.5777
Epoch 4/10
1563/1563 [==============================] - 3s 2ms/step - loss: 1.0302 - accuracy: 0.6395 - val_loss: 1.1256 - val_accuracy: 0.6108
Epoch 5/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.9635 - accuracy: 0.6622 - val_loss: 1.0992 - val_accuracy: 0.6172
Epoch 6/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.9080 - accuracy: 0.6826 - val_loss: 1.1595 - val_accuracy: 0.6001
Epoch 7/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.8541 - accuracy: 0.6997 - val_loss: 1.1124 - val_accuracy: 0.6259
Epoch 8/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.8037 - accuracy: 0.7167 - val_loss: 1.1263 - val_accuracy: 0.6234
Epoch 9/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.7567 - accuracy: 0.7343 - val_loss: 1.1237 - val_accuracy: 0.6260
Epoch 10/10
1563/1563 [==============================] - 3s 2ms/step - loss: 0.7154 - accuracy: 0.7509 - val_loss: 1.1630 - val_accuracy: 0.6156

Evaluate the output

Visualize the training/testing accuracy:

def plot_acc_loss(history):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['training', 'validation'], loc='best')

    plt.show()

    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')

    plt.xlabel('epoch')
    plt.legend(['training', 'validation'], loc='best')

    plt.show()

plot_acc_loss(history)

image

Discussions:

Save the CNN model for CIFAR10

model.save('model1_CNN_CIFAR10.keras')

Improving the model?

A larger model, with more layers and more filters, can often reach a higher accuracy.

In the revised version, let's add two more convolutional layers with many more filters:

model = Sequential()
model.add(Conv2D(8, (3, 3), strides=(1, 1), activation='relu', input_shape=(32, 32, 3)))

model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu'))

model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu'))

model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(10, activation='softmax'))

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy',  metrics=['accuracy'])               
model.summary()
Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_30 (Conv2D)          (None, 30, 30, 8)         224       
                                                                 
 max_pooling2d_20 (MaxPoolin  (None, 15, 15, 8)        0         
 g2D)                                                            
                                                                 
 conv2d_31 (Conv2D)          (None, 13, 13, 512)       37376     
                                                                 
 max_pooling2d_21 (MaxPoolin  (None, 6, 6, 512)        0         
 g2D)                                                            
                                                                 
 conv2d_32 (Conv2D)          (None, 4, 4, 256)         1179904   
                                                                 
 flatten_10 (Flatten)        (None, 4096)              0         
                                                                 
 dense_19 (Dense)            (None, 100)               409700    
                                                                 
 dense_20 (Dense)            (None, 10)                1010      
                                                                 
=================================================================
Total params: 1,628,214
Trainable params: 1,628,214
Non-trainable params: 0

Now we can see that the number of parameters increased from 226k to 1.6 million!

Let’s train the model

model_CNN = model.fit(X_train, y_train, epochs=10, 
                    validation_data=(X_test, y_test))
Epoch 1/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.4947 - accuracy: 0.4533 - val_loss: 1.1870 - val_accuracy: 0.5721
Epoch 2/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.1082 - accuracy: 0.6083 - val_loss: 1.0441 - val_accuracy: 0.6349
Epoch 3/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.9404 - accuracy: 0.6690 - val_loss: 0.9733 - val_accuracy: 0.6665
Epoch 4/10
1563/1563 [==============================] - 5s 4ms/step - loss: 0.8229 - accuracy: 0.7118 - val_loss: 0.8952 - val_accuracy: 0.6976
Epoch 5/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7262 - accuracy: 0.7439 - val_loss: 0.9003 - val_accuracy: 0.6946
Epoch 6/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6406 - accuracy: 0.7754 - val_loss: 0.8999 - val_accuracy: 0.6979
Epoch 7/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.5609 - accuracy: 0.8028 - val_loss: 0.9167 - val_accuracy: 0.6990
Epoch 8/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.4875 - accuracy: 0.8279 - val_loss: 0.9412 - val_accuracy: 0.7164
Epoch 9/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.4146 - accuracy: 0.8536 - val_loss: 0.9965 - val_accuracy: 0.7086
Epoch 10/10
1563/1563 [==============================] - 5s 4ms/step - loss: 0.3499 - accuracy: 0.8756 - val_loss: 1.0582 - val_accuracy: 0.7045

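By increasing the number of parameters from 226k to 1.6 million, the training accuracy improved from 0.75 to 0.88 and the validation (test) accuracy from 0.62 to 0.70. Note, however, that the validation loss stops improving after epoch 4 while the training loss keeps falling, a sign that the larger model is starting to overfit.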
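
One way to read a training log like the one above is to look for the epoch with the lowest validation loss. The sketch below uses the `val_loss` values copied from that log; it is plain Python for illustration only. During training, Keras offers the `EarlyStopping` callback to stop at such a point automatically.

```python
# val_loss per epoch, copied from the training log of the larger model
val_loss = [1.1870, 1.0441, 0.9733, 0.8952, 0.9003,
            0.8999, 0.9167, 0.9412, 0.9965, 1.0582]

# Index of the smallest value, converted to a 1-based epoch number
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_epoch, val_loss[best_epoch - 1])  # 4 0.8952
```

After epoch 4 the validation loss rises again, so further epochs mainly improve the fit to the training data.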

Now, let’s save the model

model.save('model2_CNN_CIFAR10.keras')

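And validate with some test data:

import numpy as np
# CIFAR10 class names, indexed by label (0-9)
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
predictions = model.predict(X_test)
ypreds = pd.Series(np.argmax(predictions, axis=1))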

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_test[i].reshape(32,32,3))
    plt.title(class_names[ypreds[i]])
plt.show()

image

Cluster Visualization using PCA

Principal component analysis (PCA) is a technique that transforms high-dimensional data into a lower-dimensional space while retaining as much of the variance (information) as possible.
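
To give a feel for what "retaining information" means, the sketch below computes the explained variance ratio for a tiny 2D dataset by hand, using the closed-form eigenvalues of the 2x2 covariance matrix. The function name and the sample points are assumptions for illustration; scikit-learn reports the same kind of quantity in `explained_variance_ratio_`.

```python
import math


def explained_variance_ratio_2d(points):
    """Explained variance ratio of the two principal components of 2D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: variance along each principal axis
    centre = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eig1, eig2 = centre + spread, centre - spread
    total = eig1 + eig2
    return eig1 / total, eig2 / total


# Points lying exactly on a line: one component captures all the variance
points = [(0, 0), (1, 1), (2, 2), (3, 3)]
ratios = explained_variance_ratio_2d(points)
print(ratios)  # approximately (1.0, 0.0)
```

When the first few ratios are close to 1 in total, a low-dimensional plot of those components loses little information.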

image

In this exercise, we apply PCA to the model's predictions on the CIFAR10 test set in order to visualize the clusters:

from sklearn import decomposition
pca = decomposition.PCA(n_components=10)
pca_pred = pca.fit_transform(predictions)

Convert to DataFrame

df_pca = pd.DataFrame(pca_pred,columns=['PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9','PC10'])
df_pca['Cluster'] = np.argmax(y_test, axis=1)  # true labels (y_test was one-hot encoded above)
df_pca['Cluster_name'] = ypreds.apply(lambda x:class_names[x])

Visualize the Principal Components:

import seaborn as sns
df_pca_var = pd.DataFrame({'var':pca.explained_variance_ratio_,
             'PC':['PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9','PC10']})
sns.barplot(x='PC',y="var", 
           data=df_pca_var);

image

sns.lmplot( x="PC1", y="PC2",
  data=df_pca, 
  fit_reg=False, 
  hue='Cluster_name', # color by cluster
  legend=True,
  scatter_kws={"s": 5}, # specify the point size
  height=8)           

image

It is hard to clearly identify the clusters from the first two principal components, so we move on to another method:

Clusters Visualization using t-SNE

t-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised, non-linear technique primarily used for data exploration and visualizing high-dimensional data.

In simpler terms, t-SNE gives you a feel or intuition of how the data is arranged in a high-dimensional space. It was developed by Laurens van der Maaten and Geoffrey Hinton in 2008.

t-SNE vs PCA?

More Information

from sklearn.manifold import TSNE
model_tsne = TSNE(n_components=2)
tsne_pred = model_tsne.fit_transform(predictions)

Convert to DataFrame

df_tsne = pd.DataFrame(tsne_pred,columns=['TSNE1','TSNE2'])
df_tsne['Cluster'] = ypreds
df_tsne['Cluster_name'] = ypreds.apply(lambda x:class_names[x])

Visualize with seaborn

tsne1 = sns.lmplot( x="TSNE1", y="TSNE2",
  data=df_tsne, 
  fit_reg=False, 
  hue='Cluster_name', # color by cluster
  legend=True,
  scatter_kws={"s": 5}, # specify the point size
  height=8)
#tsne1.savefig('TSNE1.png')  

image

Visualize with images

tx, ty = df_tsne['TSNE1'], df_tsne['TSNE2']
tx = (tx-np.min(tx)) / (np.max(tx) - np.min(tx))
ty = (ty-np.min(ty)) / (np.max(ty) - np.min(ty))
from PIL import Image
width = 4000
height = 3000
max_dim = 100
full_image_TSNE = Image.new('RGB', (width, height))
for idx, x in enumerate(X_test):
    tile = Image.fromarray(np.uint8(x * 255))
    rs = max(1, tile.width / max_dim, tile.height / max_dim)
    tile = tile.resize((int(tile.width / rs),
                        int(tile.height / rs)),
                       Image.Resampling.LANCZOS)
    full_image_TSNE.paste(tile, (int((width-max_dim) * tx[idx]),
                            int((height-max_dim) * ty[idx])))
full_image_TSNE  
#full_image_TSNE.save("TSNE2.png")
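
The placement logic in the loop above can be looked at in isolation: the embedding coordinates are min-max scaled to [0, 1] and then mapped to the top-left pixel of each tile. A minimal sketch with made-up coordinates (the helper `to_canvas_positions` is hypothetical):

```python
def to_canvas_positions(xs, ys, width, height, max_dim):
    """Min-max scale embedding coordinates and map them to top-left pixel positions."""
    def scale(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    # Subtract max_dim so a tile placed at the far edge still fits on the canvas
    return [(int((width - max_dim) * x), int((height - max_dim) * y))
            for x, y in zip(scale(xs), scale(ys))]


positions = to_canvas_positions([-10.0, 0.0, 10.0], [5.0, 15.0, 10.0],
                                width=4000, height=3000, max_dim=100)
print(positions)  # [(0, 0), (1950, 2900), (3900, 1450)]
```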

image

Loading a pre-trained model VGG16

image

image

Load VGG16 model from tensorflow

from tensorflow.keras.applications import VGG16
  
# load the VGG16 network *pre-trained* on the ImageNet dataset
model = VGG16(weights="imagenet")
model.summary()
Model: "vgg16"
________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 56, 56, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 56, 56, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 28, 28, 256)       0         
                                                                 
 block4_conv1 (Conv2D)       (None, 28, 28, 512)       1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 28, 28, 512)       2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0

Let’s get some images:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

!wget https://cdn.britannica.com/29/150929-050-547070A1/lion-Kenya-Masai-Mara-National-Reserve.jpg

def show_image(image_path):
    image = mpimg.imread(image_path)
    print(image.shape)
    plt.imshow(image)
    
show_image("lion-Kenya-Masai-Mara-National-Reserve.jpg")   

image

Use the pretrained VGG16 model to process the image

from tensorflow.keras.preprocessing import image as image_utils
from tensorflow.keras.applications.vgg16 import preprocess_input

def load_and_process_image(image_path):
    # Print image's original shape, for reference
    print('Original image shape: ', mpimg.imread(image_path).shape)
    
    # Load in the image with a target size of 224, 224
    image = image_utils.load_img(image_path, target_size=(224, 224))
    # Convert the image from a PIL format to a numpy array
    image = image_utils.img_to_array(image)
    # Add a dimension for number of images, in our case 1
    image = image.reshape(1,224,224,3)
    # Preprocess image to align with original ImageNet dataset
    image = preprocess_input(image)
    # Print image's shape after processing
    print('Processed image shape: ', image.shape)
    return image
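
For VGG16, `preprocess_input` converts the image from RGB to BGR and subtracts the ImageNet channel means, without rescaling. The sketch below applies that to a single made-up pixel in plain Python; the helper name `vgg16_preprocess_pixel` is an assumption for illustration, and the real function works on whole arrays.

```python
# ImageNet channel means in BGR order, as used by the "caffe"-style preprocessing
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)


def vgg16_preprocess_pixel(rgb):
    """Reorder one RGB pixel to BGR and subtract the per-channel ImageNet mean."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEAN))


pixel = vgg16_preprocess_pixel((255.0, 128.0, 0.0))
print(pixel)  # roughly (-103.939, 11.221, 131.32)
```

This is why the processed image no longer has values in the 0 to 255 range and should not be plotted directly.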

Prediction using VGG16

Now that we have our image in the right format, we can pass it into our model and get a prediction. We are expecting an output of an array of 1000 elements, which is going to be difficult to read. Fortunately, models loaded directly with Keras have yet another helpful method that will translate that prediction array into a more readable form.

The following function implements the prediction, using decode_predictions to turn the output array into readable labels:

from tensorflow.keras.applications.vgg16 import decode_predictions

def readable_prediction(image_path):
    # Show image
    show_image(image_path)
    # Load and pre-process image
    image = load_and_process_image(image_path)
    # Make predictions
    predictions = model.predict(image)
    # Print predictions in readable form
    print('Predicted:', decode_predictions(predictions, top=3))

readable_prediction("lion-Kenya-Masai-Mara-National-Reserve.jpg")
(1085, 1600, 3)
Original image shape:  (1085, 1600, 3)
Processed image shape:  (1, 224, 224, 3)
1/1 [==============================] - 0s 315ms/step
Predicted: [[('n02129165', 'lion', 0.9831131), ('n02117135', 'hyena', 0.0046178186), ('n02486410', 'baboon', 0.0028358104)]]
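
Conceptually, the `top=3` argument of `decode_predictions` picks the three classes with the highest probability. A minimal sketch of that ranking step, with a made-up four-class label list and made-up probabilities (the real function also maps indices to ImageNet class IDs):

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["tabby", "lion", "baboon", "hyena"]   # hypothetical label list
probs = [0.02, 0.90, 0.03, 0.05]                # made-up probabilities
print(top_k(probs, labels))
# [('lion', 0.9), ('hyena', 0.05), ('baboon', 0.03)]
```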

Other pre-trained models?

https://www.tensorflow.org/api_docs/python/tf/keras/applications

Key Points

  • CNN, keras, CIFAR10