
Autoencoders

Implementing (deep) autoencoders with Keras and TensorFlow

If you are not familiar with autoencoders, I recommend reading this.

Simple autoencoder:

The simplest autoencoder maps an input to itself. This is interesting because the mapping goes through a representation of the input in a lower-dimensional space, that is, the data is compressed. Let's implement it.

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from keras.datasets import mnist

# Load MNIST data
(trainData,trainLabels),(testData,testLabels)  = mnist.load_data()

The shape of trainData is (60000, 28, 28), that is, 60K images of 28 by 28 pixels. Now we format the data so that we have new matrices of shape (60000, 784): we flatten each image and scale it to values between 0 and 1 by dividing by 255. We do the same with testData, which has shape (10000, 28, 28).

dim = trainData.shape
trainData = trainData.astype('float32')/255
trainData = trainData.reshape((dim[0],dim[1]*dim[2]))

dim = testData.shape
testData = testData.astype('float32')/255
testData = testData.reshape((dim[0],dim[1]*dim[2]))

We then create a model. This model has an input of 784 elements, a single hidden layer of 32 units, and an output of 784 elements. This means we map the 784 pixels down to 32 elements (a compression factor of 784/32 ≈ 24.5) and then expand those 32 elements back to 784 pixels. Some loss of information is expected, but the amount of compression gained is in most cases worth it.

#%% create the network
model = Sequential()

# Add a dense layer with 32 units and relu activations; the input has 784 elements.
model.add(Dense(32, activation='relu', input_shape=(784,)))

# Connect the hidden layer to an output layer with the same dimension as the input. Sigmoid activations.
model.add(Dense(784, activation='sigmoid')) 

# set the learning parameters
model.compile(optimizer='adadelta', loss='binary_crossentropy')

#learn, use 10 percent for validation (just to see differences between training and testing performance)
model.fit(trainData,trainData,batch_size=256,epochs=50, validation_split = 0.1)

We have now learned the network coefficients; let's see how well the model reconstructs the inputs, using the first five test images as an example.

# predict the output
output = model.predict(testData)
output = output.reshape((len(output),28,28))
testData = testData.reshape((len(testData),28,28))

for i in range(0,5):
    ax = plt.subplot(2,5,i + 1)
    plt.imshow(output[i,:,:])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    
    ax = plt.subplot(2,5,i + 1 + 5)
    plt.imshow(testData[i,:,:])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

Figure 1

where the first row of images shows the output and the second the input. We can see that some information is lost, but it is still possible to distinguish the digits.
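Beyond the visual check, we can also put a number on the reconstruction quality by evaluating the loss on the test set. A minimal sketch (testData was reshaped back to images above, so we flatten it again; flatTest is just an illustrative name):

# quantitative check: average binary cross-entropy between inputs and reconstructions
flatTest = testData.reshape((len(testData),784))
print(model.evaluate(flatTest, flatTest, verbose=0))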

We can take a look at the coefficients (weights) that the model learned. We are interested in the weights that map the input to the hidden layer: 32 sets of 784 weights each. They can be plotted as follows:

w = model.get_weights()

for i in range(0,32):
    ax = plt.subplot(4,8,i+1)
    tmp = w[0][:,i].reshape((28,28))
    plt.imshow(tmp)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

Figure 2

There is one set of coefficients for each hidden neuron. Each image then shows the pattern in the input that maximally activates the corresponding neuron in the hidden layer: since each unit computes a weighted sum of the inputs, inputs aligned with the unit's weight vector produce the largest activation.

The main idea is that this method allows us to extract the main features needed to represent the data. We could build deeper networks, expecting each layer to form a higher-level abstraction compared to the previous one. Let's see how that works.

Stacked autoencoders:

We can make autoencoders that are deep, meaning that there is more than one hidden layer. But why?

We know that the autoencoder can be used for unsupervised feature extraction. Above we saw that compressing the image from 784 pixels to 32 values degrades the image, but the digits remain clearly identifiable; the compressed representation therefore retains most of the information in the original image.

Now think about a dense neural network used for classification, with N hidden layers. As the number of layers increases, the flexibility of the model increases as well, but so does the amount of data needed, and the vanishing gradient problem becomes more important.

Therefore initialization of the network becomes important. We can model the dense network as a series of stacked autoencoders, which allows us to pre-train each layer as an autoencoder and put them together at the end.

Let's code it. Assume a classification problem on MNIST: we will have two hidden layers learned with autoencoders and a softmax layer at the output.

Data and labels:

import numpy as np
import keras
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Model
# Load MNIST data
(trainData,trainLabels),(testData,testLabels)  = mnist.load_data()


# Organize data
dim = trainData.shape
trainData = trainData.astype('float32')/255
trainData = trainData.reshape((dim[0],dim[1]*dim[2]))
dim = testData.shape
testData = testData.astype('float32')/255
testData = testData.reshape((dim[0],dim[1]*dim[2]))
trainLabels = keras.utils.to_categorical(trainLabels, num_classes=10)
testLabels = keras.utils.to_categorical(testLabels, num_classes=10)

Now let's create the first autoencoder:

coeff = []

# first autoencoder
autoencoder = Sequential()
autoencoder.add(Dense(128, activation='relu', input_shape=(784,)))
autoencoder.add(Dense(784, activation='sigmoid'))
autoencoder.compile(optimizer='rmsprop', loss='binary_crossentropy')
autoencoder.fit(trainData,trainData,batch_size=256,epochs=50, validation_split = 0.1)


# save the encoding part of the autoencoder to use at the end as initialization of the complete network
w = autoencoder.get_weights()
coeff.append(w[0])
coeff.append(w[1])

#get the output of the hidden layer to be used as input to the next
encoder = Model(inputs=autoencoder.input,outputs=autoencoder.layers[0].output)
encodedInput = encoder.predict(trainData)

Now we repeat this with the next layer; note that encodedInput becomes the input of the next autoencoder:

#%% Second autoencoder
autoencoder = Sequential()
autoencoder.add(Dense(64, activation='relu', input_shape=(128,)))
autoencoder.add(Dense(128, activation='linear'))
autoencoder.compile(optimizer='rmsprop', loss='mean_squared_error')
autoencoder.fit(encodedInput,encodedInput,batch_size=256,epochs=50, validation_split = 0.1)

# save the encoding part of the autoencoder to use at the end as initialization of the complete network
w = autoencoder.get_weights()
coeff.append(w[0])
coeff.append(w[1])

#get the output of the hidden layer to be used as input to the next
encoder = Model(inputs=autoencoder.input,outputs=autoencoder.layers[0].output)
encodedInput = encoder.predict(encodedInput)

Finally, the softmax layer:

#%% softmax
sm = Sequential()
sm.add(Dense(10, activation='softmax', input_shape=(64,)))
sm.compile(optimizer='rmsprop', loss='categorical_crossentropy',metrics=['accuracy'])
sm.fit(encodedInput,trainLabels,batch_size=256,epochs=50, validation_split = 0.1)

# save the softmax layer weights to use at the end as initialization of the complete network
w = sm.get_weights()
coeff.append(w[0])
coeff.append(w[1])

The saved weights are a good starting point; we can now fine-tune the complete network, stacking all the autoencoders. Note that the weights found in the previous stages are used to initialize the network.

# Dense layers
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.layers[0].set_weights(coeff[0:2])
model.layers[1].set_weights(coeff[2:4])
model.layers[2].set_weights(coeff[4:])
# set the learning parameters
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',metrics=['accuracy'])

#learn, use 10 percent for validation (just to see differences between training and testing performance)
model.fit(trainData,trainLabels,batch_size=256,epochs=50, validation_split = 0.1)

score = model.evaluate(testData,testLabels)# 0.9745

This gives us an accuracy on the test set of 97.8%: not bad, but far from the state of the art.
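To gauge how much the layer-wise pretraining actually helps, one could train the same architecture from random initialization (simply skipping the set_weights calls) and compare test accuracy. A minimal sketch; baseline and baselineScore are illustrative names:

# same architecture, randomly initialized (no autoencoder pretraining)
baseline = Sequential()
baseline.add(Dense(128, activation='relu', input_shape=(784,)))
baseline.add(Dense(64, activation='relu'))
baseline.add(Dense(10, activation='softmax'))
baseline.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
baseline.fit(trainData, trainLabels, batch_size=256, epochs=50, validation_split=0.1)

baselineScore = baseline.evaluate(testData, testLabels)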

The idea of autoencoders is excellent, but the premise used so far, that images can simply be compressed, is rather limited. We may instead want to exploit particular patterns that appear in the signal. For that we need filters that extract features and allow us to decompose the image into fundamental components.

We can use convolutional neural networks, in our case, convolutional autoencoders.

Convolutional Autoencoders:

In convolutional autoencoders we try to represent a given input as a combination of general features extracted from the input itself. See this for more information. Now let's implement it.

from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

# reshape the flattened MNIST data back into 28x28 images with a channel axis
trainData = trainData.reshape((len(trainData),28,28,1))
testData = testData.reshape((len(testData),28,28,1))

# create the network
model = Sequential()

# convolutional layer with 32 filters, followed by 2x2 max pooling (the encoder)
model.add(Conv2D(32,(3,3), padding='same', activation='relu', input_shape=(trainData.shape[1],trainData.shape[2],1)))
model.add(MaxPooling2D(pool_size=(2,2)))

# upsample back to 28x28 and map to a single output channel (the decoder)
model.add(Conv2D(32,(3,3), padding='same', activation='relu'))
model.add(UpSampling2D(size=(2,2)))
model.add(Conv2D(1,(3,3), padding='same', activation='relu'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.fit(trainData,trainData,batch_size=256,epochs=10, validation_split = 0.1)


Let's see how well the signals are reconstructed:

# predict the output
output = model.predict(testData)
tmp = output.reshape((len(output),28,28))
tmptest = testData.reshape((len(testData),28,28))


#%%
for i in range(0,5):
    ax = plt.subplot(2,5,i + 1)
    plt.imshow(tmp[i,:,:])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    ax = plt.subplot(2,5,i + 1 + 5)
    plt.imshow(tmptest[i,:,:])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

Figure 3

We observe that the output is very similar to the original. This is expected: we extract a rich set of features from the input images (32 filters), and there is no dimensionality reduction; in fact, it is the opposite. The output of the hidden layer can be represented as 32 images, each of which hopefully highlights a different feature of the input signal. The features extracted by each filter can be visualized by finding the input that maximally activates each neuron; tools such as Keras-vis are available for this.

Let's keep it simple here. We can take a look at the output of the filters for a single input and see what the extracted features are. To generate the output of the hidden layer, we create a new model like this:

encoder = Model(inputs=model.input,outputs=model.layers[0].output)
encoded0 = encoder.predict(trainData)

and visualize the output:

for i in range(0,32):
    ax = plt.subplot(4,8,i + 1)
    plt.imshow(encoded0[1,:,:,i])
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    

The outputs for the first two inputs in the training data look like this:

Figure 4

Notice that this suggests the filters learn basic functions such as gradient and edge detection, and that these operations seem to be performed along different directions. This is, of course, mere interpretation.

The main idea is that the convolutional autoencoder can be used to extract features that allow reconstruction of the images. Note that this is unsupervised, and therefore useful as a first step when we want to perform classification.
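As in the stacked case, one way to use this is to keep the trained convolutional encoder as a feature extractor and train a small classifier on top. A minimal sketch, assuming model is the convolutional autoencoder trained above and trainLabels is still the one-hot array from the stacked-autoencoder section (convEncoder and clf are illustrative names):

from keras.layers import Flatten

# reuse the trained conv + pooling layers as a feature extractor
convEncoder = Model(inputs=model.input, outputs=model.layers[1].output)
features = convEncoder.predict(trainData)  # shape (60000, 14, 14, 32)

# small softmax classifier trained on the extracted features
clf = Sequential()
clf.add(Flatten(input_shape=features.shape[1:]))
clf.add(Dense(10, activation='softmax'))
clf.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
clf.fit(features, trainLabels, batch_size=256, epochs=10, validation_split=0.1)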

This will be all.

Thanks for reading.
