This thread originates from https://www.researchgate.net/post/Should_the_output_function_for_outer_layer_and_activation_function_of_hidden_layer_in_auto_encoder_be_same, and the copyright is owned by the respondents and the website. I reprint the content here just for record-keeping and learning.
Should the output function of the output layer and the activation function of the hidden layer in an autoencoder be the same?
I have created synthetic datasets from a mixture of Gaussians with k components in 2 dimensions. The data are in the range 1-100. I feed them into an autoencoder neural network with 2 neurons in the input layer, 7 neurons in the hidden layer, and 2 neurons in the output layer. I expect the output of each output-layer neuron to be the same as the corresponding input value, but it is not. During training I used the sigmoid function as the activation function in the hidden layer and also as the output function in the output layer, and I compare this output value with the input during training. Is this good, or should the output function be a different one?
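For context, a minimal sketch of the setup described above, assuming Keras and scikit-learn's make_blobs for the Gaussian mixture (the 2-7-2 layer sizes and sigmoid activations come from the question; the framework, dataset helper, and hyperparameters are my own illustrative choices):

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_blobs

# Synthetic 2-D data from a mixture of Gaussians, roughly in the range 1-100
# (make_blobs and its parameter values are illustrative assumptions).
x, _ = make_blobs(n_samples=2000, centers=5, n_features=2,
                  center_box=(1, 100), random_state=0)

# 2-7-2 autoencoder with sigmoid in both the hidden and the output layer,
# exactly as described in the question.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, x, epochs=100, batch_size=32, verbose=0)

# The sigmoid output is bounded to (0, 1), so it cannot reproduce inputs
# in the range 1-100 -- this is the problem discussed in the replies below.
print(model.predict(x[:5]))
```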
Hi,
if you use the sigmoid function as the output, it can only produce values between 0 and 1 (https://en.wikipedia.org/wiki/Sigmoid_function).
First, you can try to normalize your input data to the range [1/100, 1] (i.e. divide by 100) and then use the sigmoid function in the output layer.
A different approach would be to use the ReLU activation function throughout the network, or just in the output layer (https://en.wikipedia.org/wiki/Rectifier_(neural_networks)), since it produces max(0, x) for input x, so it is unbounded above, and it is implemented in every deep learning framework.
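A rough sketch of both suggestions, assuming Keras and placeholder data in the range 1-100 (the scaling constant follows the reply above; layer sizes and training settings are my own illustrative choices):

```python
import numpy as np
import tensorflow as tf

# Placeholder 2-D data in the range 1-100 (stands in for the Gaussian mixture).
x = np.random.uniform(1, 100, size=(2000, 2)).astype('float32')

# Option 1: scale the inputs into [1/100, 1] so a sigmoid output can match them.
x_scaled = x / 100.0
sigmoid_ae = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
sigmoid_ae.compile(optimizer='adam', loss='mse')
sigmoid_ae.fit(x_scaled, x_scaled, epochs=100, batch_size=32, verbose=0)

# Option 2: keep the original scale and use ReLU, which is unbounded above,
# so the output layer can reach values up to 100.
relu_ae = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='relu'),
    tf.keras.layers.Dense(2, activation='relu'),
])
relu_ae.compile(optimizer='adam', loss='mse')
relu_ae.fit(x, x, epochs=100, batch_size=32, verbose=0)
```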
Hi Bastian,
I normalized my input data to the range 0 to 1 and applied the sigmoid function in the hidden layer as well as the output layer, taking the mean squared error between output and input. The learning curve of the neural network (cost vs. epochs) looks good: it gradually decreases and saturates after about 2000 iterations. But I expect the output to be the same as the input, by the definition of an autoencoder, and that is not what I am getting. What might be the problem?
Hi Shyam,
that can have different reasons. Also, there will almost always be some reconstruction error, which is expected. How much error is acceptable depends on the network size, the data, and the target application, I guess.
The obvious things you can try are, of course, increasing the network size and increasing the amount of training data. As general advice: if you use stochastic gradient descent for training, use momentum (0.9, for example).
If the reconstruction error does not decline at some point but keeps oscillating around some value, try learning rate decay.
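If it helps, here is roughly what those two tweaks look like in Keras; only the momentum value 0.9 comes from the advice above, while the learning rate and decay-schedule parameters are my own assumed examples:

```python
import tensorflow as tf

# SGD with momentum 0.9 and an exponential learning-rate decay schedule.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,  # assumed starting learning rate
    decay_steps=1000,            # decay every 1000 update steps (assumption)
    decay_rate=0.96)             # multiplicative decay factor (assumption)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# model.compile(optimizer=optimizer, loss='mse') would then replace the plain
# optimizer when the reconstruction loss plateaus or keeps oscillating.
```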
I hope this helps!
Hi Bastian,
Thanks for suggestions. I still have some questions.
- Is one hidden layer sufficient to recover the exact output from the input?
- I am following this link http://deeplearning.net/tutorial/dA.html#autoencoders for the implementation, and it uses only one hidden layer.
I got the solution myself. My mistake was applying the sigmoid function in the output layer. I removed it and it worked well. Even with one hidden layer, I was able to get a reconstructed output reasonably similar to the input.
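If I read the fix correctly, the output layer simply has no activation, i.e. it is a plain linear layer. A minimal sketch of that change, assuming Keras and placeholder data (the 2-7-2 sizes follow the question; everything else is illustrative):

```python
import numpy as np
import tensorflow as tf

# Placeholder 2-D data in the range 1-100.
x = np.random.uniform(1, 100, size=(2000, 2)).astype('float32')

# Sigmoid hidden layer, linear (no activation) output layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2),  # activation=None -> linear output, unbounded
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, x, epochs=100, batch_size=32, verbose=0)

# Compare a few inputs with their reconstructions.
print(np.round(x[:5], 1))
print(np.round(model.predict(x[:5]), 1))
```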
Shyam Krishna Khadka so you used a linear layer as the output function?
Also, are autoencoders applicable to images with more than 3 channels (higher spectral depth), for example hyperspectral images?
It has no effect that you are using sigmoid both as the hidden activation function and in the output layer. Sigmoid is typically used as a binary classifier, distinguishing between 2 outputs; in one of my works I did the same. If you want more than 2 outputs, you can use a softmax layer as the output layer.
It's not mandatory to use the same activation function for both the hidden and output layers. It depends on your problem and neural-net architecture. In my case, I found the autoencoder gave better results using ReLU in the hidden layers and a linear output layer (i.e. no activation function).
An autoencoder is mainly used to build models that reproduce the inputs with the same dimension while losing only a small amount of information (we can see it as a generalization of PCA). And I believe sigmoid and softmax would not be the right choice for the output layer, as both functions map real numbers into the range between 0 and 1 and are mostly used in classification problems.
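To illustrate the PCA analogy: a linear autoencoder (no activation anywhere) with a small bottleneck, trained with MSE, tends to reconstruct within the same subspace as PCA, so their reconstruction errors end up close. A sketch assuming Keras and synthetic correlated Gaussian data (all dimensions, covariances, and training settings are illustrative):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# Correlated 3-D Gaussian data (values chosen only for illustration).
cov = np.array([[3.0, 2.0, 0.5], [2.0, 2.5, 0.3], [0.5, 0.3, 0.2]])
x = rng.multivariate_normal(np.zeros(3), cov, size=5000).astype('float32')

# PCA reconstruction onto the top 2 principal components via SVD.
x_centered = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(x_centered, full_matrices=False)
components = vt[:2]
x_pca = x_centered @ components.T @ components + x.mean(axis=0)

# Linear autoencoder with a 2-unit bottleneck and no activation anywhere.
ae = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2),
    tf.keras.layers.Dense(3),
])
ae.compile(optimizer='adam', loss='mse')
ae.fit(x, x, epochs=200, batch_size=64, verbose=0)
x_ae = ae.predict(x)

# Both reconstruction errors should be similar once training has converged.
print('PCA MSE:', np.mean((x - x_pca) ** 2))
print('AE  MSE:', np.mean((x - x_ae) ** 2))
```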