This thread originates from https://www.researchgate.net/post/Should_the_output_function_for_outer_layer_and_activation_function_of_hidden_layer_in_auto_encoder_be_same, and the copyright is owned by the respondents and the website. I reprint the content here just for record-keeping and learning.
Should the output function of the output layer and the activation function of the hidden layer in an autoencoder be the same?
I have created synthetic datasets from a mixture of Gaussians with k components in 2 dimensions. The data are in the range 1-100. I feed them into an autoencoder neural network with 2 neurons in the input layer, 7 neurons in the hidden layer, and 2 neurons in the output layer. I expect the output of each output-layer neuron to be the same as the corresponding input value, but it is not. During training I used the sigmoid function as the activation function in the hidden layer and also as the output function in the output layer, and I compare this output value with the input during training. Is this good, or should the output function be a different one?
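For context, a minimal sketch of the setup described above, assuming Keras and scikit-learn's make_blobs for the Gaussian mixture (the 2-7-2 layer sizes and sigmoid activations come from the question; the framework, dataset helper, and hyperparameters are my own illustrative choices):

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_blobs

# Synthetic 2-D data from a mixture of Gaussians, roughly in the range 1-100
# (make_blobs and its parameter values are illustrative assumptions).
x, _ = make_blobs(n_samples=2000, centers=5, n_features=2,
                  center_box=(1, 100), random_state=0)

# 2-7-2 autoencoder with sigmoid in both the hidden and the output layer,
# exactly as described in the question.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, x, epochs=100, batch_size=32, verbose=0)

# The sigmoid output is bounded to (0, 1), so it cannot reproduce inputs
# in the range 1-100 -- this is the problem discussed in the replies below.
print(model.predict(x[:5]))
```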
Hi,
if you use the sigmoid function as the output, it can only produce values between 0 and 1 (https://en.wikipedia.org/wiki/Sigmoid_function).
First, you can try to normalize your input data to the range [1/100, 1] (i.e. divide by 100) and then use the sigmoid function in the output layer.
A different approach would be to use the ReLU activation function throughout the network, or just in the output layer (https://en.wikipedia.org/wiki/Rectifier_(neural_networks)), since it produces max(0, x) for input x, so it is unbounded above, and it is implemented in every deep learning framework.
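A rough sketch of both suggestions, assuming Keras and placeholder data in the range 1-100 (the scaling constant follows the reply above; layer sizes and training settings are my own illustrative choices):

```python
import numpy as np
import tensorflow as tf

# Placeholder 2-D data in the range 1-100 (stands in for the Gaussian mixture).
x = np.random.uniform(1, 100, size=(2000, 2)).astype('float32')

# Option 1: scale the inputs into [1/100, 1] so a sigmoid output can match them.
x_scaled = x / 100.0
sigmoid_ae = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2, activation='sigmoid'),
])
sigmoid_ae.compile(optimizer='adam', loss='mse')
sigmoid_ae.fit(x_scaled, x_scaled, epochs=100, batch_size=32, verbose=0)

# Option 2: keep the original scale and use ReLU, which is unbounded above,
# so the output layer can reach values up to 100.
relu_ae = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='relu'),
    tf.keras.layers.Dense(2, activation='relu'),
])
relu_ae.compile(optimizer='adam', loss='mse')
relu_ae.fit(x, x, epochs=100, batch_size=32, verbose=0)
```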
Hi Bastian,
I normalized my input data to the range 0 to 1 and applied the sigmoid function in the hidden layer as well as the output layer, taking the mean squared error between output and input. The learning curve of the neural network (cost vs. epochs) looks good: it gradually decreases and saturates after about 2000 iterations. But I expect the output to be the same as the input, by the definition of an autoencoder, and that is not what I am getting. What might be the problem?
Hi Shyam,
that can have different reasons. Also, there will almost always be some reconstruction error, which is expected. How much error is acceptable depends on the network size, the data, and the target application, I guess.
The obvious things you can try are, of course, increasing the network size and increasing the amount of training data. As general advice: if you use stochastic gradient descent for training, use momentum (0.9, for example).
If the reconstruction error does not decline at some point but keeps oscillating around some value, try learning rate decay.
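If it helps, here is roughly what those two tweaks look like in Keras; only the momentum value 0.9 comes from the advice above, while the learning rate and decay-schedule parameters are my own assumed examples:

```python
import tensorflow as tf

# SGD with momentum 0.9 and an exponential learning-rate decay schedule.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,  # assumed starting learning rate
    decay_steps=1000,            # decay every 1000 update steps (assumption)
    decay_rate=0.96)             # multiplicative decay factor (assumption)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# model.compile(optimizer=optimizer, loss='mse') would then replace the plain
# optimizer when the reconstruction loss plateaus or keeps oscillating.
```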
I hope this helps!
Hi Bastian,
Thanks for suggestions. I still have some questions.
- Is one hidden layer sufficient to recover the exact output from the input?
- I am following this link http://deeplearning.net/tutorial/dA.html#autoencoders for the implementation, and it uses only one hidden layer.
I got the solution myself. My mistake was applying the sigmoid function in the output layer. I removed it and it worked well. Even with one hidden layer, I was able to get a reconstructed output reasonably similar to the input.
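If I read the fix correctly, the output layer simply has no activation, i.e. it is a plain linear layer. A minimal sketch of that change, assuming Keras and placeholder data (the 2-7-2 sizes follow the question; everything else is illustrative):

```python
import numpy as np
import tensorflow as tf

# Placeholder 2-D data in the range 1-100.
x = np.random.uniform(1, 100, size=(2000, 2)).astype('float32')

# Sigmoid hidden layer, linear (no activation) output layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(7, activation='sigmoid'),
    tf.keras.layers.Dense(2),  # activation=None -> linear output, unbounded
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, x, epochs=100, batch_size=32, verbose=0)

# Compare a few inputs with their reconstructions.
print(np.round(x[:5], 1))
print(np.round(model.predict(x[:5]), 1))
```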
Shyam Krishna Khadka so you used a linear layer as the output function?
Also, are autoencoders applicable to images with more than 3 channels (higher spectral depth), for example hyperspectral images?
It has no effect that you are using sigmoid both as the hidden activation function and in the output layer. Sigmoid is typically used as a binary classifier, distinguishing between 2 outputs; in one of my works I did the same. If you want more than 2 outputs, you can use a softmax layer as the output layer.
It's not mandatory to use the same activation function for both the hidden and output layers. It depends on your problem and neural-net architecture. In my case, I found the autoencoder gave better results using ReLU in the hidden layers and a linear output layer (i.e. no activation function).
An autoencoder is mainly used to build models that reproduce the inputs with the same dimension while losing only a small amount of information (we can see it as a generalization of PCA). And I believe sigmoid and softmax would not be the right choice for the output layer, as both functions map real numbers into the range between 0 and 1 and are mostly used in classification problems.
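To illustrate the PCA analogy: a linear autoencoder (no activation anywhere) with a small bottleneck, trained with MSE, tends to reconstruct within the same subspace as PCA, so their reconstruction errors end up close. A sketch assuming Keras and synthetic correlated Gaussian data (all dimensions, covariances, and training settings are illustrative):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
# Correlated 3-D Gaussian data (values chosen only for illustration).
cov = np.array([[3.0, 2.0, 0.5], [2.0, 2.5, 0.3], [0.5, 0.3, 0.2]])
x = rng.multivariate_normal(np.zeros(3), cov, size=5000).astype('float32')

# PCA reconstruction onto the top 2 principal components via SVD.
x_centered = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(x_centered, full_matrices=False)
components = vt[:2]
x_pca = x_centered @ components.T @ components + x.mean(axis=0)

# Linear autoencoder with a 2-unit bottleneck and no activation anywhere.
ae = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2),
    tf.keras.layers.Dense(3),
])
ae.compile(optimizer='adam', loss='mse')
ae.fit(x, x, epochs=200, batch_size=64, verbose=0)
x_ae = ae.predict(x)

# Both reconstruction errors should be similar once training has converged.
print('PCA MSE:', np.mean((x - x_pca) ** 2))
print('AE  MSE:', np.mean((x - x_ae) ** 2))
```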