Deep Autoencoder (DAE)

sbc-study

Published 2025-06-13 14:53:56

License: CC 4.0 BY-SA

Tags: artificial intelligence, machine learning

Article link: https://blog.csdn.net/qq_38769809/article/details/148626784

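A deep autoencoder is the deep-neural-network version of the autoencoder (AE), used for unsupervised learning, dimensionality reduction, feature extraction, and data denoising. It learns a low-dimensional representation of the data through an encoder and decoder structure.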

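1. Basic Concepts of Autoencoders

An autoencoder is a neural network made up of two parts:

Encoder: compresses the input data into a low-dimensional representation.

Decoder: reconstructs the original data format from the compressed representation.

The training objective is to minimize the reconstruction error, i.e. the difference between the input and the output.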

$\min_{\theta} \mathcal{L}(x, \hat{x}) = \| x - \hat{x} \|_2^2$

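where $x$ is the original input,

$\hat{x} = \text{Decoder}(\text{Encoder}(x))$ is the reconstructed output,

$\mathcal{L}$ is the mean squared error (MSE) or another loss function, and $\theta$ denotes the trainable parameters of the encoder and decoder.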
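To make the reconstruction error concrete, here is a minimal sketch that evaluates $\| x - \hat{x} \|_2^2$ for one hand-picked sample. The function name `reconstruction_error` and the numbers are illustrative assumptions, not part of the original article.

```python
import torch

def reconstruction_error(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Squared L2 distance ||x - x_hat||_2^2, computed per sample (per row)."""
    return ((x - x_hat) ** 2).sum(dim=1)

# One sample with three features (made-up values)
x = torch.tensor([[1.0, 2.0, 3.0]])
x_hat = torch.tensor([[1.0, 2.5, 2.0]])

err = reconstruction_error(x, x_hat)  # 0^2 + 0.5^2 + 1^2 = 1.25
```

A perfect reconstruction gives an error of 0; training pushes the average of this quantity over the dataset toward its minimum.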

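2. The Deep Autoencoder

A deep autoencoder adds multiple neural network layers to the basic autoencoder, so it can learn more complex nonlinear mappings and handle higher-dimensional data (such as high-resolution images or audio signals).

Structure: several hidden layers perform the encoding (compression) and decoding (reconstruction), usually arranged symmetrically.

2.1 Key Characteristics of the DAE

Nonlinear transformations: activation functions such as ReLU and Sigmoid let it capture complex relationships in the data that linear methods such as PCA cannot.

Multi-layer structure: each layer extracts progressively more abstract features.

Unsupervised learning: no labels are needed; the model learns the low-dimensional manifold of the data directly.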

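2.2 Example DAE Architecture

Take 784 → 256 → 64 → 32 → 64 → 256 → 784 as an example (for the MNIST dataset):

Input layer: 784 dimensions (28×28 image)

Encoder:

784 → 256 (fully connected + ReLU)

256 → 64 (fully connected + ReLU)

64 → 32 (bottleneck layer / latent code)

Decoder:

32 → 64 (fully connected + ReLU)

64 → 256 (fully connected + ReLU)

256 → 784 (output layer + Sigmoid, to match the [0, 1] range of normalized pixel inputs)

The contracting-then-expanding shape resembles the U-Net architecture, although this network has no skip connections between the encoder and the decoder.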
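The layer sizes listed above can be traced in code. The following is a rough sketch, assuming a hypothetical helper `mlp` that stacks fully connected layers from a list of widths; the batch of random values stands in for flattened MNIST images.

```python
import torch
import torch.nn as nn

def mlp(dims, final_activation=None):
    """Stack Linear layers along dims, with ReLU between consecutive layers."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:          # no ReLU after the last Linear
            layers.append(nn.ReLU())
    if final_activation is not None:
        layers.append(final_activation)
    return nn.Sequential(*layers)

encoder_dims = [784, 256, 64, 32]
encoder = mlp(encoder_dims)                                      # 784 -> 256 -> 64 -> 32
decoder = mlp(encoder_dims[::-1], final_activation=nn.Sigmoid()) # 32 -> 64 -> 256 -> 784

x = torch.rand(8, 784)   # placeholder batch of 8 flattened 28x28 images
z = encoder(x)           # latent code, shape (8, 32)
x_hat = decoder(z)       # reconstruction, shape (8, 784)

n_params = sum(p.numel() for p in encoder.parameters()) + \
           sum(p.numel() for p in decoder.parameters())
```

Reversing the encoder's width list is what makes the decoder a mirror image of the encoder; the 32-dimensional `z` is the bottleneck that forces compression.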

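3. Variants of the DAE

3.1 Denoising Autoencoder

The denoising autoencoder is also commonly abbreviated DAE; in this article, DAE refers to the deep autoencoder.

Random noise (such as Gaussian noise or occlusion) is added to the input, but the model is required to reconstruct the original clean data.

This pushes the model to learn more robust feature representations and helps avoid overfitting.

Commonly used for image denoising and anomaly detection.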
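The key detail of the denoising setup is that the network sees the corrupted input while the loss compares against the clean one. Below is a minimal sketch of one such training step, assuming Gaussian noise, a tiny stand-in model, and a placeholder batch of random values; `denoising_step` and `noise_std` are hypothetical names chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in autoencoder; the layer sizes are illustrative assumptions.
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),
    nn.Linear(32, 784), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

def denoising_step(x_clean, noise_std=0.3):
    """One update: corrupt the input, but compare the output to the clean data."""
    noise = noise_std * torch.randn_like(x_clean)
    x_noisy = (x_clean + noise).clamp(0.0, 1.0)  # keep values in the pixel range
    optimizer.zero_grad()
    x_recon = model(x_noisy)             # the model only sees the noisy input
    loss = criterion(x_recon, x_clean)   # the target is the clean input
    loss.backward()
    optimizer.step()
    return loss.item()

x_clean = torch.rand(16, 784)  # placeholder batch with values in [0, 1]
losses = [denoising_step(x_clean) for _ in range(20)]
```

Fresh noise is drawn on every call, so the model cannot simply memorize one corrupted copy of each sample.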

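3.2 Sparse Autoencoder

A sparsity constraint (L1 regularization) is added to the loss function so that most hidden-layer activations are close to 0 (sparse coding).

Formula:

$\mathcal{L}_{\text{sparse}} = \|x - \hat{x}\|_2^2 + \lambda \sum_i |h_i|$ (where $h_i$ is the activation of hidden unit $i$ and $\lambda$ sets the strength of the penalty)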
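The sparse loss can be computed term by term. This is a small sketch with hand-picked numbers; the function name `sparse_loss` and the value of `lam` (the $\lambda$ in the formula) are assumptions for illustration.

```python
import torch

def sparse_loss(x, x_hat, h, lam=1e-3):
    """Reconstruction term ||x - x_hat||_2^2 plus lam * sum_i |h_i|."""
    recon = ((x - x_hat) ** 2).sum()
    l1 = h.abs().sum()
    return recon + lam * l1

x = torch.tensor([1.0, 0.0])
x_hat = torch.tensor([0.5, 0.0])
h = torch.tensor([2.0, -1.0, 0.0])   # hidden activations (made-up values)

loss = sparse_loss(x, x_hat, h, lam=0.1)  # 0.25 + 0.1 * (2 + 1 + 0) = 0.55
```

In a real model `h` would be the bottleneck output of the encoder, so the forward pass needs to return it alongside the reconstruction.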

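3.3 Variational Autoencoder (VAE)

A probabilistic modeling approach: the encoder outputs the mean and variance of a distribution over latent codes rather than a single fixed code.

Can be used to generate new data (such as handwritten digits or faces).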
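As a rough illustration of "learning a mean and variance", the sketch below has the encoder produce `mu` and `logvar`, samples a latent code with the reparameterization trick, and adds a KL term to the reconstruction loss. The class name `TinyVAE`, the layer sizes, the binary cross-entropy reconstruction term, and the standard normal prior are common choices assumed here, not details given in the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction term plus KL divergence from q(z|x) to N(0, I)."""
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

torch.manual_seed(0)
vae = TinyVAE()
x = torch.rand(4, 784)               # placeholder batch with values in [0, 1]
x_hat, mu, logvar = vae(x)
loss = vae_loss(x, x_hat, mu, logvar)

# Generation: decode latent codes drawn from the prior N(0, I).
with torch.no_grad():
    samples = vae.dec(torch.randn(5, 32))
```

Because the decoder is trained on codes scattered around each `mu`, decoding a random draw from the prior yields a plausible output once the model is trained, which is what makes generation possible.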

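4. Training the DAE

4.1 Optimization Objective

The loss is usually the mean squared error (MSE) or the cross-entropy loss.

For continuous data (such as images or audio), use MSE:

$L(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^N (x_i - \hat{x}_i)^2$

For discrete data (such as text or categorical data), use cross-entropy:

$L(x, \hat{x}) = - \sum_{i=1}^N x_i \log \hat{x}_i$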
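Both losses can be checked by hand on a short vector. The sketch below evaluates the two formulas above on made-up values and, as an additional point not in the article, also shows PyTorch's binary cross-entropy, which adds a $(1 - x_i)\log(1 - \hat{x}_i)$ term and is a frequent choice when the targets lie in [0, 1].

```python
import torch
import torch.nn.functional as F

x = torch.tensor([0.0, 1.0, 1.0, 0.0])      # target (made-up values)
x_hat = torch.tensor([0.1, 0.8, 0.6, 0.3])  # reconstruction, each entry in (0, 1)

# MSE: (1/N) * sum_i (x_i - x_hat_i)^2
mse_manual = ((x - x_hat) ** 2).mean()
mse_builtin = F.mse_loss(x_hat, x)

# Cross-entropy as written above: -sum_i x_i * log(x_hat_i)
ce_manual = -(x * torch.log(x_hat)).sum()

# Binary cross-entropy also penalizes x_hat_i > 0 where x_i = 0
bce = F.binary_cross_entropy(x_hat, x, reduction="sum")
```

Here `mse_manual` and `mse_builtin` agree, and `bce` is larger than `ce_manual` because it also counts the error on the two entries whose target is 0.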

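4.2 Training Procedure

(1) Random initialization (e.g. Xavier / He initialization).

(2) Forward pass:

$\text{Encoder: } h = f(\mathbf{W}_e x + \mathbf{b}_e), \quad \text{Decoder: } \hat{x} = g(\mathbf{W}_d h + \mathbf{b}_d)$

where $f$ and $g$ are activation functions.

(3) Backpropagation (BP) with SGD or Adam to minimize the loss function.

(4) Batch normalization (Batch Norm) to speed up training.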

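Code example:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class DeepAutoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        # Encoder: input_dim -> 256 -> 64 -> latent_dim (no activation on the bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim))

        # Decoder: mirror of the encoder; Sigmoid keeps outputs in (0, 1)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 256),
            nn.ReLU(),
            nn.Linear(256, input_dim),
            nn.Sigmoid())

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# Example training code
# Assumption: MNIST is loaded through torchvision; any DataLoader that yields
# (images, labels) batches with pixel values in [0, 1] would work the same way.
dataset = datasets.MNIST(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)

model = DeepAutoencoder(784, 32)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

for epoch in range(100):
    for batch in dataloader:
        x, _ = batch                 # labels are not used (unsupervised)
        x = x.view(x.size(0), -1)    # flatten each 28x28 image to 784 values
        optimizer.zero_grad()
        x_recon = model(x)
        loss = criterion(x_recon, x) # the input is its own target
        loss.backward()
        optimizer.step()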
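The training code above relies on PyTorch's default layer initialization and contains no batch normalization, so steps (1) and (4) of the procedure are not visible in it. One possible way to add them to the encoder is sketched below; the helper names `block` and `init_weights`, and the Linear → BatchNorm → ReLU ordering, are assumptions rather than the article's method.

```python
import torch
import torch.nn as nn

def block(in_dim, out_dim):
    """Linear -> BatchNorm -> ReLU (one common ordering)."""
    return [nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU()]

# Same widths as the encoder above: 784 -> 256 -> 64 -> 32
encoder = nn.Sequential(*block(784, 256), *block(256, 64), nn.Linear(64, 32))

def init_weights(m):
    # He (Kaiming) initialization is designed for ReLU layers; biases start at zero.
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

encoder.apply(init_weights)  # visits every submodule, including each Linear

# In training mode BatchNorm1d needs more than one sample per batch.
z = encoder(torch.rand(16, 784))
```

For layers followed by Sigmoid or Tanh, `nn.init.xavier_uniform_` would be the analogous Xavier choice.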

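5. Applications of the DAE

Dimensionality reduction: captures nonlinear structure that PCA, a linear method, cannot represent.

Feature extraction: the learned layers can be stacked into models such as CNNs or RNNs as a form of pretraining.

Denoising: removes noise from images and speech.

Anomaly detection: samples with a large reconstruction error may be anomalies.

Generative modeling: variants such as the VAE can be used to generate data.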
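The anomaly detection use case above can be sketched as scoring each sample by its reconstruction error and flagging those above a threshold. The 95th-percentile threshold, the function names, and the untrained stand-in model with random placeholder data are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def reconstruction_errors(model, x):
    """Per-sample mean squared reconstruction error, without tracking gradients."""
    model.eval()
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

def flag_anomalies(errors, reference_errors, quantile=0.95):
    """Flag samples whose error exceeds a quantile of the errors on reference data."""
    threshold = torch.quantile(reference_errors, quantile)
    return errors > threshold, threshold

torch.manual_seed(0)
# Untrained stand-in; in practice this would be an autoencoder trained on normal data.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 784), nn.Sigmoid())

reference = torch.rand(200, 784)   # placeholder for normal training data
candidates = torch.rand(10, 784)   # placeholder for new samples to score

ref_err = reconstruction_errors(model, reference)
cand_err = reconstruction_errors(model, candidates)
is_anomaly, threshold = flag_anomalies(cand_err, ref_err)
```

The idea is that a model trained only on normal data reconstructs normal samples well and unfamiliar ones poorly, so the threshold should be calibrated on data known to be normal.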

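6. Summary

Characteristic	Description
Nonlinear dimensionality reduction	Captures nonlinear structure that PCA cannot; handles complex data
Unsupervised learning	No labeled data required
Deep structure	Multiple hidden layers extract high-level features
Wide range of applications	Denoising, generation, anomaly detection, and more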