PyTorch Functions Made Simple: torch.nn.Softmax

Article link: https://blog.csdn.net/hy592070616/article/details/131884600

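The Softmax function is a commonly used activation function that normalizes the outputs of a multi-class classification problem into the interval [0,1] so that they sum to 1. The torch.nn.Softmax module in PyTorch provides this functionality. It operates on n-dimensional input tensors and is especially useful in the output layer of a neural network. When the input contains a sparse tensor, unspecified values are treated as -inf. Users can specify the dimension dim along which softmax is computed, so that every slice along that dimension sums to 1.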


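Category: table of contents for the series "PyTorch Functions Made Simple"
Related articles:
· Mathematics in Machine Learning: Activation Functions: The Softmax Function
· PyTorch Functions Made Simple: torch.softmax/torch.nn.functional.softmax
· PyTorch Functions Made Simple: torch.nn.Softmax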

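Applies the Softmax function to an $n$-dimensional input tensor, rescaling it so that the elements of the $n$-dimensional output tensor lie in the range $[0, 1]$ and sum to 1 along the chosen dimension. When the input tensor is a sparse tensor, unspecified values are treated as -inf.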
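
As a minimal sketch of the definition above, the snippet below computes softmax by hand (exponentiate, then divide by the sum along one dimension) and compares it with the module. The input values are hypothetical and chosen only for illustration.

```python
import torch

# Hypothetical example values, chosen only for illustration.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

# Manual softmax along dim=1: exponentiate, then divide by each row's sum.
exp_x = torch.exp(x)
manual = exp_x / exp_x.sum(dim=1, keepdim=True)

# The same computation through the module.
module_out = torch.nn.Softmax(dim=1)(x)

print(torch.allclose(manual, module_out))  # True
print(module_out.sum(dim=1))               # each row sums to 1
```

The second row has equal inputs, so each of its three outputs is 1/3.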

Syntax

torch.nn.Softmax(dim=None)

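Parameters

dim: [int] The dimension along which Softmax is computed, so every slice along dim sums to 1. The default of None makes PyTorch choose a dimension implicitly, which is deprecated and emits a warning, so pass dim explicitly.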
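
To illustrate what "every slice along dim sums to 1" means, here is a small sketch on a 2-D tensor with hypothetical values. With dim=0 the normalization runs down each column, and with dim=1 it runs across each row.

```python
import torch

# Hypothetical 2x3 input, chosen only for illustration.
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

col_softmax = torch.nn.Softmax(dim=0)(x)   # normalizes down each column
row_softmax = torch.nn.Softmax(dim=1)(x)   # normalizes across each row
last_softmax = torch.nn.Softmax(dim=-1)(x) # -1 means the last axis (dim=1 here)

print(col_softmax.sum(dim=0))  # three column sums, each 1
print(row_softmax.sum(dim=1))  # two row sums, each 1
```

For a batch of shape (batch_size, num_classes), dim=1 (or dim=-1) is the usual choice, since it yields one probability distribution per sample.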

Return value

A tensor with the same dimensions and shape as the input tensor, whose elements lie in the range $[0, 1]$.

Example

>>> import torch
>>> m = torch.nn.Softmax(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)
>>> output  # values differ from run to run because the input is random
tensor([[0.4773, 0.0833, 0.4395],
        [0.0281, 0.6010, 0.3709]])
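
As a sketch of the output-layer use case, the snippet below attaches Softmax to a toy classifier and reads off the most probable class for each sample. The layer sizes, batch size, and seed are hypothetical choices for illustration, not part of the original example.

```python
import torch

torch.manual_seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical toy classifier: 4 input features, 3 classes.
classifier = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.Softmax(dim=1),
)

batch = torch.randn(5, 4)        # 5 samples with 4 features each
probs = classifier(batch)        # shape (5, 3); each row sums to 1
predicted = probs.argmax(dim=1)  # index of the most probable class per sample

print(probs.shape)
print(predicted)
```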

Implementation

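# Imports needed to run this excerpt on its own; in PyTorch the class
# lives in torch/nn/modules/activation.py.
from typing import Optional

import torch.nn.functional as F
from torch import Tensor
from torch.nn import Module


class Softmax(Module):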
    r"""Applies the Softmax function to an n-dimensional input Tensor
    rescaling them so that the elements of the n-dimensional output Tensor
    lie in the range [0,1] and sum to 1.

    Softmax is defined as:

    .. math::
        \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}

    When the input Tensor is a sparse tensor then the unspecified
    values are treated as ``-inf``.

    Shape:
        - Input: :math:`(*)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(*)`, same shape as the input

    Returns:
        a Tensor of the same dimension and shape as the input with
        values in the range [0, 1]

    Args:
        dim (int): A dimension along which Softmax will be computed (so every slice
            along dim will sum to 1).

    .. note::
        This module doesn't work directly with NLLLoss,
        which expects the Log to be computed between the Softmax and itself.
        Use `LogSoftmax` instead (it's faster and has better numerical properties).

    Examples::

        >>> m = nn.Softmax(dim=1)
        >>> input = torch.randn(2, 3)
        >>> output = m(input)

    """
    __constants__ = ['dim']
    dim: Optional[int]

    def __init__(self, dim: Optional[int] = None) -> None:
        super().__init__()
        self.dim = dim

    def __setstate__(self, state):
        super().__setstate__(state)
        if not hasattr(self, 'dim'):
            self.dim = None

    def forward(self, input: Tensor) -> Tensor:
        return F.softmax(input, self.dim, _stacklevel=5)

    def extra_repr(self) -> str:
        return 'dim={dim}'.format(dim=self.dim)
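
The note in the docstring above says that Softmax does not pair directly with NLLLoss and recommends LogSoftmax instead. The following sketch, using hypothetical random scores and labels, shows that LogSoftmax followed by NLLLoss gives the same loss as CrossEntropyLoss on the raw scores, and that taking the log of the Softmax output yields the same values for these well-scaled inputs.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # hypothetical class labels

# LogSoftmax followed by NLLLoss ...
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll = torch.nn.NLLLoss()(log_probs, targets)

# ... matches CrossEntropyLoss applied directly to the raw scores.
ce = torch.nn.CrossEntropyLoss()(logits, targets)

# log(Softmax(x)) agrees here, but computing it in two steps is the
# less numerically robust route that the docstring warns about.
log_of_softmax = torch.log(torch.nn.Softmax(dim=1)(logits))

print(torch.allclose(nll, ce))
print(torch.allclose(log_probs, log_of_softmax, atol=1e-6))
```

In practice, this means keeping Softmax for reading out probabilities at inference time and using LogSoftmax with NLLLoss, or CrossEntropyLoss on raw scores, for training.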