[Machine Learning] Logistic Regression: Deriving the Math

This article works through the mathematics of the logistic regression model: how the sigmoid function is applied on top of a linear model, followed by detailed derivations of the likelihood estimate, the loss function, and the gradient descent update.


The derivation below covers the binary classification form of logistic regression.

1. Definition

Logistic regression applies a sigmoid function on top of a linear model:

$$g(z) = \frac{1}{1 + e^{-z}}$$

$$y = h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n)}}$$

Note: $x_0$ is a column of ones added artificially for convenience, so that the intercept $\theta_0$ folds into $\theta^T x$.

We first differentiate $g(z)$; the result will be needed later in the derivation:

$$
\begin{aligned}
g'(z) &= \left(\frac{1}{1 + e^{-z}}\right)' \\
&= \frac{e^{-z} + 1 - 1}{(1 + e^{-z})^2} \\
&= \frac{1}{1 + e^{-z}} - \frac{1}{(1 + e^{-z})^2} \\
&= g(z)\,(1 - g(z))
\end{aligned}
$$
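The identity $g'(z) = g(z)(1-g(z))$ is easy to sanity-check numerically. The sketch below (names `sigmoid` and `sigmoid_grad` are my own, not from the article) compares it against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """g'(z) = g(z) * (1 - g(z)), the identity derived above."""
    g = sigmoid(z)
    return g * (1.0 - g)

# Compare against a numerical derivative at a few points.
z = np.array([-2.0, 0.0, 3.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad(z), numeric, atol=1e-8))  # True
```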

2. Probability model

Assume:

  • $p(y=1 \mid x; \theta) = h_\theta(x)$
  • $p(y=0 \mid x; \theta) = 1 - h_\theta(x)$

Combining the two cases into a single expression:

$$p(y \mid x; \theta) = h_\theta(x)^{y}\,(1 - h_\theta(x))^{1 - y}$$

Here $y$ is the label: 1 for the positive class, 0 for the negative class.
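To see that the combined expression really reduces to the two assumed cases, here is a minimal check (the function name `bernoulli_prob` is hypothetical, introduced only for illustration):

```python
def bernoulli_prob(h, y):
    """p(y|x;θ) = h^y * (1-h)^(1-y), where h = h_θ(x)."""
    return h**y * (1 - h)**(1 - y)

h = 0.8  # predicted probability of the positive class
print(bernoulli_prob(h, 1))  # recovers h, i.e. p(y=1|x;θ)
print(bernoulli_prob(h, 0))  # recovers 1 - h, i.e. p(y=0|x;θ)
```

Plugging in $y=1$ zeroes out the second factor's exponent; $y=0$ zeroes out the first, so one formula covers both branches.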

3. Maximum likelihood estimation

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i}\,(1 - h_\theta(x_i))^{1 - y_i}$$

Taking the logarithm turns the product into a sum:

$$
\begin{aligned}
l(\theta) &= \ln L(\theta) \\
&= \sum_{i=1}^{m} \ln\!\left(h_\theta(x_i)^{y_i}(1 - h_\theta(x_i))^{1 - y_i}\right) \\
&= \sum_{i=1}^{m} \left[\,y_i \ln h_\theta(x_i) + (1 - y_i)\ln(1 - h_\theta(x_i))\,\right]
\end{aligned}
$$

Intuition:

  • When $y=1$, we want $p(y=1 \mid x; \theta)$ to be as large as possible: the larger the predicted probability of the positive class, the smaller the error.
  • When $y=0$, we want $p(y=0 \mid x; \theta)$ to be as large as possible: the larger the predicted probability of the negative class, the smaller the error.

Our goal is therefore to maximize the log-likelihood $l(\theta)$.
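A quick numerical illustration of this intuition: confident, correct predictions push the log-likelihood toward its maximum of 0, while confident wrong ones drive it strongly negative. (The helper name `log_likelihood` and the example probabilities are my own, chosen for illustration.)

```python
import numpy as np

def log_likelihood(h, y):
    """l(θ) = Σ [y ln h + (1-y) ln(1-h)], with h = h_θ(x_i) per sample."""
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1])
good = np.array([0.99, 0.01, 0.99])  # confident and correct
bad = np.array([0.10, 0.90, 0.10])   # confident and wrong
print(log_likelihood(good, y) > log_likelihood(bad, y))  # True
```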

4. Loss function

Maximizing the likelihood directly would call for gradient ascent. Instead we define $J(\theta) = -l(\theta)$, which turns the problem into minimizing a loss function with gradient descent.

5. Gradient descent

Differentiating $J(\theta)$ with respect to a single parameter $\theta_j$:

$$
\begin{aligned}
\frac{\partial}{\partial\theta_j} J(\theta)
&= -\frac{\partial}{\partial\theta_j}\sum_{i=1}^{m}\left[y_i \ln h_\theta(x_i) + (1-y_i)\ln(1-h_\theta(x_i))\right] \\
&= -\sum_{i=1}^{m}\left[y_i\frac{1}{h_\theta(x_i)}\frac{\partial}{\partial\theta_j} h_\theta(x_i) - (1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial}{\partial\theta_j} h_\theta(x_i)\right] \\
&= -\sum_{i=1}^{m}\left[y_i\frac{1}{h_\theta(x_i)} - (1-y_i)\frac{1}{1-h_\theta(x_i)}\right]\frac{\partial}{\partial\theta_j} h_\theta(x_i) \\
&= -\sum_{i=1}^{m}\left[y_i\frac{1}{g(\theta^T x_i)} - (1-y_i)\frac{1}{1-g(\theta^T x_i)}\right]\frac{\partial}{\partial\theta_j} g(\theta^T x_i) \\
&= -\sum_{i=1}^{m}\left[y_i\frac{1}{g(\theta^T x_i)} - (1-y_i)\frac{1}{1-g(\theta^T x_i)}\right] g(\theta^T x_i)\,(1-g(\theta^T x_i))\,\frac{\partial}{\partial\theta_j}\theta^T x_i \\
&= -\sum_{i=1}^{m}\left[y_i\,(1-g(\theta^T x_i)) - (1-y_i)\,g(\theta^T x_i)\right]x_i^{(j)} \\
&= -\sum_{i=1}^{m}\left[y_i - g(\theta^T x_i)\right]x_i^{(j)} \\
&= \sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)x_i^{(j)}
\end{aligned}
$$

Parameter update:

$$\theta_j := \theta_j - \alpha\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)x_i^{(j)}$$

where $\alpha$ is the learning rate.
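The whole derivation can be sketched as a small training loop. This is a minimal batch gradient descent implementation under my own assumptions (the function name `fit_logistic`, the tiny synthetic dataset, the learning rate, and the extra $1/m$ scaling are all illustrative choices, not from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent using the gradient Σ (h_θ(x_i) - y_i) x_i^(j).

    X is (m, n); a column of ones is prepended so θ_0 acts as the intercept
    (the artificial x_0 = 1 feature from the definition section).
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)        # h_θ(x_i) for all samples at once
        grad = Xb.T @ (h - y)          # Σ (h_θ(x_i) - y_i) x_i^(j), all j
        theta -= alpha * grad / m      # dividing by m keeps α easy to tune
    return theta

# Tiny linearly separable example (hypothetical data):
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
theta = fit_logistic(X, y)
preds = (sigmoid(np.hstack([np.ones((4, 1)), X]) @ theta) >= 0.5).astype(int)
print(preds)  # should recover the labels [0 0 1 1]
```

Dividing the gradient by $m$ is a common practical variant; it only rescales the learning rate and does not change the minimizer.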
