Viterbi Algorithm (Named Entity Recognition)

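The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states in a hidden Markov model (HMM). It is widely used for sequence labeling tasks such as part-of-speech tagging and named entity recognition.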


1. What Is the Viterbi Algorithm

Suppose we have:

  • An observation sequence $\mathbf{O} = (o_1, o_2, \dots, o_T)$
  • A state set $\mathbf{S} = \{s_1, s_2, \dots, s_N\}$
  • Transition probabilities $P_{ij} = P(s_j \mid s_i)$: the probability of moving from state $s_i$ to state $s_j$
  • Emission probabilities $O_j(o_t) = P(o_t \mid s_j)$: the probability that state $s_j$ generates observation $o_t$
  • Initial state probabilities $\pi_j = P(s_j \text{ at } t=1)$

Goal:

Given the observation sequence $o_1, o_2, \dots, o_T$, find the most likely hidden state sequence $q_1, q_2, \dots, q_T$, where each $q_t$ is one of the states in $\mathbf{S}$.
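
Written as a formula, the goal looks roughly as follows. This is a sketch that assumes the standard first-order HMM factorization; the symbol $q_t$ for the hidden state at time $t$ (used as a state index in the subscripts) is notation introduced here for illustration.

```latex
\hat{q}_{1:T}
  = \arg\max_{q_1, \dots, q_T} P(q_1, \dots, q_T \mid o_1, \dots, o_T)
  = \arg\max_{q_1, \dots, q_T} \; \pi_{q_1} \, O_{q_1}(o_1) \prod_{t=2}^{T} P_{q_{t-1} q_t} \, O_{q_t}(o_t)
```

The second equality holds because $P(o_1, \dots, o_T)$ does not depend on the state sequence, so maximizing the posterior is the same as maximizing the joint probability. Checking every candidate directly would mean comparing $N^T$ sequences, which is the cost that dynamic programming avoids.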


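2. BIO Sequence Labeling

BIO is a widely used sequence labeling scheme. Its tags mean:

  • B-: the beginning of an entity (Begin)
  • I-: inside an entity (Inside)
  • O: not part of any entity (Outside)

The B- and I- prefixes are followed by the entity type, such as LOC for a location. For example, take the sentence “我 爱 北京 天安门” ("I love Beijing Tiananmen"), in which “北京天安门” is a location entity. It is labeled as: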

我  O
爱  O
北京  B-LOC
天安门  I-LOC
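
To make the tag semantics concrete, here is a minimal sketch that groups BIO-tagged tokens back into entities. The function name `extract_entities` is hypothetical, and two choices are assumptions made for this illustration: tokens are joined without spaces (reasonable for Chinese), and an I- tag that does not continue an open entity of the same type is simply dropped.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # An I- tag of the same type extends the open entity.
            current.append(token)
        else:
            # "O", or an I- tag with no matching open entity (assumption: drop it).
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


tokens = ["我", "爱", "北京", "天安门"]
tags = ["O", "O", "B-LOC", "I-LOC"]
print(extract_entities(tokens, tags))  # [('北京天安门', 'LOC')]
```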

3. Implementation Steps of the Viterbi Algorithm

Definitions:

  • $V_t(j)$: the probability of the most probable path over the first $t$ observations that ends in state $s_j$
  • $\text{path}_t(j)$: the path that achieves $V_t(j)$

Initialization ($t = 1$):

$$V_1(j) = \pi_j \cdot O_j(o_1)$$

$$\text{path}_1(j) = [j]$$

Recursion ($t = 2$ to $T$):

$$V_t(j) = \max_i \left( V_{t-1}(i) \cdot P_{ij} \cdot O_j(o_t) \right)$$

$$\text{path}_t(j) = \text{path}_{t-1}(i^*) + [j], \quad i^* = \arg\max_i \left( V_{t-1}(i) \cdot P_{ij} \right)$$

The factor $O_j(o_t)$ does not depend on $i$, so it can be left out of the $\arg\max$.

Termination:

$$j^* = \arg\max_j V_T(j), \quad \text{final path} = \text{path}_T(j^*)$$
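
The three steps above can be sketched in Python as follows. This is one possible implementation, not a reference one: the function name `viterbi` and the dictionary layout of its arguments are assumptions for illustration, and instead of storing a full $\text{path}_t(j)$ list per state it keeps only a back-pointer to the best predecessor $i^*$, which yields the same path when followed backward. It also assumes a non-empty observation sequence.

```python
def viterbi(observations, states, initial, transition, emission):
    """Return (best_path, best_probability) for an HMM.

    initial[s]       -> pi_s
    transition[r][s] -> P(s | r)
    emission[s][o]   -> P(o | s)
    """
    # Initialization: V_1(j) = pi_j * O_j(o_1)
    V = [{s: initial[s] * emission[s][observations[0]] for s in states}]
    backpointers = []

    # Recursion: V_t(j) = max_i V_{t-1}(i) * P_ij * O_j(o_t)
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: V[-1][r] * transition[r][s])
            scores[s] = V[-1][best_prev] * transition[best_prev][s] * emission[s][obs]
            pointers[s] = best_prev
        V.append(scores)
        backpointers.append(pointers)

    # Termination: pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, V[-1][last]


# Toy two-state model with made-up numbers, for illustration only.
toy_path, toy_prob = viterbi(
    ["x", "y"],
    ["A", "B"],
    {"A": 0.6, "B": 0.4},
    {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}},
    {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}},
)
print(toy_path)  # ['A', 'B']
```

For long sequences the products become very small, so practical implementations often sum log-probabilities instead; the sketch keeps plain products to match the formulas.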


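4. A Concrete Example

Suppose you are given the following:

  • State set (BIO tags): B, I, O
  • Observation sequence (length 3): [“李白”, “是”, “诗人”] ("Li Bai", "is", "poet")
  • Transition probabilities $P_{ij}$ (from row to column):

| From \ To | B | I | O |
|---|---|---|---|
| B | 0.1 | 0.6 | 0.3 |
| I | 0.0 | 0.7 | 0.3 |
| O | 0.5 | 0.2 | 0.3 |

  • Emission probabilities $O_j(o_t)$ (one row per token, one column per state):

| Token | B | I | O |
|---|---|---|---|
| 李白 | 0.8 | 0.1 | 0.1 |
| 是 | 0.1 | 0.1 | 0.8 |
| 诗人 | 0.4 | 0.5 | 0.1 |

  • Initial probabilities $\pi = [0.5, 0.0, 0.5]$ (in the order B, I, O)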
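
Before running the algorithm, it can help to encode the example parameters and sanity-check them. The sketch below is illustrative; the variable names and the helper `sums_to_one` are assumptions, not part of the original example. The initial vector and each transition row sum to 1, whereas the emission values per state total 1.3, 0.7, and 1.0, so the emission table is best read as illustrative scores rather than normalized distributions. The max and argmax steps of the algorithm work on them either way.

```python
import math

states = ["B", "I", "O"]
initial = {"B": 0.5, "I": 0.0, "O": 0.5}
transition = {  # transition[from_state][to_state]
    "B": {"B": 0.1, "I": 0.6, "O": 0.3},
    "I": {"B": 0.0, "I": 0.7, "O": 0.3},
    "O": {"B": 0.5, "I": 0.2, "O": 0.3},
}
emission = {  # emission[state][token]
    "B": {"李白": 0.8, "是": 0.1, "诗人": 0.4},
    "I": {"李白": 0.1, "是": 0.1, "诗人": 0.5},
    "O": {"李白": 0.1, "是": 0.8, "诗人": 0.1},
}


def sums_to_one(values):
    """True if the values add up to 1 within floating-point tolerance."""
    return math.isclose(sum(values), 1.0)


initial_ok = sums_to_one(initial.values())
transition_ok = {s: sums_to_one(transition[s].values()) for s in states}
emission_totals = {s: round(sum(emission[s].values()), 10) for s in states}

print(initial_ok)       # True
print(transition_ok)    # {'B': True, 'I': True, 'O': True}
print(emission_totals)  # {'B': 1.3, 'I': 0.7, 'O': 1.0}
```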

5. Computing the BIO Sequence with the Viterbi Algorithm

We fill in the dynamic programming table $V_t(j)$ (the maximum probability for each state at each step) and record which predecessor achieves each maximum.

Step 1: t = 1, observation “李白”

$$
V_1(B) = 0.5 \cdot 0.8 = 0.4 \\
V_1(I) = 0.0 \cdot 0.1 = 0 \\
V_1(O) = 0.5 \cdot 0.1 = 0.05
$$

Step 2: t = 2, observation “是”

$$V_2(B) = \max(0.4 \cdot 0.1,\ 0 \cdot 0.0,\ 0.05 \cdot 0.5) \cdot 0.1 = \max(0.04,\ 0,\ 0.025) \cdot 0.1 = 0.004$$

$$V_2(I) = \max(0.4 \cdot 0.6,\ 0 \cdot 0.7,\ 0.05 \cdot 0.2) \cdot 0.1 = \max(0.24,\ 0,\ 0.01) \cdot 0.1 = 0.024$$

$$V_2(O) = \max(0.4 \cdot 0.3,\ 0 \cdot 0.3,\ 0.05 \cdot 0.3) \cdot 0.8 = \max(0.12,\ 0,\ 0.015) \cdot 0.8 = 0.096$$

In all three cases the maximum comes from the first term, so the best predecessor of every state at t = 2 is B.

Step 3: t = 3, observation “诗人”

$$V_3(B) = \max(0.004 \cdot 0.1,\ 0.024 \cdot 0.0,\ 0.096 \cdot 0.5) \cdot 0.4 = \max(0.0004,\ 0,\ 0.048) \cdot 0.4 = 0.0192$$

$$V_3(I) = \max(0.004 \cdot 0.6,\ 0.024 \cdot 0.7,\ 0.096 \cdot 0.2) \cdot 0.5 = \max(0.0024,\ 0.0168,\ 0.0192) \cdot 0.5 = 0.0096$$

$$V_3(O) = \max(0.004 \cdot 0.3,\ 0.024 \cdot 0.3,\ 0.096 \cdot 0.3) \cdot 0.1 = \max(0.0012,\ 0.0072,\ 0.0288) \cdot 0.1 = 0.00288$$

In all three cases the maximum comes from the third term, so the best predecessor of every state at t = 3 is O.
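
The hand calculation can be reproduced with a few lines of Python. This is a sketch with illustrative variable names (`table`, `best_prev`); it uses plain lists indexed in the order B, I, O so that it mirrors the formulas, and it rounds only for display.

```python
states = ["B", "I", "O"]
observations = ["李白", "是", "诗人"]
initial = [0.5, 0.0, 0.5]
transition = [  # transition[i][j]: from state i to state j, order B, I, O
    [0.1, 0.6, 0.3],
    [0.0, 0.7, 0.3],
    [0.5, 0.2, 0.3],
]
emission = {  # emission[token][j]: score of the token under state j
    "李白": [0.8, 0.1, 0.1],
    "是": [0.1, 0.1, 0.8],
    "诗人": [0.4, 0.5, 0.1],
}

n = len(states)
# Step 1: V_1(j) = pi_j * O_j(o_1)
table = [[initial[j] * emission[observations[0]][j] for j in range(n)]]
best_prev = []  # best_prev[t - 2][j]: best predecessor of state j at step t

# Steps 2..T: V_t(j) = max_i V_{t-1}(i) * P_ij * O_j(o_t)
for obs in observations[1:]:
    prev = table[-1]
    row, pointers = [], []
    for j in range(n):
        i_star = max(range(n), key=lambda i: prev[i] * transition[i][j])
        row.append(prev[i_star] * transition[i_star][j] * emission[obs][j])
        pointers.append(states[i_star])
    table.append(row)
    best_prev.append(pointers)

for t, row in enumerate(table, start=1):
    print(t, [round(v, 5) for v in row])
# 1 [0.4, 0.0, 0.05]
# 2 [0.004, 0.024, 0.096]
# 3 [0.0192, 0.0096, 0.00288]
print(best_prev)  # [['B', 'B', 'B'], ['O', 'O', 'O']]
```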


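6. Final Result

$$\max_j V_3(j) = \max(0.0192,\ 0.0096,\ 0.00288) = 0.0192$$

The final state is B. Tracing the best predecessors backward from it gives the path:

  • t=1: B ($V_1(B) = 0.4$)
  • t=2: O ($V_2(O) = 0.096$, reached from B at t=1 via the transition B → O)
  • t=3: B ($V_3(B) = 0.0192$, reached from O at t=2 via the transition O → B)

So the optimal path is: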

[B, O, B]
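
As an independent check, a sequence this short can also be scored by brute force: enumerate all $3^3 = 27$ tag sequences and keep the one with the highest joint probability. The sketch below does this with the example's numbers; the helper name `joint_probability` is illustrative. The approach only works for tiny examples, since the number of sequences grows exponentially with length.

```python
from itertools import product

states = ["B", "I", "O"]
observations = ["李白", "是", "诗人"]
initial = {"B": 0.5, "I": 0.0, "O": 0.5}
transition = {  # transition[from_state][to_state]
    "B": {"B": 0.1, "I": 0.6, "O": 0.3},
    "I": {"B": 0.0, "I": 0.7, "O": 0.3},
    "O": {"B": 0.5, "I": 0.2, "O": 0.3},
}
emission = {  # emission[state][token]
    "B": {"李白": 0.8, "是": 0.1, "诗人": 0.4},
    "I": {"李白": 0.1, "是": 0.1, "诗人": 0.5},
    "O": {"李白": 0.1, "是": 0.8, "诗人": 0.1},
}


def joint_probability(path):
    """Score one full tag sequence against the observations."""
    prob = initial[path[0]] * emission[path[0]][observations[0]]
    for prev, cur, obs in zip(path, path[1:], observations[1:]):
        prob *= transition[prev][cur] * emission[cur][obs]
    return prob


# Try every possible tag sequence and keep the highest-scoring one.
best_path = max(product(states, repeat=len(observations)), key=joint_probability)
print(list(best_path))                          # ['B', 'O', 'B']
print(round(joint_probability(best_path), 6))   # 0.0192
```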


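7. Summary

  • The Viterbi algorithm uses dynamic programming to find the optimal state path in $O(TN^2)$ time, instead of comparing all $N^T$ candidate sequences
  • BIO tagging is a sequence labeling scheme used in entity recognition
  • The table is filled in step by step using the **transition probabilities $P_{ij}$ and the emission probabilities $O_j(o_t)$**
  • The final output is the state sequence with the highest probability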