
[Figure 2: four-panel illustration. (a) Baseline: NER with a linear classifier; (b) Prototype-based method; (c) Noisy supervised pre-training; (d) Self-training.]
Figure 2: Illustration of different methods for few-shot NER. In this example, each token in the input sentence is categorized into one of the four entity types. (a) A typical NER system, where a linear classifier is built on top of unsupervised pre-trained Transformer-based networks such as BERT/RoBERTa. (b) A prototype set is constructed by averaging the features of all tokens belonging to a given entity type in the support set (e.g., the prototype for Person is an average of three tokens: Mr., Bush and Jobs). For a token in the query set, its distances from the different prototypes are computed, and the model is trained to maximize the likelihood of assigning the query token to its target prototype. (c) The Wikipedia dataset is employed for supervised pre-training, whose entity types are related to but different from those of the downstream task (e.g., Musician and Artist are more fine-grained types of Person in the downstream task). The associated types on each token can be noisy. (d) Self-training: an NER system (teacher model) trained on a small labeled dataset is used to predict soft labels for sentences in a large unlabeled dataset. The union of the predicted dataset and the original dataset is used to train a student model.
How to leverage unlabeled in-domain sentences in a semi-supervised manner? Note that these three directions are complementary to each other and can be used jointly to further expand the methodology space in Figure 1.
3.1 Prototype-based Methods
Meta-learning (Ravi and Larochelle, 2017) has shown promising results for few-shot image classification (Tian et al., 2020) and sentence classification (Yu et al., 2018; Geng et al., 2019). It is natural to adapt this idea to few-shot NER. The core idea is to use an episodic classification paradigm to simulate few-shot settings during model training. Specifically, in each episode, $M$ entity types (usually $M < |\mathcal{Y}|$) are randomly sampled from $\mathcal{D}_L$ to form a support set $\mathcal{S} = \{(X_i, Y_i)\}_{i=1}^{M \times K}$ ($K$ sentences per type) and a query set $\mathcal{Q} = \{(\hat{X}_i, \hat{Y}_i)\}_{i=1}^{M \times K'}$ ($K'$ sentences per type).
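As a concrete illustration of this episodic sampling, the following minimal Python sketch builds one episode; it is not the paper's implementation, and the mapping `sentences_by_type` (entity type to labeled sentences) is a hypothetical data structure assumed only for this example.

```python
# Minimal episode-sampling sketch (illustrative only, not the authors' code).
# Assumes a hypothetical dict `sentences_by_type`: entity type -> labeled sentences.
import random

def sample_episode(sentences_by_type, M, K, K_query):
    """Draw M entity types, then K support and K' query sentences per type."""
    types = random.sample(list(sentences_by_type), M)
    support, query = [], []
    for t in types:
        picked = random.sample(sentences_by_type[t], K + K_query)
        support.extend(picked[:K])   # K sentences of type t for the support set
        query.extend(picked[K:])     # K' sentences of type t for the query set
    return types, support, query
```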
We build our method on the prototypical network (Snell et al., 2017), which introduces the notion of prototypes, representing entity types as vectors in the same representation space as individual tokens. To construct the prototype for the $m$-th entity type, $c_m$, the average of the representations of all tokens belonging to this type in the support set $\mathcal{S}$ is computed:
$$
c_m = \frac{1}{|\mathcal{S}_m|} \sum_{x \in \mathcal{S}_m} f_{\theta_0}(x), \tag{3}
$$
where $\mathcal{S}_m$ is the set of tokens of the $m$-th type in $\mathcal{S}$, and $f_{\theta_0}$ is defined in (2). For an input token $x \in \mathcal{Q}$ from the query set, its prediction distribution is computed by a softmax over its distances to all the entity prototypes. For example, the prediction probability for the $m$-th prototype is:
$$
q(y = \mathbb{I}_m \mid x) = \frac{\exp\left(-d(f_{\theta_0}(x), c_m)\right)}{\sum_{m'} \exp\left(-d(f_{\theta_0}(x), c_{m'})\right)} \tag{4}
$$
where $\mathbb{I}_m$ is the one-hot vector with 1 for the $m$-th coordinate and 0 elsewhere, and $d(f_{\theta_0}(x), c_m) = \|f_{\theta_0}(x) - c_m\|_2$ is used in our implementation.
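To make Eqs. (3) and (4) concrete, here is a minimal PyTorch sketch (not the released implementation). It assumes `support_feats[m]` already holds the $f_{\theta_0}$ embeddings of all support tokens of the $m$-th type, and `query_feat` is the embedding of a single query token.

```python
# Sketch of Eqs. (3)-(4); assumed inputs: support_feats is a list of M tensors,
# each of shape (n_m, hidden), and query_feat has shape (hidden,).
import torch

def build_prototypes(support_feats):
    # Eq. (3): each prototype is the mean of its type's support-token embeddings.
    return torch.stack([feats.mean(dim=0) for feats in support_feats])  # (M, hidden)

def query_distribution(query_feat, protos):
    # Eq. (4): softmax over negative Euclidean distances to the M prototypes.
    dists = torch.norm(query_feat.unsqueeze(0) - protos, dim=-1)  # (M,)
    return torch.softmax(-dists, dim=-1)
```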
We provide a simple example to illustrate the prototype method in Figure 2(b). In each training iteration, a new episode is sampled, and the model parameter $\theta_0$ is updated by plugging (4) into (1). In the testing phase, the label of a new token $x$ is assigned using the nearest-neighbor criterion $\arg\min_m d(f_{\theta_0}(x), c_m)$.
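Continuing the sketch above, one training step and the test-time assignment could look as follows; treating the objective in (1) as a token-level cross-entropy here is a simplifying assumption made only for illustration.

```python
# Training: plug the query distribution of Eq. (4) into a cross-entropy loss
# (a simplified stand-in for Eq. (1)); `target` is the index of the true type
# among the M sampled prototypes.
def episode_loss(query_feat, protos, target):
    probs = query_distribution(query_feat, protos)
    return -torch.log(probs[target])

# Testing: nearest-neighbor assignment, arg min_m d(f(x), c_m).
def predict_type(query_feat, protos):
    dists = torch.norm(query_feat.unsqueeze(0) - protos, dim=-1)
    return int(torch.argmin(dists))
```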
3.2 Noisy Supervised Pre-training
Generic representations via self-supervised
pre-trained language models (Devlin et al.,