A Brief Note on BERT


xiaoyue_666


License: CC 4.0 BY-SA

Column: NLP Basics | Tags: nlp

Original link: https://blog.csdn.net/tan_1999/article/details/117572290


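BERT is a Transformer-based model that learns text representations by conditioning on both left and right context. Its pre-training consists of two tasks: masked language model (MLM) and next sentence prediction (NSP). In MLM, 15% of the tokens are replaced with [MASK] and the model predicts the masked tokens; in NSP, the model judges whether a sentence pair is consecutive. BERT uses the special token [CLS] for classification tasks, and [SEP] separates sentence pairs that are packed together. In addition, a special learned embedding indicates which sentence each token comes from. The pre-training data comprise English Wikipedia and BookCorpus.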

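To make the MLM step concrete, here is a minimal sketch that assumes whitespace-split tokens instead of BERT's real WordPiece vocabulary; `mask_tokens`, `MASK`, and `SPECIAL_TOKENS` are hypothetical names introduced only for illustration. It follows the simplified description above, where every selected token becomes [MASK]. The original BERT paper differs slightly: of the 15% of positions it selects, 80% become [MASK], 10% become a random token, and 10% are left unchanged.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}  # assumption: special tokens are never masked


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Simplified MLM masking (hypothetical helper, not BERT's actual code).

    Picks round(mask_prob * n) of the n non-special positions at random
    (at least one) and replaces each with [MASK]. The returned labels hold
    the original token at masked positions and None everywhere else.
    """
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    if not candidates:
        # Nothing maskable: return an unchanged copy and no labels.
        return list(tokens), [None] * len(tokens)
    num_to_mask = max(1, round(mask_prob * len(candidates)))
    chosen = set(rng.sample(candidates, num_to_mask))
    masked = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in chosen else None for i, tok in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    sentence = "[CLS] the cat sat on the mat because it was tired [SEP]".split()
    masked, labels = mask_tokens(sentence, seed=0)
    print(masked)
    print(labels)
```

Only positions with a non-None label are predicted, so the MLM loss is computed on the masked tokens alone rather than on the whole sequence.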

I came across a short introduction to BERT in a paper that I found helpful for deepening my understanding, so I am noting it here.

BERT [6] is a transformer [49] model that learns textual representations by conditioning on both left and right context in all layers. BERT was pre-trained on two different tasks, MLM and NSP. For MLM, 15% of the tokens are replaced with a [MASK] token, and the model is trained to predict the masked tokens. For NSP, the model is trained to distinguish (binary classification) between pairs of sentences A and B, where 50% of the time B is the sentence that actually follows A and 50% of the time it is not (a random sentence is selected instead). The special token [CLS] is added at the beginning of every input sequence during pre-training; it is used for classification tasks. [SEP] is another special token that is used to separate sentence pairs that are packed together into a single sequence. Additionally, there is a special learned embedding which indicates whether each token comes from sentence A or B. BERT was pre-trained using both English Wikipedia (2,500M words) and the BookCorpus [63],
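The NSP pair construction and the [CLS]/[SEP] packing described in this passage can be sketched as below. This is a minimal illustration under stated assumptions: tokens are whitespace-split words, and the negative sentence B is drawn from the same small in-memory pool (BERT draws it at random from the corpus). `make_nsp_pair` and `pack_pair` are hypothetical names; the segment ids they produce are what would index the learned A/B segment embedding.

```python
import random
from typing import List, Tuple

Sentence = List[str]


def make_nsp_pair(
    sentences: List[Sentence], idx: int, rng: random.Random
) -> Tuple[Sentence, Sentence, int]:
    """Build one NSP example (hypothetical helper) with A = sentences[idx].

    With probability 0.5, B is the sentence that really follows A (label 1,
    IsNext); otherwise B is a random other sentence (label 0, NotNext).
    Assumes idx + 1 < len(sentences) and a pool of at least 3 sentences.
    """
    sent_a = sentences[idx]
    if rng.random() < 0.5:
        return sent_a, sentences[idx + 1], 1  # IsNext
    # NotNext: any sentence except A itself and its true successor.
    negatives = [j for j in range(len(sentences)) if j not in (idx, idx + 1)]
    return sent_a, sentences[rng.choice(negatives)], 0


def pack_pair(sent_a: Sentence, sent_b: Sentence) -> Tuple[List[str], List[int]]:
    """Pack a pair as [CLS] A [SEP] B [SEP] and build matching segment ids.

    Segment id 0 covers [CLS], the tokens of A and the first [SEP];
    segment id 1 covers the tokens of B and the final [SEP].
    """
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    pool = [
        "the cat sat on the mat".split(),
        "it was very tired".split(),
        "stock prices fell today".split(),
        "rain is expected tomorrow".split(),
    ]
    rng = random.Random(42)
    a, b, is_next = make_nsp_pair(pool, 0, rng)
    tokens, segment_ids = pack_pair(a, b)
    print(is_next, tokens)
    print(segment_ids)
```

Putting [CLS] first means it sits at a fixed position in every sequence; in BERT, the final hidden state of that token is what the binary IsNext/NotNext classifier reads.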