活动介绍

def __init__(self, config): super(Model, self).__init__() if config.embedding_pretrained is not None: self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False) else: self.embedding = nn.Embedding(config.n_vocab, config.embed, padding_idx=config.n_vocab - 1) self.lstm = nn.LSTM(config.embed, config.hidden_size, config.num_layers, bidirectional=True, batch_first=True, dropout=config.dropout) self.fc = nn.Linear(config.hidden_size * 2, config.num_classes)

时间: 2024-04-28 18:20:19 浏览: 237
这是一个神经网络模型的初始化函数,它包含以下几个部分: 1. 调用父类的初始化函数,即nn.Module的__init__()方法。 2. 根据config中的embedding_pretrained是否为None,选择使用预训练的词向量还是随机初始化的词向量。如果使用预训练的词向量,则调用nn.Embedding.from_pretrained()方法加载预训练的词向量,否则使用nn.Embedding()方法随机初始化词向量。 3. 初始化一个LSTM层,其中输入维度为config.embed,隐藏状态维度为config.hidden_size,层数为config.num_layers,双向LSTM,batch_first设置为True,dropout为config.dropout。 4. 初始化一个全连接层,将LSTM输出的双向隐藏状态拼接后,经过一个线性变换得到最终的输出结果,输出维度为config.num_classes。 这个模型的输入是一个batch的词索引序列,输出是每个样本对应的类别。
相关问题

class Model(nn.Module): def __init__(self, config): super(Model, self).__init__() if config.embedding_pretrained is not None: self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False) else: self.embedding = nn.Embedding(config.n_vocab, config.embed, padding_idx=config.n_vocab - 1) self.lstm = nn.LSTM(config.embed, config.hidden_size, config.num_layers, bidirectional=True, batch_first=True, dropout=config.dropout) self.fc = nn.Linear(config.hidden_size * 2, config.num_classes) def forward(self, x): x, _ = x out = self.embedding(x) # [batch_size, seq_len, embeding]=[128, 32, 300] out, _ = self.lstm(out) out = self.fc(out[:, -1, :]) # 句子最后时刻的 hidden state return out

这段代码是一个基于 LSTM 的文本分类模型的实现,模型的输入是一个由文本数据构成的批次(batch),输出是这个批次中每个文本数据的分类结果。在该模型中,首先根据配置信息创建了一个 Embedding 层,用于将每个词汇映射为一个向量。如果配置信息中指定了预训练的 Embedding 权重,则使用预训练的权重;否则,随机初始化 Embedding 权重。接下来,将 Embedding 层的输出输入到一个多层 LSTM 网络中进行处理,得到每个文本数据的最后一个时刻上的 hidden state。最后,将这些 hidden state 输入到一个全连接层中进行分类,得到分类结果。在 forward 函数中,输入数据 x 是一个元组,其中第一个元素是输入数据,第二个元素是序列长度,但是在本模型中并没有用到序列长度,只用到了输入数据。在 forward 函数中,首先将输入数据输入到 Embedding 层中,得到词向量表达;然后将词向量表达输入到 LSTM 网络中进行处理,得到每个文本数据的 hidden state;最后,将 hidden state 输入到全连接层中进行分类,得到分类结果。

class Model(nn.Module): def init(self, config): super(Model, self).init() if config.embedding_pretrained is not None: self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False) else: self.embedding = nn.Embedding(config.n_vocab, config.embed, padding_idx=config.n_vocab - 1) self.lstm = nn.LSTM(config.embed, config.hidden_size, config.num_layers, bidirectional=True, batch_first=True, dropout=config.dropout) self.fc = nn.Linear(config.hidden_size * 2, config.num_classes) def forward(self, x): x, _ = x out = self.embedding(x) # [batch_size, seq_len, embeding]=[128, 32, 300] out, _ = self.lstm(out) out = self.fc(out[:, -1, :]) # 句子最后时刻的 hidden state return out

这段代码是一个PyTorch的模型定义,其中包括embedding层、LSTM层和全连接层。模型的输入是x,输出是out。其中,输入x的形状为[batch_size, seq_len],表示一批输入数据中每个序列的长度为seq_len,总共有batch_size个序列。模型会将输入的每个词转换为向量形式,然后通过LSTM层对整个序列进行编码,最后使用全连接层将编码结果映射到目标类别的概率分布上。
阅读全文

相关推荐

# Copyright 2020 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================ """wide and deep model""" import time import numpy as np from mindspore import nn, context from mindspore import Parameter, ParameterTuple import mindspore.common.dtype as mstype import mindspore.ops as ops from mindspore.nn import Dropout from mindspore.nn.optim import Adam, FTRL from mindspore.common.initializer import Uniform, initializer from mindspore.context import ParallelMode from mindspore.nn.wrap.grad_reducer import DistributedGradReducer from mindspore.communication.management import get_group_size np_type = np.float32 ms_type = mstype.float32 def init_method(method, shape, name, max_val=1.0): ''' parameter init method ''' if method in ['uniform']: params = Parameter(initializer( Uniform(max_val), shape, ms_type), name=name) elif method == "one": params = Parameter(initializer("ones", shape, ms_type), name=name) elif method == 'zero': params = Parameter(initializer("zeros", shape, ms_type), name=name) elif method == "normal": params = Parameter(initializer("normal", shape, ms_type), name=name) return params def init_var_dict(init_args, in_vars): ''' var init function ''' var_map = {} _, _max_val = init_args for _, item in enumerate(in_vars): key, shape, method = item if key not in var_map.keys(): if method in ['random', 'uniform']: var_map[key] = Parameter(initializer( Uniform(_max_val), shape, ms_type), name=key) elif method == "one": var_map[key] = Parameter(initializer( "ones", shape, ms_type), name=key) elif method == "zero": var_map[key] = Parameter(initializer( "zeros", shape, ms_type), name=key) elif method == 'normal': var_map[key] = Parameter(initializer( "normal", shape, ms_type), name=key) return var_map class DenseLayer(nn.Cell): """ Dense Layer for Deep Layer of WideDeep Model; Containing: activation, matmul, bias_add; Args: """ def __init__(self, input_dim, output_dim, weight_bias_init, act_str, keep_prob=0.5, use_activation=True, convert_dtype=True, drop_out=False): super(DenseLayer, self).__init__() weight_init, bias_init = weight_bias_init self.weight = init_method( weight_init, [input_dim, output_dim], name="weight") self.bias = init_method(bias_init, [output_dim], name="bias") self.act_func = self._init_activation(act_str) self.matmul = ops.MatMul(transpose_b=False) self.bias_add = ops.BiasAdd() self.cast = ops.Cast() self.dropout = Dropout(keep_prob=(1 - keep_prob)) self.use_activation = use_activation self.convert_dtype = convert_dtype self.drop_out = drop_out def _init_activation(self, act_str): act_str = act_str.lower() if act_str == "relu": act_func = ops.ReLU() elif act_str == "sigmoid": act_func = ops.Sigmoid() elif act_str == "tanh": act_func = ops.Tanh() return act_func def construct(self, x): ''' Construct Dense layer ''' if self.training and self.drop_out: x = self.dropout(x) if self.convert_dtype: x = self.cast(x, mstype.float16) weight = self.cast(self.weight, mstype.float16) bias = self.cast(self.bias, mstype.float16) wx = self.matmul(x, weight) wx = self.bias_add(wx, bias) if self.use_activation: wx = self.act_func(wx) wx = self.cast(wx, mstype.float32) else: wx = self.matmul(x, self.weight) wx = self.bias_add(wx, self.bias) if self.use_activation: wx = self.act_func(wx) return wx class WideDeepModel(nn.Cell): """ From paper: " Wide & Deep Learning for Recommender Systems" Args: config (Class): The default config of Wide&Deep """ def __init__(self, config): super(WideDeepModel, self).__init__() self.batch_size = config.batch_size host_device_mix = bool(config.host_device_mix) parameter_server = bool(config.parameter_server) parallel_mode = context.get_auto_parallel_context("parallel_mode") is_auto_parallel = parallel_mode in (ParallelMode.SEMI_AUTO_PARALLEL, ParallelMode.AUTO_PARALLEL) if is_auto_parallel: self.batch_size = self.batch_size * get_group_size() sparse = config.sparse self.field_size = config.field_size self.emb_dim = config.emb_dim self.weight_init, self.bias_init = config.weight_bias_init self.deep_input_dims = self.field_size * self.emb_dim self.all_dim_list = [self.deep_input_dims] + config.deep_layer_dim + [1] init_acts = [('Wide_b', [1], config.emb_init)] var_map = init_var_dict(config.init_args, init_acts) self.wide_b = var_map["Wide_b"] self.dense_layer_1 = DenseLayer(self.all_dim_list[0], self.all_dim_list[1], config.weight_bias_init, config.deep_layer_act, convert_dtype=True, drop_out=config.dropout_flag) self.dense_layer_2 = DenseLayer(self.all_dim_list[1], self.all_dim_list[2], config.weight_bias_init, config.deep_layer_act, convert_dtype=True, drop_out=config.dropout_flag) self.dense_layer_3 = DenseLayer(self.all_dim_list[2], self.all_dim_list[3], config.weight_bias_init, config.deep_layer_act, convert_dtype=True, drop_out=config.dropout_flag) self.dense_layer_4 = DenseLayer(self.all_dim_list[3], self.all_dim_list[4], config.weight_bias_init, config.deep_layer_act, convert_dtype=True, drop_out=config.dropout_flag) self.dense_layer_5 = DenseLayer(self.all_dim_list[4], self.all_dim_list[5], config.weight_bias_init, config.deep_layer_act, use_activation=False, convert_dtype=True, drop_out=config.dropout_flag) self.wide_mul = ops.Mul() self.deep_mul = ops.Mul() self.reduce_sum = ops.ReduceSum(keep_dims=False) self.reshape = ops.Reshape() self.deep_reshape = ops.Reshape() self.square = ops.Square() self.concat = ops.Concat(axis=1) self.unique = ops.Unique().shard(((1,),)) self.wide_gatherv2 = ops.Gather() self.deep_gatherv2 = ops.Gather() if is_auto_parallel and sparse and not config.field_slice and not parameter_server: target = 'CPU' if host_device_mix else 'DEVICE' self.wide_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, 1, target=target, slice_mode=nn.EmbeddingLookup.TABLE_ROW_SLICE) if config.deep_table_slice_mode == "column_slice": self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, target=target, slice_mode=nn.EmbeddingLookup.TABLE_COLUMN_SLICE) if config.use_sp: self.dense_layer_1.matmul.shard(((1, get_group_size()), (get_group_size(), 1))) self.dense_layer_1.bias_add.shard(((get_group_size(), 1), (1,))) self.deep_mul.shard(((1, 1, get_group_size()), (1, 1, 1))) else: self.dense_layer_1.dropout.dropout.shard(((1, get_group_size()),)) self.dense_layer_1.matmul.shard(((1, get_group_size()), (get_group_size(), 1))) self.deep_mul.shard(((1, 1, get_group_size()), (1, 1, 1))) self.dense_layer_1.matmul.add_prim_attr("field_size", self.field_size) self.deep_reshape.add_prim_attr("skip_redistribution", True) else: self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, target=target, slice_mode=nn.EmbeddingLookup.TABLE_ROW_SLICE) self.reduce_sum.add_prim_attr("cross_batch", True) self.embedding_table = self.deep_embeddinglookup.embedding_table elif is_auto_parallel and host_device_mix and config.field_slice and config.full_batch and config.manual_shape: manual_shapes = tuple((s[0] for s in config.manual_shape)) self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, slice_mode=nn.EmbeddingLookup.FIELD_SLICE, manual_shapes=manual_shapes) self.wide_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, 1, slice_mode=nn.EmbeddingLookup.FIELD_SLICE, manual_shapes=manual_shapes) self.deep_mul.shard(((1, get_group_size(), 1), (1, get_group_size(), 1))) self.wide_mul.shard(((1, get_group_size(), 1), (1, get_group_size(), 1))) self.reduce_sum.shard(((1, get_group_size(), 1),)) self.dense_layer_1.dropout.dropout.shard(((1, get_group_size()),)) self.dense_layer_1.matmul.shard(((1, get_group_size()), (get_group_size(), 1))) self.embedding_table = self.deep_embeddinglookup.embedding_table elif parameter_server: cache_enable = config.vocab_cache_size > 0 target = 'DEVICE' if cache_enable else 'CPU' if not cache_enable: sparse = True if is_auto_parallel and config.full_batch and cache_enable: self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, target=target, slice_mode=nn.EmbeddingLookup.TABLE_ROW_SLICE, sparse=sparse, vocab_cache_size=config.vocab_cache_size) self.wide_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, 1, target=target, slice_mode=nn.EmbeddingLookup.TABLE_ROW_SLICE, sparse=sparse, vocab_cache_size=config.vocab_cache_size) else: self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, target=target, sparse=sparse, vocab_cache_size=config.vocab_cache_size) self.wide_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, 1, target=target, sparse=sparse, vocab_cache_size=config.vocab_cache_size) self.embedding_table = self.deep_embeddinglookup.embedding_table self.deep_embeddinglookup.embedding_table.set_param_ps() self.wide_embeddinglookup.embedding_table.set_param_ps() else: self.deep_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, self.emb_dim, target='DEVICE', sparse=sparse, vocab_cache_size=config.vocab_cache_size) self.wide_embeddinglookup = nn.EmbeddingLookup(config.vocab_size, 1, target='DEVICE', sparse=sparse, vocab_cache_size=config.vocab_cache_size) self.embedding_table = self.deep_embeddinglookup.embedding_table def construct(self, id_hldr, wt_hldr): """ Args: id_hldr: batch ids; wt_hldr: batch weights; """ # Wide layer wide_id_weight = self.wide_embeddinglookup(id_hldr) # Deep layer deep_id_embs = self.deep_embeddinglookup(id_hldr) mask = self.reshape(wt_hldr, (self.batch_size, self.field_size, 1)) # Wide layer wx = self.wide_mul(wide_id_weight, mask) wide_out = self.reshape(self.reduce_sum(wx, 1) + self.wide_b, (-1, 1)) # Deep layer vx = self.deep_mul(deep_id_embs, mask) deep_in = self.deep_reshape(vx, (-1, self.field_size * self.emb_dim)) deep_in = self.dense_layer_1(deep_in) deep_in = self.dense_layer_2(deep_in) deep_in = self.dense_layer_3(deep_in) deep_in = self.dense_layer_4(deep_in) deep_out = self.dense_layer_5(deep_in) out = wide_out + deep_out return out, self.embedding_table class NetWithLossClass(nn.Cell): """" Provide WideDeep training loss through network. Args: network (Cell): The training network config (Class): WideDeep config """ def __init__(self, network, config): super(NetWithLossClass, self).__init__(auto_prefix=False) host_device_mix = bool(config.host_device_mix) parameter_server = bool(config.parameter_server) sparse = config.sparse parallel_mode = context.get_auto_parallel_context("parallel_mode") is_auto_parallel = parallel_mode in (ParallelMode.SEMI_AUTO_PARALLEL, ParallelMode.AUTO_PARALLEL) self.no_l2loss = (is_auto_parallel if (host_device_mix or config.field_slice) else parameter_server) if sparse: self.no_l2loss = True self.network = network self.l2_coef = config.l2_coef self.loss = ops.SigmoidCrossEntropyWithLogits() self.square = ops.Square() self.reduceMean_false = ops.ReduceMean(keep_dims=False) if is_auto_parallel: self.reduceMean_false.add_prim_attr("cross_batch", True) self.reduceSum_false = ops.ReduceSum(keep_dims=False) def construct(self, batch_ids, batch_wts, label): ''' Construct NetWithLossClass ''' predict, embedding_table = self.network(batch_ids, batch_wts) log_loss = self.loss(predict, label) wide_loss = self.reduceMean_false(log_loss) if self.no_l2loss: deep_loss = wide_loss else: l2_loss_v = self.reduceSum_false(self.square(embedding_table)) / 2 deep_loss = self.reduceMean_false(log_loss) + self.l2_coef * l2_loss_v return wide_loss, deep_loss class IthOutputCell(nn.Cell): def __init__(self, network, output_index): super(IthOutputCell, self).__init__() self.network = network self.output_index = output_index def construct(self, x1, x2, x3): predict = self.network(x1, x2, x3)[self.output_index] return predict class TrainStepWrap(nn.Cell): """ Encapsulation class of WideDeep network training. Append Adam and FTRL optimizers to the training network after that construct function can be called to create the backward graph. Args: network (Cell): The training network. Note that loss function should have been added. sens (Number): The adjust parameter. Default: 1024.0 host_device_mix (Bool): Whether run in host and device mix mode. Default: False parameter_server (Bool): Whether run in parameter server mode. Default: False """ def __init__(self, network, sens=1024.0, host_device_mix=False, parameter_server=False, sparse=False, cache_enable=False): super(TrainStepWrap, self).__init__() parallel_mode = context.get_auto_parallel_context("parallel_mode") is_auto_parallel = parallel_mode in (ParallelMode.SEMI_AUTO_PARALLEL, ParallelMode.AUTO_PARALLEL) self.network = network self.network.set_train() self.trainable_params = network.trainable_params() weights_w = [] weights_d = [] for params in self.trainable_params: if 'wide' in params.name: weights_w.append(params) else: weights_d.append(params) self.weights_w = ParameterTuple(weights_w) self.weights_d = ParameterTuple(weights_d) if (sparse and is_auto_parallel) or (sparse and parameter_server): self.optimizer_d = Adam( self.weights_d, learning_rate=5e-4, eps=1e-8, loss_scale=sens, use_lazy=True) self.optimizer_w = FTRL(learning_rate=1e-3, params=self.weights_w, l1=1e-8, l2=1e-8, initial_accum=1.0, loss_scale=sens) if host_device_mix or (parameter_server and not cache_enable): self.optimizer_w.target = "CPU" self.optimizer_d.target = "CPU" else: self.optimizer_d = Adam( self.weights_d, learning_rate=5e-4, eps=1e-8, loss_scale=sens) self.optimizer_w = FTRL(learning_rate=1e-3, params=self.weights_w, l1=1e-8, l2=1e-8, initial_accum=1.0, loss_scale=sens) self.hyper_map = ops.HyperMap() self.grad_w = ops.GradOperation(get_by_list=True, sens_param=True) self.grad_d = ops.GradOperation(get_by_list=True, sens_param=True) self.sens = sens self.loss_net_w = IthOutputCell(network, output_index=0) self.loss_net_d = IthOutputCell(network, output_index=1) self.loss_net_w.set_grad() self.loss_net_d.set_grad() self.reducer_flag = False self.grad_reducer_w = None self.grad_reducer_d = None self.reducer_flag = parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL) if self.reducer_flag: mean = context.get_auto_parallel_context("gradients_mean") degree = context.get_auto_parallel_context("device_num") self.grad_reducer_w = DistributedGradReducer(self.optimizer_w.parameters, mean, degree) self.grad_reducer_d = DistributedGradReducer(self.optimizer_d.parameters, mean, degree) def construct(self, batch_ids, batch_wts, label): ''' Construct wide and deep model ''' weights_w = self.weights_w weights_d = self.weights_d loss_w, loss_d = self.network(batch_ids, batch_wts, label) sens_w = ops.Fill()(ops.DType()(loss_w), ops.Shape()(loss_w), self.sens) sens_d = ops.Fill()(ops.DType()(loss_d), ops.Shape()(loss_d), self.sens) grads_w = self.grad_w(self.loss_net_w, weights_w)(batch_ids, batch_wts, label, sens_w) grads_d = self.grad_d(self.loss_net_d, weights_d)(batch_ids, batch_wts, label, sens_d) if self.reducer_flag: grads_w = self.grad_reducer_w(grads_w) grads_d = self.grad_reducer_d(grads_d) return ops.depend(loss_w, self.optimizer_w(grads_w)), ops.depend(loss_d, self.optimizer_d(grads_d)) class PredictWithSigmoid(nn.Cell): """ Predict definition """ def __init__(self, network): super(PredictWithSigmoid, self).__init__() self.network = network self.sigmoid = ops.Sigmoid() parallel_mode = context.get_auto_parallel_context("parallel_mode") full_batch = context.get_auto_parallel_context("full_batch") is_auto_parallel = parallel_mode in (ParallelMode.SEMI_AUTO_PARALLEL, ParallelMode.AUTO_PARALLEL) if is_auto_parallel and full_batch: self.sigmoid.shard(((1, 1),)) def construct(self, batch_ids, batch_wts, labels): logits, _, = self.network(batch_ids, batch_wts) pred_probs = self.sigmoid(logits) return logits, pred_probs, labels # Pre processing def pre_process_criteo_wide_deep(x): return x class WideDeepPostProcess: def __init__(self): self.good = 0 self.total = 0 self.roc_auc = 0 self.results = [] self.labels = [] def __call__(self, results, expected=None, result_dict=None): processed_results = [] n = len(results) for idx in range(0, n): result = results['auc'] processed_results.append(result) self.good += 1 self.total += 1 return processed_results def add_results(self, labels, results): self.results.append(results) self.labels.append(labels) def start(self): self.good = 0 self.total = 0 self.roc_auc = 0 self.results = [] def finalize(self, result_dict, ds=False, output_dir=None): result_dict["good"] = self.good result_dict["total"] = self.total 这段代码的作用是什么?

# architectures/base_model.py import torch import inspect import torch.nn as nn from abc import ABC, abstractmethod from components import * import os from datetime import datetime class BaseMultiArchModel(ABC, nn.Module): """多架构模型基类,定义通用组件和接口""" # 组件白名单(核心定型后仅创建者可修改) # 注意:必须注册具体实现类,而非抽象基类! ALLOWED_COMPONENTS = { "embedding_Standard": StandardEmbedding, # 具体实现类 "embedding_Position": PositionalEmbedding, # 具体实现类 "attention_Self": SelfAttentionModule, # 具体实现类 "attention_Dynamic": DynamicAttention, # 具体实现类 "norm_Layer": LayerNormModule, # 具体实现类 "norm_Dynamic": DynamicNormModule, # 具体实现类 "cnn_Conv1d": Conv1dModule, # 具体实现类 "cnn_MultiK": MultiKernelCNN, # 具体实现类 "rnn_GRU": GRUModule, # 具体实现类 "rnn_LSTM": LSTMModule, # 具体实现类 "transformer_Transformer": TransformerModule, # 具体实现类 "transformer_Lightweight": LightweightTransformer, # 具体实现类 "fusion_Conca": ConcatFusion, # 具体实现类 "fusion_Attention": AttentionFusion # 具体实现类 } # 基因快照配置 GENE_SNAPSHOT_DIR = "gene_snapshots/" def __init__(self, model_config): super().__init__() self.model_config = model_config self._init_components() self._ensure_snapshot_dir() def _init_components(self): """初始化通用组件(增加白名单校验)""" # 创建嵌入层(任务特定,需子类实现) self.embedding = self._create_embedding() # 创建通用组件(使用白名单校验) for comp_type in [ "embedding_Standard", "embedding_Position", "attention_Self", "attention_Dynamic", "norm_Layer", "norm_Dynamic", "cnn_Conv1d", "cnn_MultiK", "rnn_GRU", "rnn_LSTM", "transformer_Transformer", "transformer_Lightweight", "fusion_Conca", "fusion_Attention" ]: config = self.model_config["model_components"].get(comp_type, {}) comp_class = self.ALLOWED_COMPONENTS[comp_type] # 校验组件是否被篡改 assert issubclass(comp_class, BaseComponent), f"非法组件:{comp_type}" # 确保是具体实现类 assert not inspect.isabstract(comp_class), f"非法组件:{comp_type} 是抽象类" # 创建组件实例 setattr(self, comp_type, comp_class(**config)) # 【修改】根据配置设置默认组件的快捷引用 # 不再通过命名规则自动推断,而是通过配置文件显式指定 default_components = self.model_config.get("default_components", {}) # 默认组件映射关系 default_mapping = { "attention": "attention_Self", "norm": "norm_Layer", "cnn": "cnn_Conv1d", "rnn": "rnn_GRU", "transformer": "transformer_Transformer", "fusion": "fusion_Conca" } # 应用默认组件配置 for attr_name, default_key in default_mapping.items(): # 如果配置中指定了其他组件,则使用配置中的组件 config_key = default_components.get(attr_name, default_key) # 检查组件是否存在 if not hasattr(self, config_key): raise KeyError(f"默认组件 {config_key} 未在组件初始化中创建,请检查配置!") # 设置快捷引用,例如 self.attention = self.attention_Self setattr(self, attr_name, getattr(self, config_key)) def _ensure_snapshot_dir(self): """确保基因快照目录存在""" if not os.path.exists(self.GENE_SNAPSHOT_DIR): os.makedirs(self.GENE_SNAPSHOT_DIR) @abstractmethod def _create_embedding(self): """创建任务特定的嵌入层""" pass @abstractmethod def _create_fusion(self): """创建特征融合层""" pass # 通用组件创建方法(保留,但修改为使用配置中指定的组件) def _create_attention(self): # 从配置中获取使用哪个注意力组件 attention_type = self.model_config["model_components"].get("attention_type", "attention_Self") config = self.model_config["model_components"].get(attention_type, {}) return self.ALLOWED_COMPONENTS[attention_type]( embed_dim=self.model_config["embedding_dim"], **config ) def _create_normalization(self): # 从配置中获取使用哪个归一化组件 norm_type = self.model_config["model_components"].get("norm_type", "norm_Layer") config = self.model_config["model_components"].get(norm_type, {}) return self.ALLOWED_COMPONENTS[norm_type]( num_features=self.model_config["embedding_dim"], **config ) def _create_cnn(self): # 从配置中获取使用哪个CNN组件 cnn_type = self.model_config["model_components"].get("cnn_type", "cnn_Conv1d") config = self.model_config["model_components"].get(cnn_type, {}) return self.ALLOWED_COMPONENTS[cnn_type]( in_channels=self.model_config["embedding_dim"], out_channels=self.model_config["hidden_dim"], **config ) def _create_rnn(self): # 从配置中获取使用哪个RNN组件 rnn_type = self.model_config["model_components"].get("rnn_type", "rnn_GRU") config = self.model_config["model_components"].get(rnn_type, {}) return self.ALLOWED_COMPONENTS[rnn_type]( input_size=self.model_config["embedding_dim"], hidden_size=self.model_config["hidden_dim"], **config ) def _create_transformer(self): # 从配置中获取使用哪个Transformer组件 transformer_type = self.model_config["model_components"].get("transformer_type", "transformer_Transformer") config = self.model_config["model_components"].get(transformer_type, {}) return self.ALLOWED_COMPONENTS[transformer_type]( d_model=self.model_config["embedding_dim"], **config ) def forward_features(self, x): """通用特征提取流程""" # 嵌入层处理 x_embed = self.embedding(x) # 使用默认的归一化和注意力组件 x_embed = self.norm(x_embed) if hasattr(self, 'norm') else x_embed x_embed = self.attention(x_embed) if hasattr(self, 'attention') else x_embed # 多架构特征提取 cnn_out = self.cnn(x_embed.transpose(1, 2)).transpose(1, 2) if hasattr(self, 'cnn') else None rnn_out = self.rnn(x_embed) if hasattr(self, 'rnn') else None trans_out = self.transformer(x_embed) if hasattr(self, 'transformer') else None # 特征融合 fusion_inputs = {} if cnn_out is not None: fusion_inputs["cnn"] = cnn_out if rnn_out is not None: fusion_inputs["rnn"] = rnn_out if trans_out is not None: fusion_inputs["transformer"] = trans_out fused_features = self.fusion(fusion_inputs) return fused_features @abstractmethod def forward(self, x): """主前向传播方法""" pass def save_gene_snapshot(self, snapshot_name=None, evolution_note=""): """保存核心组件权重(基因快照)""" snapshot_name = snapshot_name or datetime.now().strftime("%Y%m%d_%H%M%S") snapshot_path = os.path.join(self.GENE_SNAPSHOT_DIR, f"{snapshot_name}.pth") # 保存所有组件的权重 state_dict = { "model_config": self.model_config } # 保存主要组件的权重 for attr in ["embedding", "attention", "norm", "cnn", "rnn", "transformer", "fusion"]: if hasattr(self, attr): state_dict[attr] = getattr(self, attr).state_dict() # 保存所有注册的组件权重 for comp_type in self.ALLOWED_COMPONENTS.keys(): if hasattr(self, comp_type): state_dict[comp_type] = getattr(self, comp_type).state_dict() torch.save(state_dict, snapshot_path) # 写入进化日志 with open(os.path.join(self.GENE_SNAPSHOT_DIR, "evolution_log.md"), "a") as f: f.write(f"## {snapshot_name}\n") f.write(f"变更原因:{evolution_note}\n") f.write(f"时间:{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n") def load_gene_snapshot(self, snapshot_path): """加载基因快照(复活/回退用)""" state_dict = torch.load(snapshot_path) # 更新配置 self.model_config = state_dict["model_config"] # 加载主要组件权重 for attr in ["embedding", "attention", "norm", "cnn", "rnn", "transformer", "fusion"]: if attr in state_dict and hasattr(self, attr): getattr(self, attr).load_state_dict(state_dict[attr]) # 加载所有注册的组件权重 for comp_type in self.ALLOWED_COMPONENTS.keys(): if comp_type in state_dict and hasattr(self, comp_type): getattr(self, comp_type).load_state_dict(state_dict[comp_type]) print(f"[萧默芯] 成功加载基因快照: {snapshot_path}") 帮我分析一下这个代码的可靠性和能力!

import torch import torch.nn as nn import torch.nn.init as init from TransformerBlock import MultiheadAttention from .NeuralNetwork import NeuralNetwork import torch.nn.functional as F from .GAT import GATConv import torch_geometric.utils as utils class Attention(nn.Module): def __init__(self, in_features, hidden_size): super(Attention, self).__init__() self.linear1 = nn.Linear(in_features*2, hidden_size) self.linear2 = nn.Linear(hidden_size, 1) self.activation = nn.ReLU() self.dropout = nn.Dropout(0.5) self.reset_parameters() def reset_parameters(self): init.xavier_normal_(self.linear1.weight) init.xavier_normal_(self.linear2.weight) def forward(self, K, V, mask = None): ''' :param K: (batch_size, d) :param V: (batch_size, hist_len, d) :return: (batch_size, d) ''' K = K.unsqueeze(dim=1).expand(V.size()) fusion = torch.cat([K, V], dim=-1) fc1 = self.activation(self.linear1(fusion)) score = self.linear2(fc1) if mask is not None: mask = mask.unsqueeze(dim=-1) score = score.masked_fill(mask, -2 ** 32 + 1) alpha = F.softmax(score, dim=1) alpha = self.dropout(alpha) att = (alpha * V).sum(dim=1) return att class GLAN(NeuralNetwork): def __init__(self, config, graph): super(GLAN, self).__init__() self.config = config embedding_weights = config['embedding_weights'] V, D = embedding_weights.shape maxlen = config['maxlen'] dropout_rate = config['dropout'] alpha = 0.4 self.graph = graph self.word_embedding = nn.Embedding(V, D, padding_idx=0, _weight=torch.from_numpy(embedding_weights)) self.user_tweet_embedding = nn.Embedding(graph.num_nodes, 300, padding_idx=0) self.mh_attention = MultiheadAttention(input_size=300, output_size=300) self.linear_fuse = nn.Lin

代码出现问题:(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set max_memory in to a higher value to use more memory (at your own risk). trainable params: 1,572,864 || all params: 1,838,401,536 || trainable%: 0.0856 训练集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]} 验证集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]} Trainer.tokenizer is now deprecated. You should use Trainer.processing_class = processing_class instead. INFO:__main__:GPU内存使用: 已分配 2.93GB, 保留 4.13GB 可训练参数列表: - base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight - base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight - base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight - base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight - base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight 0%| | 0/3 [00:00<?, ?it/s]You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.00GB, 保留 4.21GB Could not estimate the number of tokens of the input, floating-point operations will not be computed Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.22GB 33%|████████████████████████████ | 1/3 [00:03<00:06, 3.25s/it]Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.01GB, 保留 4.25GB Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.26GB 67%|████████████████████████████████████████████████████████ | 2/3 [00:06<00:02, 2.98s/it]Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.01GB, 保留 4.25GB Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead. INFO:__main__:GPU内存使用: 已分配 4.02GB, 保留 4.26GB {'train_runtime': 9.034, 'train_samples_per_second': 0.664, 'train_steps_per_second': 0.332, 'train_loss': 1.0772175788879395, 'epoch': 3.0} 100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:09<00:00, 3.01s/it] Traceback (most recent call last): File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 356, in <module> eval_results = trainer.evaluate() File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4076, in evaluate output = eval_loop( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4270, in evaluation_loop losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 4496, in prediction_step outputs = model(**inputs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 818, in forward return model_forward(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 806, in __call__ return convert_to_fp32(self.model_forward(*args, **kwargs)) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast return func(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\peft_model.py", line 1719, in forward return self.base_model( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\peft\tuners\tuners_utils.py", line 197, in forward return self.model.forward(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 816, in forward outputs = self.model( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py", line 521, in forward raise ValueError("You must specify exactly one of input_ids or inputs_embeds") ValueError: You must specify exactly one of input_ids or inputs_embeds (style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py Traceback (most recent call last): File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 57, in <module> class ContrastiveTrainer(Trainer): File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 63, in ContrastiveTrainer eval_dataset: Optional[Dataset] = None, NameError: name 'Dataset' is not defined 原代码如下:import torch import torch.nn as nn import torch.nn.functional as F from transformers import ( AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, PreTrainedTokenizerBase, BitsAndBytesConfig ) from transformers.tokenization_utils_base import PreTrainedTokenizerBase from transformers.utils import PaddingStrategy from datasets import load_dataset from typing import Any, Dict, List, Optional, Tuple, Union import logging from dataclasses import dataclass import os import gc from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training @dataclass class EvalDataCollator: """评估专用的数据收集器""" tokenizer: PreTrainedTokenizerBase padding: Union[bool, str, PaddingStrategy] = True max_length: Optional[int] = None pad_to_multiple_of: Optional[int] = None return_tensors: str = "pt" def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]: # 评估时只使用正样本(用于语言建模评估) positive_features = [{"input_ids": f["positive_input_ids"]} for f in features] # 对正样本进行填充 batch_positive = self.tokenizer.pad( positive_features, padding=self.padding, max_length=self.max_length, pad_to_multiple_of=self.pad_to_multiple_of, return_tensors=self.return_tensors, ) # 创建注意力掩码 attention_mask = (batch_positive["input_ids"] != self.tokenizer.pad_token_id).int() # 创建标签(用于语言建模) labels = batch_positive["input_ids"].clone() labels[labels == self.tokenizer.pad_token_id] = -100 return { "input_ids": batch_positive["input_ids"], "attention_mask": attention_mask, "labels": labels } class ContrastiveTrainer(Trainer): """内存优化的训练器""" # ... [保持其他方法不变] ... def evaluate( self, eval_dataset: Optional[Dataset] = None, ignore_keys: Optional[List[str]] = None, metric_key_prefix: str = "eval", ) -> Dict[str, float]: """重写评估方法以使用专用的数据收集器""" # 创建评估专用的数据收集器 eval_data_collator = EvalDataCollator( tokenizer=self.tokenizer, max_length=256, padding="max_length" ) # 临时保存原始数据收集器 original_collator = self.data_collator try: # 使用评估专用的数据收集器 self.data_collator = eval_data_collator # 调用父类的评估方法 return super().evaluate( eval_dataset=eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix ) finally: # 恢复原始数据收集器 self.data_collator = original_collator # 设置日志 logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) # 内存优化工具函数 def clear_memory(): """清除Python和CUDA缓存""" gc.collect() if torch.cuda.is_available(): torch.cuda.empty_cache() torch.cuda.reset_peak_memory_stats() def print_memory_usage(): """打印当前内存使用情况""" if torch.cuda.is_available(): allocated = torch.cuda.memory_allocated() / (1024 ** 3) reserved = torch.cuda.memory_reserved() / (1024 ** 3) logger.info(f"GPU内存使用: 已分配 {allocated:.2f}GB, 保留 {reserved:.2f}GB") else: logger.info("未检测到GPU") def tokenize_function(examples, tokenizer, max_length=256): """将文本转换为token IDs""" tokenized = {} # 对每个字段进行分词 for key in ['anchor', 'positive', 'negative']: if key in examples: # 使用分词器处理文本 result = tokenizer( examples[key], max_length=max_length, truncation=True, padding=False, return_tensors=None ) tokenized[f"{key}_input_ids"] = result["input_ids"] return tokenized @dataclass class ContrastiveDataCollator: """内存优化的数据收集器""" tokenizer: PreTrainedTokenizerBase padding: Union[bool, str, PaddingStrategy] = True max_length: Optional[int] = None pad_to_multiple_of: Optional[int] = None return_tensors: str = "pt" def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]: # 分离出三元组的各个部分 anchor_features = [{"input_ids": f["anchor_input_ids"]} for f in features] positive_features = [{"input_ids": f["positive_input_ids"]} for f in features] negative_features = [{"input_ids": f["negative_input_ids"]} for f in features] # 对每个部分分别进行填充 batch_anchor = self.tokenizer.pad( anchor_features, padding=self.padding, max_length=self.max_length, pad_to_multiple_of=self.pad_to_multiple_of, return_tensors=self.return_tensors, ) batch_positive = self.tokenizer.pad( positive_features, padding=self.padding, max_length=self.max_length, pad_to_multiple_of=self.pad_to_multiple_of, return_tensors=self.return_tensors, ) batch_negative = self.tokenizer.pad( negative_features, padding=self.padding, max_length=self.max_length, pad_to_multiple_of=self.pad_to_multiple_of, return_tensors=self.return_tensors, ) # 创建注意力掩码 def create_attention_mask(input_ids): return (input_ids != self.tokenizer.pad_token_id).int() # 释放中间变量内存 del anchor_features, positive_features, negative_features clear_memory() return { "anchor_input_ids": batch_anchor["input_ids"], "anchor_attention_mask": create_attention_mask(batch_anchor["input_ids"]), "positive_input_ids": batch_positive["input_ids"], "positive_attention_mask": create_attention_mask(batch_positive["input_ids"]), "negative_input_ids": batch_negative["input_ids"], "negative_attention_mask": create_attention_mask(batch_negative["input_ids"]), } class ContrastiveTrainer(Trainer): """内存优化的训练器""" def __init__(self, tokenizer=None, *args, contrastive_config=None, **kwargs): # 首先调用父类初始化 super().__init__(*args, **kwargs) # 关键修复:设置tokenizer self.tokenizer = tokenizer if contrastive_config is None: contrastive_config = {} # 设置默认值 self.temperature = contrastive_config.get("temperature", 0.07) self.margin = contrastive_config.get("margin", 0.3) self.contrastive_weight = contrastive_config.get("weight", 0.8) self.repr_layer = contrastive_config.get("repr_layer", -1) # 验证必要参数 if not hasattr(self.model.config, "output_hidden_states") or not self.model.config.output_hidden_states: raise ValueError("模型必须设置output_hidden_states=True") self.cross_entropy = nn.CrossEntropyLoss() def compute_contrastive_loss(self, anchor_emb, pos_emb, neg_emb): """计算对比损失""" # 计算余弦相似度 pos_sim = F.cosine_similarity(anchor_emb, pos_emb) neg_sim = F.cosine_similarity(anchor_emb, neg_emb) # 计算InfoNCE损失 numerator = torch.exp(pos_sim / self.temperature) denominator = numerator + torch.exp(neg_sim / self.temperature) info_nce_loss = -torch.log(numerator / (denominator + 1e-8)).mean() # 计算三元组损失 triplet_loss = F.relu(neg_sim - pos_sim + self.margin).mean() return info_nce_loss + triplet_loss def get_sequence_representation(self, outputs, attention_mask): """获取序列表示(内存优化版)""" # 只获取需要的隐藏状态层 hidden_states = outputs.hidden_states[self.repr_layer] # 获取每个序列的最后一个非填充token seq_lengths = attention_mask.sum(dim=1) - 1 batch_indices = torch.arange(hidden_states.size(0)) # 返回对应位置的隐藏状态 return hidden_states[batch_indices, seq_lengths] def compute_loss(self, model, inputs, return_outputs=False): """内存优化的损失计算""" # 确保模型处于训练模式 model.train() # 提取输入 anchor_ids = inputs["anchor_input_ids"] anchor_mask = inputs["anchor_attention_mask"] positive_ids = inputs["positive_input_ids"] positive_mask = inputs["positive_attention_mask"] negative_ids = inputs["negative_input_ids"] negative_mask = inputs["negative_attention_mask"] # 前向传播获取隐藏状态 def get_embeddings(input_ids, attention_mask): outputs = model( input_ids=input_ids, attention_mask=attention_mask, output_hidden_states=True, return_dict=True ) return self.get_sequence_representation(outputs, attention_mask) # 获取三元组的嵌入表示 anchor_emb = get_embeddings(anchor_ids, anchor_mask) pos_emb = get_embeddings(positive_ids, positive_mask) neg_emb = get_embeddings(negative_ids, negative_mask) # 计算对比损失 cl_loss = self.compute_contrastive_loss(anchor_emb, pos_emb, neg_emb) cl_loss = cl_loss * self.contrastive_weight # 关键修复:确保tokenizer已设置 if self.tokenizer is None: raise ValueError("Tokenizer未设置!") # 计算语言建模损失 lm_labels = positive_ids.clone() # 关键修复:使用tokenizer的pad_token_id pad_token_id = self.tokenizer.pad_token_id lm_labels[lm_labels == pad_token_id] = -100 # 计算语言建模损失 lm_outputs = model( input_ids=positive_ids, attention_mask=positive_mask, labels=lm_labels ) lm_loss = lm_outputs.loss # 总损失 = LM损失 + 对比损失 total_loss = lm_loss + cl_loss # 记录内存使用 print_memory_usage() return (total_loss, lm_outputs) if return_outputs else total_loss # ================ 主程序 ================ # if __name__ == "__main__": # 配置量化以减少内存使用 bnb_config = BitsAndBytesConfig( load_in_4bit=True, # 使用4位量化 bnb_4bit_quant_type="nf4", # 使用NF4量化类型 bnb_4bit_use_double_quant=True, # 双重量化 bnb_4bit_compute_dtype=torch.float16 # 计算使用FP16 ) # 加载模型和分词器(使用量化) model = AutoModelForCausalLM.from_pretrained( "model/Qwen/Qwen1.5-1.8B", quantization_config=bnb_config, # 应用量化配置 device_map="auto", # 自动选择设备 output_hidden_states=True, # 必须设置以获取隐藏状态 return_dict_in_generate=True, use_cache=False # 禁用缓存以节省内存 ) tokenizer = AutoTokenizer.from_pretrained("model/Qwen/Qwen1.5-1.8B") tokenizer.pad_token = tokenizer.eos_token # 设置填充token # 为量化模型添加LoRA适配器 lora_config = LoraConfig( r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"], # 针对Qwen1.5-1.8B模型 lora_dropout=0.05, bias="none", task_type="CAUSAL_LM" ) # 关键修复:准备模型用于k位训练 model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True) # 添加LoRA适配器 model = get_peft_model(model, lora_config) # 关键修复:显式启用LoRA参数的梯度 for param in model.parameters(): if param.requires_grad: param.requires_grad = True model.print_trainable_parameters() # 打印可训练参数数量 # 加载数据集 def load_and_tokenize_dataset(file_path, tokenizer): """加载数据集并进行分词处理""" # 加载原始数据集 dataset_dict = load_dataset('json', data_files=file_path) raw_dataset = dataset_dict['train'] # 应用分词函数 tokenized_dataset = raw_dataset.map( lambda ex: tokenize_function(ex, tokenizer, max_length=256), batched=True, batch_size=8, # 减小批处理大小 remove_columns=['anchor', 'positive', 'negative'] ) return tokenized_dataset train_dataset = load_and_tokenize_dataset('data/processed/train_style_triplets.json', tokenizer) val_dataset = load_and_tokenize_dataset('data/processed/val_style_triplets.json', tokenizer) # 验证数据集格式 print("训练集样本示例:", train_dataset[0]) print("验证集样本示例:", val_dataset[0]) # 训练参数配置(内存优化) training_args = TrainingArguments( output_dir="./model/lora_adapter", per_device_train_batch_size=1, # 减小批量大小 gradient_accumulation_steps=8, # 增加梯度累积步数 num_train_epochs=3, learning_rate=2e-4, logging_steps=10, # 更频繁的日志记录以监控内存 save_steps=500, fp16=True, report_to="none", remove_unused_columns=False, gradient_checkpointing=True, # 启用梯度检查点 optim="adafactor", # 使用内存更少的优化器 ) # 对比学习配置 contrastive_config = { "temperature": 0.07, "margin": 0.3, "weight": 0.8, "repr_layer": -1 } # 初始化数据收集器 data_collator = ContrastiveDataCollator( tokenizer=tokenizer, max_length=256, # 减少最大长度 padding="max_length" ) # 初始化训练器 - 关键修复:传递tokenizer trainer = ContrastiveTrainer( model=model, args=training_args, tokenizer=tokenizer, # 传递tokenizer data_collator=data_collator, train_dataset=train_dataset, eval_dataset=val_dataset, contrastive_config=contrastive_config ) # 开始训练前打印内存状态 print_memory_usage() # 关键修复:验证可训练参数 print("可训练参数列表:") for name, param in model.named_parameters(): if param.requires_grad: print(f"- {name}") # 开始训练 trainer.train() # 保存LoRA适配器 model.save_pretrained("./model/lora_adapter") # 评估模型 try: eval_results = trainer.evaluate() print("评估结果:", eval_results) except Exception as e: print(f"评估过程中发生错误: {e}") import traceback traceback.print_exc()

import torch import re import numpy as np from typing import List, Tuple, Dict, Any from transformers import ( AutoTokenizer, PreTrainedModel, AutoConfig, LlamaForCausalLM, GenerationConfig ) import torch.nn as nn from tqdm import tqdm from collections import defaultdict import pandas as pd # -------------------------- # 1. 常量与预处理函数(采用新的数据处理方式) # -------------------------- VALID_ELEMENTS = ["C", "N", "P", "O", "S", "Si", "I", "H", "Cl", "F", "Br", "B", "Se", "Fe", "Co", "As", "K", "Na"] element_to_idx = {elem: idx for idx, elem in enumerate(VALID_ELEMENTS)} CHEM_FORMULA_SIZE = r"([A-Z][a-z]*)([0-9]*)" # 新增的分子公式解析函数 def parse_chem_formula(formula): pattern = r'([A-Z][a-z]?)(\d*)' matches = re.findall(pattern, formula) element_counts = defaultdict(int) for (element, count) in matches: count = int(count) if count else 1 element_counts[element] += count return element_counts def generate_element_list(formula): element_counts = parse_chem_formula(formula) elements = [] for element, count in element_counts.items(): # 跳过氢元素 if element != "H": elements.extend([element] * count) return ''.join(elements) # 化学式转密集向量 def formula_to_dense(chem_formula: str) -> torch.Tensor: dense_vec = torch.zeros(len(VALID_ELEMENTS), dtype=torch.float32) matches = re.findall(CHEM_FORMULA_SIZE, chem_formula) for chem_symbol, num_str in matches: num = 1 if num_str == "" else int(num_str) if chem_symbol in element_to_idx: idx = element_to_idx[chem_symbol] dense_vec[idx] += num return dense_vec # 位置编码生成 (PyTorch实现) def positional_encoding(max_position: int, d_model: int, min_freq: float = 1e-4) -> torch.Tensor: position = torch.arange(max_position).unsqueeze(1) div_term = torch.exp(torch.arange(0, d_model, 2) * (-torch.log(torch.tensor(min_freq)) / d_model)) pos_enc = torch.zeros(max_position, d_model) pos_enc[:, 0::2] = torch.sin(position * div_term) pos_enc[:, 1::2] = torch.cos(position * div_term) return pos_enc # 初始化位置编码矩阵 P = positional_encoding(2000000, 254) dimn = 254 # 与位置编码维度一致 # 质谱数据编码 - 优化短数据处理:仅截断过长数据,不填充短数据 def encode_spectra(rag_tensor: list, P: torch.Tensor, dimn: int) -> list: # 返回列表而非堆叠张量 encoded_list = [] max_len = 501 # 仅对过长数据截断,不强制填充短数据 for sample in rag_tensor: mz_list, intensity_list = sample # 创建基础特征矩阵 [m/z, intensity] base_features = torch.tensor([mz_list, intensity_list], dtype=torch.float32).T # 添加位置编码特征(保留原始m/z的位置信息) pos_enc = torch.stack([P[min(int(mz), P.size(0)-1)] for mz in mz_list]) # 组合所有特征 [m/z, intensity, pos_enc...] features = torch.cat([base_features, pos_enc], dim=1) # 仅截断过长数据,短数据保持原始长度(不填充) if features.size(0) > max_len: features = features[:max_len] encoded_list.append(features) # 保留原始长度特征 return encoded_list # 质谱数据预处理 - 确保短数据完整保留 def preprocess_spectra_for_inference(spectrum_str: str, total_mass: float) -> list: # 解析质谱字符串 pairs = spectrum_str.split() mz_list, intensity_list = [], [] for pair in pairs: mz, intensity = pair.split(':') mz_list.append(float(mz)) intensity_list.append(float(intensity)) # 对于仅含一组数据的情况,额外保留原始精度(不四舍五入) if len(pairs) == 1: # 保留原始精度,不进行四舍五入 mz_list = [float(mz) for mz, _ in [pair.split(':') for pair in pairs]] intensity_list = [float(intensity) for _, intensity in [pair.split(':') for pair in pairs]] # 添加总精确质量(作为补充特征,不影响原始数据长度) mz_list.append(total_mass) intensity_list.append(0.0) # 仅对长数据进行四舍五入,短数据保留更多精度 if len(mz_list) > 5: # 数据较长时才简化 mz_list = [round(mz, 2) for mz in mz_list] intensity_list = [round(intensity, 2) for intensity in intensity_list] return [[mz_list, intensity_list]] # -------------------------- # 2. 模型类定义(保持结构,采用新实现) # -------------------------- from transformers.modeling_outputs import CausalLMOutputWithPast from transformers.generation.utils import GenerationMixin class LlamaWithEncoder(PreTrainedModel, GenerationMixin): config_class = AutoConfig _no_split_modules = ["LlamaDecoderLayer", "TransformerEncoderLayer"] def __init__(self, config, base_model=None, encoder1_dim=18, encoder2_dim=256, hidden_dim=512): # 添加config属性 self.config = config super().__init__(self.config) # 如果未提供base_model,则从config初始化 if base_model is None: self.model = LlamaForCausalLM(config) else: self.model = base_model # 第一个Transformer Encoder(处理分子式向量) encoder1_layer = nn.TransformerEncoderLayer( d_model=encoder1_dim, nhead=3, dim_feedforward=hidden_dim, batch_first=True ) self.encoder1 = nn.TransformerEncoder(encoder1_layer, num_layers=2) # 第二个Transformer Encoder(处理质谱矩阵) encoder2_layer = nn.TransformerEncoderLayer( d_model=encoder2_dim, nhead=4, dim_feedforward=hidden_dim, batch_first=True ) self.encoder2 = nn.TransformerEncoder(encoder2_layer, num_layers=2) # 投影层:将编码器输出映射到模型隐藏层维度 self.proj1 = nn.Linear(encoder1_dim, base_model.config.hidden_size) self.proj2 = nn.Linear(encoder2_dim, base_model.config.hidden_size) # 嵌入层(复制基础模型权重但不共享) self.embed_tokens = nn.Embedding( num_embeddings=base_model.config.vocab_size, embedding_dim=base_model.config.hidden_size, padding_idx=base_model.config.pad_token_id ) self.embed_tokens.weight.data = base_model.get_input_embeddings().weight.data.clone() # 必要接口实现 def get_input_embeddings(self): return self.embed_tokens def set_input_embeddings(self, value): self.embed_tokens = value def get_output_embeddings(self): return self.model.get_output_embeddings() def set_output_embeddings(self, new_embeddings): self.model.set_output_embeddings(new_embeddings) def get_base_model(self): return self.model def forward( self, input_ids=None, attention_mask=None, encoder1_inputs=None, encoder2_inputs=None, labels=None, past_key_values=None, output_attentions=None, output_hidden_states=None, return_dict=None,** kwargs ) -> CausalLMOutputWithPast: # 1. 编码器处理 # 分子式编码器输出 enc1_out = self.encoder1(encoder1_inputs) # (batch_size, 1, 18) enc1_out = enc1_out.mean(dim=1) # (batch_size, 18) enc1_proj = self.proj1(enc1_out) # (batch_size, hidden_size) # 质谱编码器输出 enc2_out = self.encoder2(encoder2_inputs) # (batch_size, seq_len, 256) enc2_out = enc2_out.mean(dim=1) # (batch_size, 256) enc2_proj = self.proj2(enc2_out) # (batch_size, hidden_size) # 合并编码器输出(用于替换<mask>) mask_replacement = (enc1_proj + enc2_proj) / 2 # (batch_size, hidden_size) # 2. 获取原始嵌入(避免inplace,全程用新张量) embeddings = self.embed_tokens(input_ids) # (batch_size, seq_len, hidden_size) batch_size, seq_len, hidden_size = embeddings.size() # 3. 替换<mask> token(第三个token,索引=2):用拼接替代inplace赋值 if seq_len > 2: mask_embed = mask_replacement.unsqueeze(1) # (batch_size, 1, hidden_size) # 拆分张量并拼接(前2个token + 替换的mask_embed + 剩余token) part1 = embeddings[:, :2, :] # (batch_size, 2, hidden_size) part2 = mask_embed # (batch_size, 1, hidden_size) part3 = embeddings[:, 3:, :] # (batch_size, seq_len-3, hidden_size) # 拼接为新张量(无inplace操作) new_embeddings = torch.cat([part1, part2, part3], dim=1) # (batch_size, seq_len, hidden_size) else: new_embeddings = embeddings # 序列过短时直接使用原始嵌入 # 4. 调用基础模型 return self.model( inputs_embeds=new_embeddings, attention_mask=attention_mask, labels=labels, past_key_values=past_key_values, output_attentions=output_attentions, output_hidden_states=output_hidden_states, return_dict=return_dict, ) def prepare_inputs_for_generation(self, input_ids, **kwargs): return { "input_ids": input_ids, "attention_mask": kwargs.get("attention_mask", None), "encoder1_inputs": kwargs.get("encoder1_inputs", None), "encoder2_inputs": kwargs.get("encoder2_inputs", None), } def _get_generation_device(self): return next(self.parameters()).device # -------------------------- # 3. 加载模型和Tokenizer(修复核心错误) # -------------------------- model_path = "./llama3.2-SELFIES" # 模型保存路径 # 加载分词器 tokenizer = AutoTokenizer.from_pretrained(model_path) # 确保mask token存在 if tokenizer.mask_token is None: tokenizer.add_special_tokens({"mask_token": "<mask>"}) # 加载模型配置 config = AutoConfig.from_pretrained(model_path) # 设备配置(优先使用GPU) device = "cuda:0" if torch.cuda.is_available() else "cpu" # 修复:先加载基础模型,再传入自定义模型 base_model = LlamaForCausalLM.from_pretrained( model_path, config=config, torch_dtype=torch.bfloat16, # 确保基础模型为bfloat16精度 device_map=device ) # 使用基础模型初始化自定义模型 model = LlamaWithEncoder( config=config, base_model=base_model, encoder1_dim=18, encoder2_dim=256, hidden_dim=512 ) model = model.to(device) # 先转移到设备 model.eval() # 推理模式 # -------------------------- # 4. 推理函数(适配新的数据处理方式) # -------------------------- def generate_selfies( formula: str, spectrum_str: str, total_mass: float, max_length: int = 512, temperature: float = 0.7, top_p: float = 0.9 ) -> str: """生成SELFIES字符串""" model_device = next(model.parameters()).device # 1. 生成element_list element_list = generate_element_list(formula) # 2. 处理分子式向量 formula_vec = formula_to_dense(formula).unsqueeze(0).unsqueeze(0) # (1,1,18) formula_vec = formula_vec.to(model_device, dtype=torch.bfloat16) # 3. 处理质谱数据(使用新的预处理和编码方式) spectra_data = preprocess_spectra_for_inference(spectrum_str, total_mass) spec_encoded = encode_spectra(spectra_data, P, dimn) # 得到列表形式的编码结果 spec_matrix = spec_encoded[0].to(model_device, dtype=torch.bfloat16).unsqueeze(0) # 添加批次维度 # 4. 构造输入提示 prompt = f"<|User|><|Spectrum|>{element_list}" input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model_device) attention_mask = torch.ones_like(input_ids).to(model_device) # 5. 模型生成 with torch.no_grad(): # 关闭梯度计算 outputs = model.generate( input_ids=input_ids, attention_mask=attention_mask, encoder1_inputs=formula_vec, # 分子式特征 encoder2_inputs=spec_matrix, # 质谱特征 max_length=max_length, temperature=temperature, top_p=top_p, ) # 6. 解码生成结果(去除特殊token) generated = tokenizer.decode(outputs[0], skip_special_tokens=False) return generated # -------------------------- # 5. 推理示例 # -------------------------- if __name__ == "__main__": # 示例输入 example_formula = "C9H9N3O2S2" # 分子式 example_spectrum_str = "256.0153:100.000000" # mz:intensity格式 example_total_mass = 255.0136185 # 总精确质量 # 生成SELFIES result = generate_selfies( formula=example_formula, spectrum_str=example_spectrum_str, total_mass=example_total_mass, max_length=512, temperature=0.7, top_p=0.95 ) print("生成的SELFIES字符串:") print(result)修改代码,解决问题Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ./llama3.2-SELFIES and are newly initialized: ['lm_head.weight', 'model.embed_tokens.weight', 'model.layers.0.input_layernorm.weight', 'model.layers.0.mlp.down_proj.weight', 'model.layers.0.mlp.gate_proj.weight', 'model.layers.0.mlp.up_proj.weight', 'model.layers.0.post_attention_layernorm.weight', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.1.input_layernorm.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.1.mlp.gate_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.1.post_attention_layernorm.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.10.post_attention_layernorm.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.11.post_attention_layernorm.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.input_layernorm.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.12.post_attention_layernorm.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.13.input_layernorm.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.13.post_attention_layernorm.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.14.input_layernorm.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.15.input_layernorm.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.15.post_attention_layernorm.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.16.input_layernorm.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.16.post_attention_layernorm.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.17.input_layernorm.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.17.post_attention_layernorm.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.18.input_layernorm.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.18.mlp.gate_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.18.post_attention_layernorm.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.input_layernorm.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.19.post_attention_layernorm.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.input_layernorm.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.2.post_attention_layernorm.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.20.input_layernorm.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.20.post_attention_layernorm.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.21.input_layernorm.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.22.input_layernorm.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.input_layernorm.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.23.post_attention_layernorm.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.24.input_layernorm.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.3.input_layernorm.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.3.post_attention_layernorm.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.4.post_attention_layernorm.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.5.input_layernorm.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.5.post_attention_layernorm.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.6.input_layernorm.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.6.post_attention_layernorm.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.7.input_layernorm.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.7.post_attention_layernorm.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.8.input_layernorm.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.8.post_attention_layernorm.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.9.input_layernorm.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.9.mlp.gate_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.9.post_attention_layernorm.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.norm.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

这段transformer的代码有什么用:#位置编码 class TransformerEmbedding(nn.Module): def __init__(self, config): super().__init__() # hyper params self.vocab_size = config["vocab_size"] self.hidden_size = config["d_model"] # 词向量维度 self.pad_idx = config["pad_idx"] dropout_rate = config["dropout"] self.max_length = config["max_length"] # layers,设置padding_idx可以让pad的词向量全为0 self.word_embedding = nn.Embedding( self.vocab_size, self.hidden_size, padding_idx=self.pad_idx ) self.pos_embedding = nn.Embedding( self.max_length, self.hidden_size, _weight=self.get_positional_encoding( self.max_length, self.hidden_size ),# 位置编码,权重通过get_positional_encoding函数计算得到 ) self.pos_embedding.weight.requires_grad_(False) # 不更新位置编码的权重 self.dropout = nn.Dropout(dropout_rate) # 随机失活层 def get_word_embedding_weights(self): return self.word_embedding.weight # 计算位置信息 @classmethod def get_positional_encoding(self, max_length, hidden_size):#max_length是最大长度,hidden_size是embedding维度相等 # Compute the positional encodings once in log space. pe = torch.zeros(max_length, hidden_size) # 初始化位置编码 # .unsqueeze(1) 是将这个一维张量转换为二维张量,即将其形状从 (max_length,) 变为 (max_length, 1)。这个操作在张量的维度上增加了一个维度,使其从一维变为二维,第二维的大小为 1。 position = torch.arange(0, max_length).unsqueeze(1) # 位置信息,从0到max_length-1 div_term = torch.exp( torch.arange(0, hidden_size, 2) * -(torch.log(torch.Tensor([10000.0])) / hidden_size) )# 计算位置编码的权重,为了性能考量(是数学上的对数函数分解) pe[:, 0::2] = torch.sin(position * div_term) pe[:, 1::2] = torch.cos(position * div_term) return pe def forward(self, input_ids): # input_ids: [batch_size, seq_len] seq_len = input_ids.shape[1] assert ( seq_len <= self.max_length ), f"input sequence length should no more than {self.max_length} but got {seq_len}" position_ids = torch.arange(seq_len, dtype=torch.long, device=input_ids.device) position_ids = position_ids.unsqueeze(0).expand_as(input_ids) print(position_ids) #为了调试 # embedding word_embeds = self.word_embedding(input_ids) # 词嵌入 pos_embeds = self.pos_embedding(position_ids) # 位置编码 embeds = word_embeds + pos_embeds embeds = self.dropout(embeds) return embeds def plot_position_embedding(position_embedding):# 绘制位置编码 plt.pcolormesh(position_embedding) # 绘制位置编码矩阵 plt.xlabel('Depth') plt.ylabel('Position') plt.colorbar() # 颜色条,-1到1的颜色范围 plt.show() position_embedding = TransformerEmbedding.get_positional_encoding(64, 128) plot_position_embedding(position_embedding)

# Copyright 2018 DeepMind Technologies Limited # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # https://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================ """Code defining LEO inner loop. See "Meta-Learning with Latent Embedding Optimization" by Rusu et al. (https://arxiv.org/pdf/1807.05960.pdf). """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import numpy as np from six.moves import range from six.moves import zip import sonnet as snt import tensorflow as tf import tensorflow_probability as tfp import data as data_module def get_orthogonality_regularizer(orthogonality_penalty_weight): """Returns the orthogonality regularizer.""" def orthogonality(weight): """Calculates the layer-wise penalty encouraging orthogonality.""" with tf.name_scope(None, "orthogonality", [weight]) as name: w2 = tf.matmul(weight, weight, transpose_b=True) wn = tf.norm(weight, ord=2, axis=1, keepdims=True) + 1e-32 correlation_matrix = w2 / tf.matmul(wn, wn, transpose_b=True) matrix_size = correlation_matrix.get_shape().as_list()[0] base_dtype = weight.dtype.base_dtype identity = tf.eye(matrix_size, dtype=base_dtype) weight_corr = tf.reduce_mean( tf.squared_difference(correlation_matrix, identity)) return tf.multiply( tf.cast(orthogonality_penalty_weight, base_dtype), weight_corr, name=name) return orthogonality class LEO(snt.AbstractModule): """Sonnet module implementing the inner loop of LEO.""" def __init__(self, config=None, use_64bits_dtype=True, name="leo"): super(LEO, self).__init__(name=name) self._float_dtype = tf.float64 if use_64bits_dtype else tf.float32 self._int_dtype = tf.int64 if use_64bits_dtype else tf.int32 self._inner_unroll_length = config["inner_unroll_length"] self._finetuning_unroll_length = config["finetuning_unroll_length"] self._inner_lr_init = config["inner_lr_init"] self._finetuning_lr_init = config["finetuning_lr_init"] self._num_latents = config["num_latents"] self._dropout_rate = config["dropout_rate"] self._kl_weight = config["kl_weight"] # beta self._encoder_penalty_weight = config["encoder_penalty_weight"] # gamma self._l2_penalty_weight = config["l2_penalty_weight"] # lambda_1 # lambda_2 self._orthogonality_penalty_weight = config["orthogonality_penalty_weight"] assert self._inner_unroll_length > 0, ("Positive unroll length is necessary" " to create the graph") def _build(self, data, is_meta_training=True): """Connects the LEO module to the graph, creating the variables. Args: data: A data_module.ProblemInstance constaining Tensors with the following shapes: - tr_input: (N, K, dim) - tr_output: (N, K, 1) - tr_info: (N, K) - val_input: (N, K_valid, dim) - val_output: (N, K_valid, 1) - val_info: (N, K_valid) where N is the number of classes (as in N-way) and K and the and K_valid are numbers of training and validation examples within a problem instance correspondingly (as in K-shot), and dim is the dimensionality of the embedding. is_meta_training: A boolean describing whether we run in the training mode. Returns: Tensor with the inner validation loss of LEO (include both adaptation in the latent space and finetuning). """ if isinstance(data, list): data = data_module.ProblemInstance(*data) self.is_meta_training = is_meta_training self.save_problem_instance_stats(data.tr_input) latents, kl = self.forward_encoder(data) tr_loss, adapted_classifier_weights, encoder_penalty = self.leo_inner_loop( data, latents) val_loss, val_accuracy = self.finetuning_inner_loop( data, tr_loss, adapted_classifier_weights) val_loss += self._kl_weight * kl val_loss += self._encoder_penalty_weight * encoder_penalty # The l2 regularization is is already added to the graph when constructing # the snt.Linear modules. We pass the orthogonality regularizer separately, # because it is not used in self.grads_and_vars. regularization_penalty = ( self._l2_regularization + self._decoder_orthogonality_reg) batch_val_loss = tf.reduce_mean(val_loss) batch_val_accuracy = tf.reduce_mean(val_accuracy) return batch_val_loss + regularization_penalty, batch_val_accuracy @snt.reuse_variables def leo_inner_loop(self, data, latents): with tf.variable_scope("leo_inner"): inner_lr = tf.get_variable( "lr", [1, 1, self._num_latents], dtype=self._float_dtype, initializer=tf.constant_initializer(self._inner_lr_init)) starting_latents = latents loss, _ = self.forward_decoder(data, latents) for _ in range(self._inner_unroll_length): loss_grad = tf.gradients(loss, latents) # dLtrain/dz latents -= inner_lr * loss_grad[0] loss, classifier_weights = self.forward_decoder(data, latents) if self.is_meta_training: encoder_penalty = tf.losses.mean_squared_error( labels=tf.stop_gradient(latents), predictions=starting_latents) encoder_penalty = tf.cast(encoder_penalty, self._float_dtype) else: encoder_penalty = tf.constant(0., self._float_dtype) return loss, classifier_weights, encoder_penalty @snt.reuse_variables def finetuning_inner_loop(self, data, leo_loss, classifier_weights): tr_loss = leo_loss with tf.variable_scope("finetuning"): finetuning_lr = tf.get_variable( "lr", [1, 1, self.embedding_dim], dtype=self._float_dtype, initializer=tf.constant_initializer(self._finetuning_lr_init)) for _ in range(self._finetuning_unroll_length): loss_grad = tf.gradients(tr_loss, classifier_weights) classifier_weights -= finetuning_lr * loss_grad[0] tr_loss, _ = self.calculate_inner_loss(data.tr_input, data.tr_output, classifier_weights) val_loss, val_accuracy = self.calculate_inner_loss( data.val_input, data.val_output, classifier_weights) return val_loss, val_accuracy @snt.reuse_variables def forward_encoder(self, data): encoder_outputs = self.encoder(data.tr_input) relation_network_outputs = self.relation_network(encoder_outputs) latent_dist_params = self.average_codes_per_class(relation_network_outputs) latents, kl = self.possibly_sample(latent_dist_params) return latents, kl @snt.reuse_variables def forward_decoder(self, data, latents): weights_dist_params = self.decoder(latents) # Default to glorot_initialization and not stddev=1. fan_in = self.embedding_dim.value fan_out = self.num_classes.value stddev_offset = np.sqrt(2. / (fan_out + fan_in)) classifier_weights, _ = self.possibly_sample(weights_dist_params, stddev_offset=stddev_offset) tr_loss, _ = self.calculate_inner_loss(data.tr_input, data.tr_output, classifier_weights) return tr_loss, classifier_weights @snt.reuse_variables def encoder(self, inputs): with tf.variable_scope("encoder"): after_dropout = tf.nn.dropout(inputs, rate=self.dropout_rate) regularizer = tf.contrib.layers.l2_regularizer(self._l2_penalty_weight) initializer = tf.initializers.glorot_uniform(dtype=self._float_dtype) encoder_module = snt.Linear( self._num_latents, use_bias=False, regularizers={"w": regularizer}, initializers={"w": initializer}, ) outputs = snt.BatchApply(encoder_module)(after_dropout) return outputs @snt.reuse_variables def relation_network(self, inputs): with tf.variable_scope("relation_network"): regularizer = tf.contrib.layers.l2_regularizer(self._l2_penalty_weight) initializer = tf.initializers.glorot_uniform(dtype=self._float_dtype) relation_network_module = snt.nets.MLP( [2 * self._num_latents] * 3, use_bias=False, regularizers={"w": regularizer}, initializers={"w": initializer}, ) total_num_examples = self.num_examples_per_class*self.num_classes inputs = tf.reshape(inputs, [total_num_examples, self._num_latents]) left = tf.tile(tf.expand_dims(inputs, 1), [1, total_num_examples, 1]) right = tf.tile(tf.expand_dims(inputs, 0), [total_num_examples, 1, 1]) concat_codes = tf.concat([left, right], axis=-1) outputs = snt.BatchApply(relation_network_module)(concat_codes) outputs = tf.reduce_mean(outputs, axis=1) # 2 * latents, because we are returning means and variances of a Gaussian outputs = tf.reshape(outputs, [self.num_classes, self.num_examples_per_class, 2 * self._num_latents]) return outputs @snt.reuse_variables def decoder(self, inputs): with tf.variable_scope("decoder"): l2_regularizer = tf.contrib.layers.l2_regularizer(self._l2_penalty_weight) orthogonality_reg = get_orthogonality_regularizer( self._orthogonality_penalty_weight) initializer = tf.initializers.glorot_uniform(dtype=self._float_dtype) # 2 * embedding_dim, because we are returning means and variances decoder_module = snt.Linear( 2 * self.embedding_dim, use_bias=False, regularizers={"w": l2_regularizer}, initializers={"w": initializer}, ) outputs = snt.BatchApply(decoder_module)(inputs) self._orthogonality_reg = orthogonality_reg(decoder_module.w) return outputs def average_codes_per_class(self, codes): codes = tf.reduce_mean(codes, axis=1, keep_dims=True) # K dimension # Keep the shape (N, K, *) codes = tf.tile(codes, [1, self.num_examples_per_class, 1]) return codes def possibly_sample(self, distribution_params, stddev_offset=0.): means, unnormalized_stddev = tf.split(distribution_params, 2, axis=-1) stddev = tf.exp(unnormalized_stddev) stddev -= (1. - stddev_offset) stddev = tf.maximum(stddev, 1e-10) distribution = tfp.distributions.Normal(loc=means, scale=stddev) if not self.is_meta_training: return means, tf.constant(0., dtype=self._float_dtype) samples = distribution.sample() kl_divergence = self.kl_divergence(samples, distribution) return samples, kl_divergence def kl_divergence(self, samples, normal_distribution): random_prior = tfp.distributions.Normal( loc=tf.zeros_like(samples), scale=tf.ones_like(samples)) kl = tf.reduce_mean( normal_distribution.log_prob(samples) - random_prior.log_prob(samples)) return kl def predict(self, inputs, weights): after_dropout = tf.nn.dropout(inputs, rate=self.dropout_rate) # This is 3-dimensional equivalent of a matrix product, where we sum over # the last (embedding_dim) dimension. We get [N, K, N, K] tensor as output. per_image_predictions = tf.einsum("ijk,lmk->ijlm", after_dropout, weights) # Predictions have shape [N, K, N]: for each image ([N, K] of them), what # is the probability of a given class (N)? predictions = tf.reduce_mean(per_image_predictions, axis=-1) return predictions def calculate_inner_loss(self, inputs, true_outputs, classifier_weights): model_outputs = self.predict(inputs, classifier_weights) model_predictions = tf.argmax( model_outputs, -1, output_type=self._int_dtype) accuracy = tf.contrib.metrics.accuracy(model_predictions, tf.squeeze(true_outputs, axis=-1)) return self.loss_fn(model_outputs, true_outputs), accuracy def save_problem_instance_stats(self, instance): num_classes, num_examples_per_class, embedding_dim = instance.get_shape() if hasattr(self, "num_classes"): assert self.num_classes == num_classes, ( "Given different number of classes (N in N-way) in consecutive runs.") if hasattr(self, "num_examples_per_class"): assert self.num_examples_per_class == num_examples_per_class, ( "Given different number of examples (K in K-shot) in consecutive" "runs.") if hasattr(self, "embedding_dim"): assert self.embedding_dim == embedding_dim, ( "Given different embedding dimension in consecutive runs.") self.num_classes = num_classes self.num_examples_per_class = num_examples_per_class self.embedding_dim = embedding_dim @property def dropout_rate(self): return self._dropout_rate if self.is_meta_training else 0.0 def loss_fn(self, model_outputs, original_classes): original_classes = tf.squeeze(original_classes, axis=-1) # Tensorflow doesn't handle second order gradients of a sparse_softmax yet. one_hot_outputs = tf.one_hot(original_classes, depth=self.num_classes) return tf.nn.softmax_cross_entropy_with_logits_v2( labels=one_hot_outputs, logits=model_outputs) def grads_and_vars(self, metatrain_loss): """Computes gradients of metatrain_loss, avoiding NaN. Uses a fixed penalty of 1e-4 to enforce only the l2 regularization (and not minimize the loss) when metatrain_loss or any of its gradients with respect to trainable_vars are NaN. In practice, this approach pulls the variables back into a feasible region of the space when the loss or its gradients are not defined. Args: metatrain_loss: A tensor with the LEO meta-training loss. Returns: A tuple with: metatrain_gradients: A list of gradient tensors. metatrain_variables: A list of variables for this LEO model. """ metatrain_variables = self.trainable_variables metatrain_gradients = tf.gradients(metatrain_loss, metatrain_variables) nan_loss_or_grad = tf.logical_or( tf.is_nan(metatrain_loss), tf.reduce_any([tf.reduce_any(tf.is_nan(g)) for g in metatrain_gradients])) regularization_penalty = ( 1e-4 / self._l2_penalty_weight * self._l2_regularization) zero_or_regularization_gradients = [ g if g is not None else tf.zeros_like(v) for v, g in zip(tf.gradients(regularization_penalty, metatrain_variables), metatrain_variables)] metatrain_gradients = tf.cond(nan_loss_or_grad, lambda: zero_or_regularization_gradients, lambda: metatrain_gradients, strict=True) return metatrain_gradients, metatrain_variables @property def _l2_regularization(self): return tf.cast( tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), dtype=self._float_dtype) @property def _decoder_orthogonality_reg(self): return self._orthogonality_reg 中解码器将潜在代码映射为分类器参数,然后将分类器参数用于更新分类器,再通过更新后的分类器求损失,这些对应哪段代码?

# 这是一个示例 Python 脚本。 # 按 Shift+F10 执行或将其替换为您的代码。 # 按 双击 Shift 在所有地方搜索类、文件、工具窗口、操作和设置。 import argparse import math import pickle import torch import torch.nn as nn import torch.nn.functional as F from tqdm import tqdm from omegaconf import OmegaConf from sklearn.metrics import f1_score from torch.utils.data import Dataset, DataLoader from torch.nn import TransformerEncoderLayer, TransformerEncoder restypes = [ 'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V' ] unsure_restype = 'X' unknown_restype = 'U' def make_dataset(data_config, train_rate=0.7, valid_rate=0.2): data_path = data_config.data_path with open(data_path, 'rb') as f: data = pickle.load(f) total_number = len(data) train_sep = int(total_number * train_rate) valid_sep = int(total_number * (train_rate + valid_rate)) train_data_dicts = data[:train_sep] valid_data_dicts = data[train_sep:valid_sep] test_data_dicts = data[valid_sep:] train_dataset = DisProtDataset(train_data_dicts) valid_dataset = DisProtDataset(valid_data_dicts) test_dataset = DisProtDataset(test_data_dicts) return train_dataset, valid_dataset, test_dataset class DisProtDataset(Dataset): def __init__(self, dict_data): sequences = [d['sequence'] for d in dict_data] labels = [d['label'] for d in dict_data] assert len(sequences) == len(labels) self.sequences = sequences self.labels = labels self.residue_mapping = {'X':20} self.residue_mapping.update(dict(zip(restypes, range(len(restypes))))) def __len__(self): return len(self.sequences) def __getitem__(self, idx): sequence = torch.zeros(len(self.sequences[idx]), len(self.residue_mapping)) for i, c in enumerate(self.sequences[idx]): if c not in restypes: c = 'X' sequence[i][self.residue_mapping[c]] = 1 label = torch.tensor([int(c) for c in self.labels[idx]], dtype=torch.long) return sequence, label class PositionalEncoding(nn.Module): def __init__(self, d_model, dropout=0.0, max_len=40): super().__init__() position = torch.arange(max_len).unsqueeze(1) div_term = torch.exp( torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model) ) pe = torch.zeros(1, max_len, d_model) pe[0, :, 0::2] = torch.sin(position * div_term) pe[0, :, 1::2] = torch.cos(position * div_term) self.register_buffer("pe", pe) self.dropout = nn.Dropout(p=dropout) def forward(self, x): if len(x.shape) == 3: x = x + self.pe[:, : x.size(1)] elif len(x.shape) == 4: x = x + self.pe[:, :x.size(1), None, :] return self.dropout(x) class DisProtModel(nn.Module): def __init__(self, model_config): super().__init__() self.d_model = model_config.d_model self.n_head = model_config.n_head self.n_layer = model_config.n_layer self.input_layer = nn.Linear(model_config.i_dim, self.d_model) self.position_embed = PositionalEncoding(self.d_model, max_len=20000) self.input_norm = nn.LayerNorm(self.d_model) self.dropout_in = nn.Dropout(p=0.1) encoder_layer = TransformerEncoderLayer( d_model=self.d_model, nhead=self.n_head, activation='gelu', batch_first=True) self.transformer = TransformerEncoder(encoder_layer, num_layers=self.n_layer) self.output_layer = nn.Sequential( nn.Linear(self.d_model, self.d_model), nn.GELU(), nn.Dropout(p=0.1), nn.Linear(self.d_model, model_config.o_dim) ) def forward(self, x): x = self.input_layer(x) x = self.position_embed(x) x = self.input_norm(x) x = self.dropout_in(x) x = self.transformer(x) x = self.output_layer(x) return x def metric_fn(pred, gt): pred = pred.detach().cpu() gt = gt.detach().cpu() pred_labels = torch.argmax(pred, dim=-1).view(-1) gt_labels = gt.view(-1) score = f1_score(y_true=gt_labels, y_pred=pred_labels, average='micro') return score if __name__ == '__main__': device = 'cuda' if torch.cuda.is_available() else 'cpu' parser = argparse.ArgumentParser('IDRs prediction') parser.add_argument('--config_path', default='./config.yaml') args = parser.parse_args() config = OmegaConf.load(args.config_path) train_dataset, valid_dataset, test_dataset = make_dataset(config.data) train_dataloader = DataLoader(dataset=train_dataset, **config.train.dataloader) valid_dataloader = DataLoader(dataset=valid_dataset, batch_size=1, shuffle=False) model = DisProtModel(config.model) model = model.to(device) optimizer = torch.optim.AdamW(model.parameters(), lr=config.train.optimizer.lr, weight_decay=config.train.optimizer.weight_decay) loss_fn = nn.CrossEntropyLoss() model.eval() metric = 0. with torch.no_grad(): for sequence, label in valid_dataloader: sequence = sequence.to(device) label = label.to(device) pred = model(sequence) metric += metric_fn(pred, label) print("init f1_score:", metric / len(valid_dataloader)) for epoch in range(config.train.epochs): # train loop progress_bar = tqdm( train_dataloader, initial=0, desc=f"epoch:{epoch:03d}", ) model.train() total_loss = 0. for sequence, label in progress_bar: sequence = sequence.to(device) label = label.to(device) pred = model(sequence) loss = loss_fn(pred.permute(0, 2, 1), label) progress_bar.set_postfix(loss=loss.item()) total_loss += loss.item() optimizer.zero_grad() loss.backward() optimizer.step() avg_loss = total_loss / len(train_dataloader) # valid loop model.eval() metric = 0. with torch.no_grad(): for sequence, label in valid_dataloader: sequence = sequence.to(device) label = label.to(device) pred = model(sequence) metric += metric_fn(pred, label) print(f"avg_training_loss: {avg_loss}, f1_score: {metric / len(valid_dataloader)}") # 保存当前 epoch 的模型 save_path = f"model.pkl" torch.save(model.state_dict(), save_path) print(f"Model saved to {save_path}") 帮我分析一下这个代码是干什么的

import tensorflow as tf import numpy as np from keras import datasets import cv2 (train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data() import matplotlib.pyplot as plt class PositionEmbedding(tf.keras.layers.Layer): """位置编码层,为序列添加位置信息""" def __init__(self, max_len, embed_dim): super().__init__() self.max_len = max_len self.embed_dim = embed_dim # 可学习的位置编码矩阵:max_len × embed_dim self.pos_emb = tf.keras.layers.Embedding( input_dim=max_len, output_dim=embed_dim ) def call(self, x): # 生成位置索引序列(0到序列长度-1) positions = tf.range(start=0, limit=tf.shape(x)[1], delta=1) # 将位置编码加到输入特征上 return x + self.pos_emb(positions) # 添加get_config方法 def get_config(self): config = super().get_config() # 获取父类配置 config.update({ "max_len": self.max_len, "embed_dim": self.embed_dim }) return config class TransformerBlock(tf.keras.layers.Layer): """Transformer编码块,包含多头注意力和前馈网络""" def __init__(self, embed_dim, num_heads): super().__init__() self.embed_dim = embed_dim self.num_heads = num_heads # 多头注意力机制(核心组件) self.att = tf.keras.layers.MultiHeadAttention( num_heads=num_heads, key_dim=embed_dim # 每个头的维度 ) # 前馈网络(两全连接层) self.ffn = tf.keras.Sequential([ tf.keras.layers.Dense(embed_dim*4, activation="relu"), # 扩展维度 tf.keras.layers.Dense(embed_dim) # 恢复维度 ]) # 层归一化组件 self.layernorm1 = tf.keras.layers.LayerNormalization() self.layernorm2 = tf.keras.layers.LayerNormalization() def call(self, inputs): # 自注意力机制(q=k=v=inputs) attn_output = self.att(inputs, inputs) # 残差连接 + 层归一化 out1 = self.layernorm1(inputs + attn_output) # 前馈网络处理 ffn_output = self.ffn(out1) # 再次残差连接 + 层归一化 return self.layernorm2(out1 + ffn_output) # 添加get_config方法 def get_config(self): config = super().get_config() # 获取父类配置 config.update({ "embed_dim": self.embed_dim, "num_heads": self.num_heads, }) return config model = tf.keras.models.load_model('transform_model.keras', custom_objects={ "PositionEmbedding": PositionEmbedding, "TransformerBlock": TransformerBlock } ) # Evaluate the restored model loss, acc = model.evaluate(test_images, test_labels, verbose=2) pre = model.predict(test_images) # 对所有测试图片进行预测 print('Restored model, accuracy: {:5.2f}%'.format(100 * acc)) # 输出第一张图片的预测结果 print( pre[1]*100) print( np.argmax(pre[3])) plt.imshow(test_images[3]) plt.show() cv2.waitKey(0)PS D:\source\test3> & C:/ProgramData/anaconda3/envs/LSTM-TESFLOW/python.exe d:/source/test3/predict.py 2025-03-11 16:01:35.284449: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found 2025-03-11 16:01:35.291695: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. Traceback (most recent call last): File "d:/source/test3/predict.py", line 75, in <module> model = tf.keras.models.load_model('transform_model.keras', File "C:\ProgramData\anaconda3\envs\LSTM-TESFLOW\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler raise e.with_traceback(filtered_tb) from None File "C:\ProgramData\anaconda3\envs\LSTM-TESFLOW\lib\site-packages\keras\engine\base_layer.py", line 783, in from_config return cls(**config) TypeError: __init__() got an unexpected keyword argument 'name'

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. !pip install transformers datasets torch rouge-score matplotlib import torch import torch.nn as nn import torch.optim as optim from torch.utils.data import Dataset, DataLoader from transformers import BertTokenizerFast import time import numpy as np from datasets import load_dataset from rouge_score import rouge_scorer import matplotlib.pyplot as plt from IPython.display import clear_output # 设备配置 device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f"使用设备: {device}") # 数据预处理(严格过滤无效样本) class SummaryDataset(Dataset): def __init__(self, dataset_split, tokenizer, max_article_len=384, max_summary_len=96, subset_size=0.01): self.tokenizer = tokenizer self.max_article_len = max_article_len self.max_summary_len = max_summary_len self.subset = dataset_split.select(range(int(len(dataset_split) * subset_size))) # 严格过滤无效样本 self.articles = [] self.summaries = [] self.vocab = set(tokenizer.vocab.keys()) for item in self.subset: article = item['article'].strip() summary = item['highlights'].strip() if len(article) > 20 and len(summary) > 10: article_tokens = tokenizer.tokenize(article) summary_tokens = tokenizer.tokenize(summary) if all(t in self.vocab for t in article_tokens) and all(t in self.vocab for t in summary_tokens): self.articles.append(article) self.summaries.append(summary) self.pad_token_id = tokenizer.pad_token_id self.unk_token_id = tokenizer.unk_token_id def __len__(self): return len(self.articles) def __getitem__(self, idx): src = self.tokenizer( self.articles[idx], max_length=self.max_article_len, truncation=True, padding='max_length', return_tensors='pt', add_special_tokens=True ) tgt = self.tokenizer( self.summaries[idx], max_length=self.max_summary_len, truncation=True, padding='max_length', return_tensors='pt', add_special_tokens=True ) tgt_labels = tgt['input_ids'].squeeze() tgt_labels[tgt_labels == self.pad_token_id] = -100 # 忽略填充 tgt_labels[tgt_labels >= len(self.tokenizer.vocab)] = self.unk_token_id # 过滤无效id return { 'input_ids': src['input_ids'].squeeze(), 'attention_mask': src['attention_mask'].squeeze(), 'labels': tgt_labels } # 基础Seq2Seq模型 class BasicEncoder(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0) self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=2, batch_first=True, bidirectional=True) self.fc_hidden = nn.Linear(hidden_dim * 2, hidden_dim) def forward(self, src): embedded = self.embedding(src) outputs, hidden = self.gru(embedded) # 取第二层双向隐藏状态 forward_hidden = hidden[-2, :, :] # 第二层正向 backward_hidden = hidden[-1, :, :] # 第二层反向 hidden = torch.cat([forward_hidden, backward_hidden], dim=1) # (batch, 2*hidden_dim) hidden = self.fc_hidden(hidden).unsqueeze(0) # (1, batch, hidden_dim) return hidden class BasicDecoder(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0) self.gru = nn.GRU(emb_dim + hidden_dim, hidden_dim, num_layers=1, batch_first=True) self.fc = nn.Linear(hidden_dim * 2 + emb_dim, vocab_size) def forward(self, input_ids, hidden, context): input_embedded = self.embedding(input_ids.unsqueeze(1)) # (batch, 1, emb_dim) input_combined = torch.cat([input_embedded, context.unsqueeze(1)], dim=2) # (batch, 1, emb_dim+hidden_dim) output, hidden = self.gru(input_combined, hidden) # (batch, 1, hidden_dim) output = output.squeeze(1) # (batch, hidden_dim) combined = torch.cat([output, context, input_embedded.squeeze(1)], dim=1) # (batch, 2*hidden_dim+emb_dim) logits = self.fc(combined) return logits, hidden class BasicSeq2Seq(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.encoder = BasicEncoder(vocab_size, emb_dim, hidden_dim) self.decoder = BasicDecoder(vocab_size, emb_dim, hidden_dim) self.device = device self.sos_token_id = 101 # [CLS] self.eos_token_id = 102 # [SEP] self.unk_token_id = 100 # [UNK] def forward(self, src, tgt): hidden = self.encoder(src) context = hidden.squeeze(0) batch_size, tgt_len = tgt.size() outputs = torch.zeros(batch_size, tgt_len, self.decoder.fc.out_features).to(device) input_ids = tgt[:, 0] for t in range(1, tgt_len): logits, hidden = self.decoder(input_ids, hidden, context) outputs[:, t] = logits input_ids = tgt[:, t] return outputs def generate(self, src, max_length=80): src = src.to(device) hidden = self.encoder(src) context = hidden.squeeze(0) # 修正后的生成初始化 generated = torch.full((src.size(0), 1), self.sos_token_id, device=device) # 注意这里的修正 for _ in range(max_length-1): logits, hidden = self.decoder(generated[:, -1], hidden, context) next_token = torch.argmax(logits, dim=1, keepdim=True) # 防止过早生成标点 if generated.size(1) < 5: punctuation = [',', '.', ';', ':', '!', '?', "'", '"', '', '~'] punct_ids = [self.tokenizer.convert_tokens_to_ids(p) for p in punctuation] if next_token.item() in punct_ids: # 替换为最常见的实词 next_token = torch.tensor([[self.tokenizer.convert_tokens_to_ids('the')]], device=device) generated = torch.cat([generated, next_token], dim=1) if (next_token == self.eos_token_id).all(): break return generated # 注意力Seq2Seq模型 class Attention(nn.Module): def __init__(self, hidden_dim): super().__init__() self.W = nn.Linear(2 * hidden_dim, hidden_dim) self.v = nn.Linear(hidden_dim, 1, bias=False) def forward(self, hidden, encoder_outputs): src_len = encoder_outputs.size(1) hidden = hidden.unsqueeze(1).repeat(1, src_len, 1) # (batch, src_len, hidden_dim) combined = torch.cat([hidden, encoder_outputs], dim=2) # (batch, src_len, 2*hidden_dim) energy = self.v(torch.tanh(self.W(combined))).squeeze(2) # (batch, src_len) return torch.softmax(energy, dim=1) class AttnEncoder(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0) self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2, batch_first=True, bidirectional=True, dropout=0.1) self.fc_hidden = nn.Linear(hidden_dim * 2, hidden_dim) # 双向输出拼接 self.fc_cell = nn.Linear(hidden_dim * 2, hidden_dim) def forward(self, src): embedded = self.embedding(src) outputs, (hidden, cell) = self.lstm(embedded) # outputs: (batch, src_len, 2*hidden_dim) # 取第二层双向隐藏状态 hidden = torch.cat([hidden[-2, :, :], hidden[-1, :, :]], dim=1) # (batch, 2*hidden_dim) cell = torch.cat([cell[-2, :, :], cell[-1, :, :]], dim=1) hidden = self.fc_hidden(hidden).unsqueeze(0) # (1, batch, hidden_dim) cell = self.fc_cell(cell).unsqueeze(0) return outputs, (hidden, cell) class AttnDecoder(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0) self.attention = Attention(hidden_dim) self.lstm = nn.LSTM(emb_dim + 2 * hidden_dim, hidden_dim, num_layers=1, batch_first=True) self.fc = nn.Linear(hidden_dim + emb_dim, vocab_size) def forward(self, input_ids, hidden, cell, encoder_outputs): input_embedded = self.embedding(input_ids.unsqueeze(1)) # (batch, 1, emb_dim) attn_weights = self.attention(hidden.squeeze(0), encoder_outputs) # (batch, src_len) context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs) # (batch, 1, 2*hidden_dim) lstm_input = torch.cat([input_embedded, context], dim=2) # (batch, 1, emb_dim+2*hidden_dim) output, (hidden, cell) = self.lstm(lstm_input, (hidden, cell)) # output: (batch, 1, hidden_dim) logits = self.fc(torch.cat([output.squeeze(1), input_embedded.squeeze(1)], dim=1)) # (batch, vocab_size) return logits, hidden, cell class AttnSeq2Seq(nn.Module): def __init__(self, vocab_size, emb_dim=128, hidden_dim=256): super().__init__() self.encoder = AttnEncoder(vocab_size, emb_dim, hidden_dim) self.decoder = AttnDecoder(vocab_size, emb_dim, hidden_dim) self.device = device self.sos_token_id = 101 # [CLS] self.eos_token_id = 102 # [SEP] self.unk_token_id = 100 # [UNK] def forward(self, src, tgt): encoder_outputs, (hidden, cell) = self.encoder(src) batch_size, tgt_len = tgt.size() outputs = torch.zeros(batch_size, tgt_len, self.decoder.fc.out_features).to(device) input_ids = tgt[:, 0] for t in range(1, tgt_len): logits, hidden, cell = self.decoder(input_ids, hidden, cell, encoder_outputs) outputs[:, t] = logits input_ids = tgt[:, t] return outputs def generate(self, src, max_length=80): encoder_outputs, (hidden, cell) = self.encoder(src) # 修正后的生成初始化 generated = torch.full((src.size(0), 1), self.sos_token_id, device=device) # 注意这里的修正 for _ in range(max_length-1): logits, hidden, cell = self.decoder(generated[:, -1], hidden, cell, encoder_outputs) next_token = torch.argmax(logits, dim=1, keepdim=True) # 防止过早生成标点 if generated.size(1) < 5: punctuation = [',', '.', ';', ':', '!', '?', "'", '"', '', '~'] punct_ids = [self.tokenizer.convert_tokens_to_ids(p) for p in punctuation] if next_token.item() in punct_ids: # 替换为最常见的实词 next_token = torch.tensor([[self.tokenizer.convert_tokens_to_ids('the')]], device=device) generated = torch.cat([generated, next_token], dim=1) if (next_token == self.eos_token_id).all(): break return generated # Transformer模型 class PositionalEncoding(nn.Module): def __init__(self, d_model, max_len=5000): super().__init__() pe = torch.zeros(max_len, d_model) position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1) div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model)) pe[:, 0::2] = torch.sin(position * div_term) pe[:, 1::2] = torch.cos(position * div_term) self.register_buffer('pe', pe.unsqueeze(0)) def forward(self, x): return x + self.pe[:, :x.size(1)] class TransformerModel(nn.Module): def __init__(self, vocab_size, d_model=128, nhead=8, num_layers=3, dim_feedforward=512, max_len=5000): super().__init__() self.embedding = nn.Embedding(vocab_size, d_model, padding_idx=0) self.pos_encoder = PositionalEncoding(d_model, max_len) # 编码器 encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout=0.1) self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers) # 解码器 decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward, dropout=0.1) self.transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers) self.fc = nn.Linear(d_model, vocab_size) self.d_model = d_model self.sos_token_id = 101 # [CLS] self.eos_token_id = 102 # [SEP] def _generate_square_subsequent_mask(self, sz): mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1) mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0)) return mask def forward(self, src, tgt): src_mask = None tgt_mask = self._generate_square_subsequent_mask(tgt.size(1)).to(device) src_key_padding_mask = (src == 0) tgt_key_padding_mask = (tgt == 0) src = self.embedding(src) * np.sqrt(self.d_model) src = self.pos_encoder(src) tgt = self.embedding(tgt) * np.sqrt(self.d_model) tgt = self.pos_encoder(tgt) memory = self.transformer_encoder(src.transpose(0, 1), src_mask, src_key_padding_mask) output = self.transformer_decoder( tgt.transpose(0, 1), memory, tgt_mask, None, tgt_key_padding_mask, src_key_padding_mask ) output = self.fc(output.transpose(0, 1)) return output def generate(self, src, max_length=80): src_mask = None src_key_padding_mask = (src == 0) src = self.embedding(src) * np.sqrt(self.d_model) src = self.pos_encoder(src) memory = self.transformer_encoder(src.transpose(0, 1), src_mask, src_key_padding_mask) batch_size = src.size(0) generated = torch.full((batch_size, 1), self.sos_token_id, device=device) for i in range(max_length-1): tgt_mask = self._generate_square_subsequent_mask(generated.size(1)).to(device) tgt_key_padding_mask = (generated == 0) tgt = self.embedding(generated) * np.sqrt(self.d_model) tgt = self.pos_encoder(tgt) output = self.transformer_decoder( tgt.transpose(0, 1), memory, tgt_mask, None, tgt_key_padding_mask, src_key_padding_mask ) output = self.fc(output.transpose(0, 1)[:, -1, :]) next_token = torch.argmax(output, dim=1, keepdim=True) generated = torch.cat([generated, next_token], dim=1) if (next_token == self.eos_token_id).all(): break return generated # 训练函数 def train_model(model, train_loader, optimizer, criterion, epochs=3): model.train() optimizer = optim.Adam(model.parameters(), lr=1e-4) scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=1, factor=0.5) start_time = time.time() for epoch in range(epochs): total_loss = 0 model.train() for i, batch in enumerate(train_loader): src = batch['input_ids'].to(device) tgt = batch['labels'].to(device) optimizer.zero_grad() outputs = model(src, tgt[:, :-1]) # 检查模型输出有效性 if torch.isnan(outputs).any(): print("警告:模型输出包含NaN,跳过此批次") continue loss = criterion(outputs.reshape(-1, outputs.size(-1)), tgt[:, 1:].reshape(-1)) loss.backward() torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5) # 梯度裁剪 optimizer.step() total_loss += loss.item() if (i+1) % 10 == 0: print(f"Epoch {epoch+1}/{epochs} | Batch {i+1}/{len(train_loader)} | Loss: {loss.item():.4f}") avg_loss = total_loss / len(train_loader) scheduler.step(avg_loss) print(f"Epoch {epoch+1} | 平均损失: {avg_loss:.4f}") torch.cuda.empty_cache() total_time = time.time() - start_time print(f"训练完成!总耗时: {total_time:.2f}s ({total_time/60:.2f}分钟)") return model, total_time # 评估函数 def evaluate_model(model, val_loader, tokenizer, num_examples=2): model.eval() scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True) rouge_scores = {'rouge1': [], 'rouge2': [], 'rougeL': []} valid_count = 0 with torch.no_grad(): for i, batch in enumerate(val_loader): src = batch['input_ids'].to(device) tgt = batch['labels'].to(device) generated = model.generate(src) for s, p, t in zip(src, generated, tgt): src_txt = tokenizer.decode(s, skip_special_tokens=True) pred_txt = tokenizer.decode(p, skip_special_tokens=True) true_txt = tokenizer.decode(t[t != -100], skip_special_tokens=True) if len(pred_txt.split()) > 3 and len(true_txt.split()) > 3: valid_count += 1 if valid_count <= num_examples: print(f"\n原文: {src_txt[:100]}...") print(f"生成: {pred_txt}") print(f"参考: {true_txt[:80]}...") print("-"*60) if true_txt and pred_txt: scores = scorer.score(true_txt, pred_txt) for key in rouge_scores: rouge_scores[key].append(scores[key].fmeasure) if valid_count > 0: avg_scores = {key: sum(rouge_scores[key])/len(rouge_scores[key]) for key in rouge_scores} print(f"\n评估结果 (基于{valid_count}个样本):") print(f"ROUGE-1: {avg_scores['rouge1']*100:.2f}%") print(f"ROUGE-2: {avg_scores['rouge2']*100:.2f}%") print(f"ROUGE-L: {avg_scores['rougeL']*100:.2f}%") else: print("警告:未生成有效摘要") avg_scores = {key: 0.0 for key in rouge_scores} return avg_scores # 可视化模型性能 def visualize_model_performance(model_names, train_times, rouge_scores): plt.figure(figsize=(15, 6)) # 训练时间对比图 plt.subplot(1, 2, 1) bars = plt.bar(model_names, train_times) plt.title('模型训练时间对比') plt.ylabel('时间 (分钟)') for bar in bars: height = bar.get_height() plt.text(bar.get_x() + bar.get_width()/2., height, f'{height:.1f} min', ha='center', va='bottom') # ROUGE分数对比图 plt.subplot(1, 2, 2) x = np.arange(len(model_names)) width = 0.25 plt.bar(x - width, [scores['rouge1'] for scores in rouge_scores], width, label='ROUGE-1') plt.bar(x, [scores['rouge2'] for scores in rouge_scores], width, label='ROUGE-2') plt.bar(x + width, [scores['rougeL'] for scores in rouge_scores], width, label='ROUGE-L') plt.title('模型ROUGE分数对比') plt.ylabel('F1分数') plt.xticks(x, model_names) plt.legend() plt.tight_layout() plt.savefig('performance_comparison.png') plt.show() print("性能对比图已保存为 performance_comparison.png") # 交互式文本摘要生成 def interactive_summarization(models, tokenizer, model_names, max_length=80): while True: print("\n" + "="*60) print("文本摘要交互式测试 (输入 'q' 退出)") print("="*60) input_text = input("请输入要摘要的文本:\n") if input_text.lower() == 'q': break if len(input_text) < 50: print("请输入更长的文本(至少50个字符)") continue # 生成摘要 inputs = tokenizer( input_text, max_length=384, truncation=True, padding='max_length', return_tensors='pt' ).to(device) print("\n生成摘要中...") all_summaries = [] for i, model in enumerate(models): model.eval() with torch.no_grad(): generated = model.generate(inputs["input_ids"]) summary = tokenizer.decode(generated[0], skip_special_tokens=True) all_summaries.append(summary) # 打印结果 print(f"\n{model_names[i]} 摘要:") print("-"*50) print(summary) print("-"*50) print("\n所有模型摘要对比:") for i, (name, summary) in enumerate(zip(model_names, all_summaries)): print(f"{i+1}. {name}: {summary}") # 主程序 print("加载数据集...") dataset = load_dataset("cnn_dailymail", "3.0.0") tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased') vocab_size = len(tokenizer.vocab) # 准备训练数据 print("准备训练数据...") train_ds = SummaryDataset(dataset['train'], tokenizer, subset_size=0.01) # 使用1%的数据 val_ds = SummaryDataset(dataset['validation'], tokenizer, subset_size=0.01) train_loader = DataLoader(train_ds, batch_size=4, shuffle=True, num_workers=0) val_loader = DataLoader(val_ds, batch_size=8, shuffle=False, num_workers=0) # 定义损失函数 criterion = nn.CrossEntropyLoss(ignore_index=-100) # 训练基础Seq2Seq print("\n" + "="*60) print("训练基础Seq2Seq模型") print("="*60) basic_model = BasicSeq2Seq(vocab_size).to(device) trained_basic, basic_time = train_model(basic_model, train_loader, None, criterion, epochs=3) basic_rouge = evaluate_model(trained_basic, val_loader, tokenizer) # 训练注意力Seq2Seq print("\n" + "="*60) print("训练注意力Seq2Seq模型") print("="*60) attn_model = AttnSeq2Seq(vocab_size).to(device) trained_attn, attn_time = train_model(attn_model, train_loader, None, criterion, epochs=3) attn_rouge = evaluate_model(trained_attn, val_loader, tokenizer) # 训练Transformer print("\n" + "="*60) print("训练Transformer模型") print("="*60) transformer_model = TransformerModel(vocab_size).to(device) trained_transformer, transformer_time = train_model(transformer_model, train_loader, None, criterion, epochs=3) transformer_rouge = evaluate_model(trained_transformer, val_loader, tokenizer) # 可视化模型性能 print("\n" + "="*60) print("模型性能对比") print("="*60) model_names = ['基础Seq2Seq', '注意力Seq2Seq', 'Transformer'] train_times = [basic_time/60, attn_time/60, transformer_time/60] rouge_scores = [basic_rouge, attn_rouge, transformer_rouge] visualize_model_performance(model_names, train_times, rouge_scores) # 交互式测试 print("\n" + "="*60) print("交互式文本摘要测试") print("="*60) print("提示:输入一段文本,将同时生成三个模型的摘要结果") interactive_summarization( [trained_basic, trained_attn, trained_transformer], tokenizer, model_names ) 修改完错误后发完整代码给我

最新推荐

recommend-type

三菱FX3U三轴伺服电机与威纶通触摸屏组合程序详解:轴点动、回零与定位控制及全流程解析

三菱FX3U三轴伺服电机与威纶通触摸屏的程序编写方法及其应用。主要内容涵盖伺服电机主控程序、触摸屏程序、轴点动、回零及定位程序、通讯模块程序以及威纶显示器程序的分析。通过对各个模块的深入探讨,帮助读者理解每个部分的功能和实现方式,确保机械运动控制的准确性、高效性和稳定性。此外,文章还提供了关于程序编写过程中可能遇到的问题及解决方案。 适合人群:从事自动化控制领域的工程师和技术人员,尤其是对三菱FX3U三轴伺服电机和威纶通触摸屏有实际操作需求的专业人士。 使用场景及目标:适用于工业自动化项目中,旨在提高对三菱FX3U三轴伺服电机和威纶通触摸屏的理解和应用能力,掌握模块化编程技巧,解决实际工程中的编程难题。 其他说明:文中不仅讲解了各模块的具体实现细节,还强调了程序的安全性和可靠性,为项目的成功实施提供了有力的支持。
recommend-type

职业介绍与人才招聘综合管理系统-基于宏达数据库信息管理开发平台的专业人力资源服务软件-包含基本信息设置-用人单位管理-求职人员登记-数据查询-统计分析-报表生成-打印输出-权限控制.zip

cursor免费次数用完职业介绍与人才招聘综合管理系统_基于宏达数据库信息管理开发平台的专业人力资源服务软件_包含基本信息设置_用人单位管理_求职人员登记_数据查询_统计分析_报表生成_打印输出_权限控制.zip
recommend-type

基于Spark2x分布式计算框架的实时新闻大数据分析可视化系统-实现用户浏览日志采集与实时处理-新闻话题热度排名统计-时段流量峰值分析-新闻曝光量监控-数据可视化展示-采用Kaf.zip

基于Spark2x分布式计算框架的实时新闻大数据分析可视化系统_实现用户浏览日志采集与实时处理_新闻话题热度排名统计_时段流量峰值分析_新闻曝光量监控_数据可视化展示_采用Kaf.zip大数据实战项目
recommend-type

基于springboot小型哺乳类宠物诊所管理系统-4339s0c8【附万字论文+PPT+包部署+录制讲解视频】.zip

基于springboot小型哺乳类宠物诊所管理系统-4339s0c8【附万字论文+PPT+包部署+录制讲解视频】.zip
recommend-type

基于Simulink的风电永磁同步电机并网系统仿真模型与SVPWM控制机制探究

基于Simulink/Matlab构建的风电永磁同步电机并网系统的仿真模型。该模型主要涵盖了SVPWM控制、MPPT风能跟踪算法以及Crowbar电路的低压穿越功能。文中首先解释了机侧变流器的工作原理及其核心——MPPT算法的具体实现方法,采用了黄金分割法进行最大功率点跟踪,并提供了相应的Matlab函数代码。接着讨论了网侧变流器的电网电压定向控制和SVPWM模块的应用,强调了载波频率设置和死区补偿的重要性。对于Crowbar电路部分,则着重讲述了其触发逻辑和保护机制,确保在电网电压骤降时能够稳定运行。此外,还分享了一些仿真设置的小技巧,如选择合适的求解器和优化参数的方法。 适合人群:从事风电系统研究的技术人员、高校相关专业师生、对电力电子控制系统感兴趣的工程技术人员。 使用场景及目标:①为风电并网仿真提供可靠的模型支持;②深入理解SVPWM控制、MPPT算法和Crowbar电路的功能;③掌握风电系统关键组件的设计与优化方法。 其他说明:本文不仅提供了详细的理论解析和技术细节,还附带了具体的代码片段,便于读者实际操作和验证。
recommend-type

Pansophica开源项目:智能Web搜索代理的探索

Pansophica开源项目是一个相对较新且具有创新性的智能Web搜索代理,它突破了传统搜索引擎的界限,提供了一种全新的交互方式。首先,我们来探讨“智能Web搜索代理”这一概念。智能Web搜索代理是一个软件程序或服务,它可以根据用户的查询自动执行Web搜索,并尝试根据用户的兴趣、历史搜索记录或其他输入来提供个性化的搜索结果。 Pansophica所代表的不仅仅是搜索结果的展示,它还强调了一个交互式的体验,在动态和交互式虚拟现实中呈现搜索结果。这种呈现方式与现有的搜索体验有着根本的不同。目前的搜索引擎,如Google、Bing和Baidu等,多以静态文本和链接列表的形式展示结果。而Pansophica通过提供一个虚拟现实环境,使得搜索者可以“扭转”视角,进行“飞行”探索,以及“弹网”来浏览不同的内容。这种多维度的交互方式使得信息的浏览变得更加快速和直观,有望改变用户与网络信息互动的方式。 接着,我们关注Pansophica的“开源”属性。所谓开源,指的是软件的源代码可以被公众获取,任何个人或组织都可以自由地使用、学习、修改和分发这些代码。开源软件通常由社区进行开发和维护,这样的模式鼓励了协作创新并减少了重复性劳动,因为全世界的开发者都可以贡献自己的力量。Pansophica项目作为开源软件,意味着其他开发者可以访问和使用其源代码,进一步改进和扩展其功能,甚至可以为Pansophica构建新的应用或服务。 最后,文件名称“Pansophica-src-1.3”表明了我们讨论的特定版本的Pansophica开源代码。数字“1.3”很可能指的是该版本号,表明这是Pansophica项目的第1.3个公开版本。这个版本号对于开发者来说很重要,因为它可以帮助他们追踪更新的进度、比较版本间的差异以及选择最适合其项目的版本。 综合以上信息,Pansophica开源项目不仅在搜索代理领域引入了创新的交互方式,而且通过开源的方式,提供了技术共享和协作的可能性。对于IT行业来说,开源项目是推动技术发展的重要力量,它激发了社区的活力,促进了知识的传播和技术的进步。通过构建一个开源社区,Pansophica可以吸引来自世界各地的开发者和研究人员,共同探索如何将搜索技术与交互式虚拟现实技术结合起来,为用户带来更加丰富和有效的信息检索体验。同时,Pansophica的开源性质也使其更容易适应未来技术的发展,随着社区的不断贡献和反馈,这一平台有望在功能和性能上得到持续的优化和增强。
recommend-type

跨平台内容提取无忧:coze工作流应对社交媒体挑战

# 1. 社交媒体内容提取的必要性与挑战 ## 1.1 内容提取的必要性 在数字时代,社交媒体内容是信息与数据流动的主干道,公司、个人和研究者都在寻求从这些信息中提取价值。无论是为了市场趋势分析、品牌监控、消费者洞察还是公共舆论监测,有效的社交媒体内容提取变得至关重要。这不仅有助于企业做出数据驱动的决策,也能使个人更好地理
recommend-type

vrrp主设备发送的免费arp

<think>好的,我现在需要回答用户关于VRRP主设备发送免费ARP的目的和工作原理的问题。首先,我得回忆一下VRRP的基本概念和工作机制。VRRP,虚拟路由冗余协议,主要用于在局域网中提供默认网关的冗余备份,确保网络的高可用性。主设备(Master)负责转发流量,而备用设备(Backup)则在主设备故障时接管工作。 用户的问题集中在主设备发送免费ARP的目的和机制上。根据引用[2],免费ARP用于通知下游设备虚拟MAC地址的变更。当主设备被选举出来后,它需要让局域网内的其他设备知道虚拟IP对应的MAC地址已经指向自己,这样流量才会被正确路由到主设备。免费ARP的作用应该就是更新这些设备的
recommend-type

为Ghost博客平台打造的Meteor流星包装使用指南

从给定文件信息中,我们可以提炼出以下IT知识点: ### 标题知识点:流星Ghost软件包 1. **流星Ghost软件包的用途**:流星Ghost软件包是专为Ghost博客平台设计的流星(Meteor)应用程序。流星是一个开源的全栈JavaScript平台,用于开发高性能和易于编写的Web应用程序。Ghost是一个开源博客平台,它提供了一个简单且专业的写作环境。 2. **软件包的作用**:流星Ghost软件包允许用户在流星平台上轻松集成Ghost博客。这样做的好处是可以利用流星的实时特性以及易于开发和部署的应用程序框架,同时还能享受到Ghost博客系统的便利和美观。 ### 描述知识点:流星Ghost软件包的使用方法 1. **软件包安装方式**:用户可以通过流星的命令行工具添加名为`mrt:ghost`的软件包。`mrt`是流星的一个命令行工具,用于添加、管理以及配置软件包。 2. **初始化Ghost服务器**:描述中提供了如何在服务器启动时运行Ghost的基本代码示例。这段代码使用了JavaScript的Promise异步操作,`ghost().then(function (ghostServer) {...})`这行代码表示当Ghost服务器初始化完成后,会在Promise的回调函数中提供一个Ghost服务器实例。 3. **配置Ghost博客**:在`then`方法中,首先会获取到Ghost服务器的配置对象`config`,用户可以在此处进行自定义设置,例如修改主题、配置等。 4. **启动Ghost服务器**:在配置完成之后,通过调用`ghostServer.start()`来启动Ghost服务,使其能够处理博客相关的请求。 5. **Web浏览器导航**:一旦流星服务器启动并运行,用户便可以通过Web浏览器访问Ghost博客平台。 ### 标签知识点:JavaScript 1. **JavaScript作为流星Ghost软件包的开发语言**:标签指出流星Ghost软件包是使用JavaScript语言开发的。JavaScript是一种在浏览器端广泛使用的脚本语言,它也是流星平台的基础编程语言。 2. **流星和Ghost共同使用的语言**:JavaScript同样也是Ghost博客平台的开发语言。这表明流星Ghost软件包可以无缝集成,因为底层技术栈相同。 ### 压缩包子文件的文件名称列表知识点:meteor-ghost-master 1. **版本控制和软件包结构**:文件名称`meteor-ghost-master`暗示了该软件包可能托管在像GitHub这样的版本控制系统上。文件名中的`master`通常指的是主分支或主版本。 2. **软件包的目录结构**:通过文件名称可以推断出该软件包可能拥有一个标准的流星软件包结构,包含了初始化、配置、运行等必要的模块和文件。 3. **软件包的维护状态**:由于文件名没有包含特定的版本号,我们无法直接得知软件包的最新更新情况。通常,软件包维护者会将最新的版本代码放在`master`分支上。 ### 总结 流星Ghost软件包提供了一个有效的解决方案,使得流星平台的开发者能够在他们的应用中添加Ghost博客功能。软件包的使用简便,通过流星的命令行工具安装,并通过JavaScript代码配置和启动Ghost服务。通过流星Ghost软件包,开发者能够享受流星的实时特性以及Ghost博客系统的便利性。此外,软件包的命名和结构也暗示了其维护和版本控制的模式,有助于开发者更好地理解如何使用和维护这一软件包。
recommend-type

抖音标题生成自动化:用coze工作流释放创意

# 1. 抖音标题生成自动化的重要性 随着社交媒体平台的崛起,内容的吸引力很大程度上取决于标题的创意与精准性。抖音作为一个日活亿级的短视频平台,高质量的标题能够有效提高视频的点击率,增加内容的传播。但是,人工撰写标题不仅耗时耗力,而且很难做到快速响应热点,自动化标题生成工具应运而生。coze工作流,作为一种实现自动化生成抖音标题的工具,其重要性不言而喻。它能够利用大数据分析和机器学习技术,提高标题的吸引