An Automated Machine Learning Framework: A Technical Look at Model Search

AutoML and the Model Search framework

Automated machine learning (AutoML) has become a prominent research direction in modern machine learning. Its goal is to automate the whole machine learning workflow, including data preprocessing, feature selection, model selection, model training, and model tuning, so that non-expert users can also apply machine learning effectively.

Among AutoML techniques, the Model Search (MS) framework is especially noteworthy. It uses dedicated algorithms to automate the search over model architectures. Its core value is in helping researchers and developers explore complex model structures faster, particularly the design of different layer types in deep neural networks (DNNs), in order to find high-performing architectures.

Key features of the Model Search framework:

1. Out-of-the-box AutoML algorithms: the framework ships with built-in algorithms that automatically search for the right model architecture, build model ensembles, and produce distilled, refined models.
2. Model comparison: the different models found during a search can be compared against each other, so users can pick the one best suited to their specific problem.
3. Customizable search spaces: users can create their own search space and customize the layer types in the network, in order to explore and experiment with different neural architectures.

Although the framework is designed so that it can also handle regression, the current version focuses on classification problems. That means tasks such as image classification, text classification, and speech recognition, where the goal is to recognize and assign inputs to discrete categories.

Getting started with the framework is fairly simple. It starts from a CSV file containing feature data in purely numeric form. With a few short lines of code, AutoML runs its analysis and processing and ultimately finds a competitive model architecture for you.

For example, the getting-started code begins with:

```python
import model_search
```

Once this import is in place, the framework can read the data from the CSV file and use its built-in algorithms to search for and select the most suitable model structure.

To make full use of Model Search you need to know Python, since the current implementation is Python-based. As a high-level language with strong library support and a healthy community ecosystem, Python is well suited to machine learning and data science work.

The technical documentation describes the framework's functionality in more detail, including how to install and use it, what each module does, and advanced usage; it is an indispensable companion for understanding and applying the framework.

The design and implementation of the Model Search framework reflect the field's broader push toward automation and intelligence. As the technology develops, more AutoML tools and frameworks are likely to appear, further simplifying machine learning workflows and making machine learning more accessible and easier to use.
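As a concrete sketch of the CSV-to-model workflow described above, the following is modeled on the example in the google/model_search README. Treat the exact module paths (`single_trainer`, `csv_data.Provider`, `constants.DEFAULT_DNN`) and argument names as assumptions that may differ between versions:

```python
import model_search
from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data

# A minimal sketch, assuming a numeric CSV whose first column is the label.
trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
        label_index=0,                 # which CSV column holds the class label
        logits_dimension=2,            # number of classes
        record_defaults=[0, 0, 0, 0],  # one default value per CSV column
        filename="model_search/data/testdata/csv_random_data.csv"),
    spec=constants.DEFAULT_DNN)

# Search over candidate architectures; each trial trains briefly and is scored,
# and the best models are ensembled/distilled under root_dir.
trainer.try_models(
    number_models=200,
    train_steps=1000,
    eval_steps=100,
    root_dir="/tmp/run_example",
    batch_size=32,
    experiment_name="example",
    experiment_owner="model_search_user")
```

The search artifacts (checkpoints and evaluation results for each candidate) end up under `root_dir`, which is what enables the model-comparison feature described in point 2.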

Related recommendations


Fix the errors in this code:

```python
import os
import sys
import time
import glob
import numpy as np
import torch
import utils
import logging
import argparse
import torch.nn as nn
import torch.utils
import torch.nn.functional as F
import torchvision.datasets as dset
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
from model_search import Network
from architect import Architect

parser = argparse.ArgumentParser("cifar")
parser.add_argument('--data', type=str, default='/data/datasets/cifar-10', help='location of the data corpus')
parser.add_argument('--set', type=str, default='cifar10', help='cifar10 or cifar100')
parser.add_argument('--batch_size', type=int, default=64, help='batch size')
parser.add_argument('--learning_rate', type=float, default=0.025, help='init learning rate')
parser.add_argument('--learning_rate_min', type=float, default=0.0, help='min learning rate')
parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
parser.add_argument('--weight_decay', type=float, default=3e-4, help='weight decay')
parser.add_argument('--report_freq', type=float, default=50, help='report frequency')
parser.add_argument('--gpu', type=int, default=0, help='gpu device id')
parser.add_argument('--epochs', type=int, default=80, help='num of training epochs')
parser.add_argument('--init_channels', type=int, default=16, help='num of init channels')
parser.add_argument('--layers', type=int, default=8, help='total number of layers')
parser.add_argument('--model_path', type=str, default='saved_models', help='path to save the model')
parser.add_argument('--cutout', action='store_true', default=False, help='use cutout')
parser.add_argument('--cutout_length', type=int, default=16, help='cutout length')
parser.add_argument('--drop_path_prob', type=float, default=0.3, help='drop path probability')
parser.add_argument('--save', type=str, default='EXP', help='experiment name')
parser.add_argument('--seed', type=int, default=2, help='random seed')
parser.add_argument('--grad_clip', type=float, default=5, help='gradient clipping')
parser.add_argument('--train_portion', type=float, default=0.5, help='portion of training data')
parser.add_argument('--unrolled', action='store_true', default=False, help='use one-step unrolled validation loss')
# FIX: args.single_level is used in train() below but was never defined.
parser.add_argument('--single_level', action='store_true', default=False, help='use single-level optimization')
parser.add_argument('--arch_learning_rate', type=float, default=6e-4, help='learning rate for arch encoding')
parser.add_argument('--arch_weight_decay', type=float, default=1e-3, help='weight decay for arch encoding')
args = parser.parse_args()

args.save = 'search-{}-{}'.format(args.save, time.strftime("%Y%m%d-%H%M%S"))
utils.create_exp_dir(args.save, scripts_to_save=glob.glob('*.py'))

log_format = '%(asctime)s %(message)s'
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format=log_format, datefmt='%m/%d %I:%M:%S %p')
fh = logging.FileHandler(os.path.join(args.save, 'log.txt'))
fh.setFormatter(logging.Formatter(log_format))
logging.getLogger().addHandler(fh)

CIFAR_CLASSES = 10
if args.set == 'cifar100':
    CIFAR_CLASSES = 100


def main():
    if not torch.cuda.is_available():
        logging.info('no gpu device available')
        sys.exit(1)

    np.random.seed(args.seed)
    torch.cuda.set_device(args.gpu)
    cudnn.benchmark = True
    torch.manual_seed(args.seed)
    cudnn.enabled = True
    torch.cuda.manual_seed(args.seed)
    logging.info('gpu device = %d' % args.gpu)
    logging.info("args = %s", args)

    criterion = nn.CrossEntropyLoss()
    criterion = criterion.cuda()
    model = Network(args.init_channels, CIFAR_CLASSES, args.layers, criterion)
    model = model.cuda()
    # FIX: this was a stray string literal, not a comment:
    # model = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
    logging.info("param size = %fMB", utils.count_parameters_in_MB(model))

    optimizer = torch.optim.SGD(
        model.parameters(),
        args.learning_rate,
        momentum=args.momentum,
        weight_decay=args.weight_decay)
    arch_optimizer = torch.optim.Adam(
        [model.alphas],
        lr=args.arch_learning_rate,
        betas=(0.9, 0.999),
        weight_decay=args.arch_weight_decay)

    train_transform, valid_transform = utils._data_transforms_cifar10(args)
    if args.set == 'cifar100':
        train_data = dset.CIFAR100(root=args.data, train=True, download=True, transform=train_transform)
    else:
        train_data = dset.CIFAR10(root=args.data, train=True, download=True, transform=train_transform)

    num_train = len(train_data)
    indices = list(range(num_train))
    split = int(np.floor(args.train_portion * num_train))

    train_queue = torch.utils.data.DataLoader(
        train_data, batch_size=args.batch_size,
        sampler=torch.utils.data.sampler.SubsetRandomSampler(indices[:split]),
        pin_memory=True, num_workers=0)

    valid_queue = torch.utils.data.DataLoader(
        train_data, batch_size=args.batch_size,
        sampler=torch.utils.data.sampler.SubsetRandomSampler(indices[split:num_train]),
        pin_memory=True, num_workers=0)

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, float(args.epochs), eta_min=args.learning_rate_min)

    architect = Architect(model, args)

    for epoch in range(args.epochs):
        # FIX: scheduler.get_lr() inside the loop is deprecated; read the last LR
        # and step the scheduler *after* the epoch (PyTorch >= 1.1 ordering).
        lr = scheduler.get_last_lr()[0]
        logging.info('epoch %d lr %e', epoch, lr)

        alpha = model.alphas
        train_acc, train_obj, alpha_grad_sum, weight_grad_sum, zz_grad_sum = train(
            train_queue, valid_queue, model, architect, criterion, optimizer, lr, arch_optimizer)
        scheduler.step()

        genotype = model.genotype(alpha_grad_sum)
        logging.info('genotype = %s', genotype)

        # training
        logging.info('train_acc %f', train_acc)

        # validation
        valid_acc, valid_obj = infer(valid_queue, model, criterion, epoch)
        logging.info('valid_acc %f', valid_acc)

        utils.save(model, os.path.join(args.save, 'weights.pt'))


"""
def train(train_queue, valid_queue, model, architect, criterion, optimizer, lr, epoch):
    objs = utils.AvgrageMeter()
    top1 = utils.AvgrageMeter()
    top5 = utils.AvgrageMeter()

    for step, (input, target) in enumerate(train_queue):
        model.train()
        n = input.size(0)
        input = Variable(input, requires_grad=False).cuda()
        target = Variable(target, requires_grad=False).cuda()

        input_search, target_search = next(iter(valid_queue))
        input_search = Variable(input_search, requires_grad=False).cuda()
        target_search = Variable(target_search, requires_grad=False).cuda()

        if epoch >= 15:
            architect.step(input, target, input_search, target_search, lr, optimizer, unrolled=args.unrolled)

        optimizer.zero_grad()
        logits = model(input)
        loss = criterion(logits, target)
        loss.backward()
        nn.utils.clip_grad_norm(model.parameters(), args.grad_clip)
        optimizer.step()

        prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
        objs.update(loss.item(), n)
        top1.update(prec1.item(), n)
        top5.update(prec5.item(), n)

        if step % args.report_freq == 0:
            logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)

    return top1.avg, objs.avg
"""


def train(train_queue, valid_queue, model, architect, criterion, optimizer, lr, arch_optimizer):
    objs = utils.AvgrageMeter()
    top1 = utils.AvgrageMeter()
    top5 = utils.AvgrageMeter()
    Loss = utils.AvgrageMeter()
    arch_grads_sum = torch.zeros_like(model.alphas).cuda()
    weight_grads_sum = torch.zeros_like(model.weights).cuda()
    zz_grads_sum = torch.zeros_like(model.c).cuda()
    # FIX: valid_queue_iter was only created inside the except branch, which
    # raised NameError on the very first call to next(); initialize it up front.
    valid_queue_iter = iter(valid_queue)

    for step, (input, target) in enumerate(train_queue):
        model.train()
        n = input.size(0)
        input = input.cuda(non_blocking=True)
        target = target.cuda(non_blocking=True)

        if not args.single_level:
            try:
                input_search, target_search = next(valid_queue_iter)
            except StopIteration:  # FIX: a bare except would hide real errors
                valid_queue_iter = iter(valid_queue)
                input_search, target_search = next(valid_queue_iter)
            input_search = input_search.cuda()
            target_search = target_search.cuda(non_blocking=True)

            arch_optimizer.zero_grad()
            logits = model(input_search)
            loss = criterion(logits, target_search)
            model.weights.retain_grad()
            Loss.update(loss.data.item(), n)
            loss.backward()
            sum_grad(model, arch_grads_sum, weight_grads_sum, zz_grads_sum)
            arch_optimizer.step()
            model.alphas.grad.zero_()
            model.weights.grad.zero_()

        optimizer.zero_grad()
        logits = model(input)
        loss = criterion(logits, target)
        model.weights.retain_grad()
        loss.backward()
        if args.single_level:
            sum_grad(model, arch_grads_sum, weight_grads_sum, zz_grads_sum)
        nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
        optimizer.step()
        model.alphas.grad.zero_()
        model.weights.grad.zero_()

        prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
        objs.update(loss.data.item(), n)
        top1.update(prec1.data.item(), n)
        top5.update(prec5.data.item(), n)

        if step % args.report_freq == 0:
            logging.info('train %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)

    return top1.avg, objs.avg, arch_grads_sum, weight_grads_sum, zz_grads_sum


def infer(valid_queue, model, criterion, epoch):
    objs = utils.AvgrageMeter()
    top1 = utils.AvgrageMeter()
    top5 = utils.AvgrageMeter()
    model.eval()

    # FIX: Variable(..., volatile=True) was removed from PyTorch; use no_grad().
    with torch.no_grad():
        for step, (input, target) in enumerate(valid_queue):
            input = input.cuda()
            target = target.cuda()

            logits = model(input)
            loss = criterion(logits, target)

            prec1, prec5 = utils.accuracy(logits, target, topk=(1, 5))
            n = input.size(0)
            objs.update(loss.item(), n)
            top1.update(prec1.item(), n)
            top5.update(prec5.item(), n)

            if step % args.report_freq == 0:
                logging.info('valid %03d %e %f %f', step, objs.avg, top1.avg, top5.avg)

    return top1.avg, objs.avg


def sum_grad(model, arch_grads_sum, weight_grads_sum, zz_grads_sum):
    arch_grads_sum += torch.abs(model.alphas.grad)
    weight_grads_sum += torch.abs(model.weights.grad)
    zz_grads_sum += torch.abs(model.weights.grad - torch.sum(model.c.grad, dim=-1, keepdim=True))


if __name__ == '__main__':
    main()
```
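Several of the fixes above hinge on API changes in modern PyTorch (assumed >= 1.5 here). A minimal self-contained sketch of the replacement idioms, runnable on CPU:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
x = torch.randn(4, 10)

# Training step: gradients flow normally; clip with the in-place
# clip_grad_norm_ (the un-underscored clip_grad_norm is deprecated).
loss = model(x).sum()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)

# Evaluation: torch.no_grad() replaces the removed Variable(..., volatile=True).
model.eval()
with torch.no_grad():
    logits = model(x)
```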


```python
@torch.no_grad()
def generate(
    self,
    inputs: Optional[torch.Tensor] = None,
    generation_config: Optional[GenerationConfig] = None,
    logits_processor: Optional[LogitsProcessorList] = None,
    stopping_criteria: Optional[StoppingCriteriaList] = None,
    prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None,
    synced_gpus: Optional[bool] = None,
    assistant_model: Optional["PreTrainedModel"] = None,
    streamer: Optional["BaseStreamer"] = None,
    negative_prompt_ids: Optional[torch.Tensor] = None,
    negative_prompt_attention_mask: Optional[torch.Tensor] = None,
    use_model_defaults: Optional[bool] = None,
    custom_generate: Optional[str] = None,
    **kwargs,
) -> Union[GenerateOutput, torch.LongTensor]:
    r"""
    Generates sequences of token ids for models with a language modeling head.

    <Tip warning={true}>

    Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the
    model's default generation configuration. You can override any `generation_config` by passing the corresponding
    parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`.

    For an overview of generation strategies and code examples, check out the [following
    guide](../generation_strategies).

    </Tip>

    Parameters:
        inputs (`torch.Tensor` of varying shape depending on the modality, *optional*):
            The sequence used as a prompt for the generation or as model inputs to the encoder. If `None` the
            method initializes it with `bos_token_id` and a batch size of 1. For decoder-only models `inputs`
            should be in the format of `input_ids`. For encoder-decoder models *inputs* can represent any of
            `input_ids`, `input_values`, `input_features`, or `pixel_values`.
        generation_config ([`~generation.GenerationConfig`], *optional*):
            The generation configuration to be used as base parametrization for the generation call. `**kwargs`
            passed to generate matching the attributes of `generation_config` will override them. If
            `generation_config` is not provided, the default will be used, which has the following loading
            priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
            configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s
            default values, whose documentation should be checked to parameterize generation.
        logits_processor (`LogitsProcessorList`, *optional*):
            Custom logits processors that complement the default logits processors built from arguments and
            generation config. If a logit processor is passed that is already created with the arguments or a
            generation config an error is thrown. This feature is intended for advanced users.
        stopping_criteria (`StoppingCriteriaList`, *optional*):
            Custom stopping criteria that complements the default stopping criteria built from arguments and a
            generation config. If a stopping criteria is passed that is already created with the arguments or a
            generation config an error is thrown. If your stopping criteria depends on the `scores` input, make
            sure you pass `return_dict_in_generate=True, output_scores=True` to `generate`. This feature is
            intended for advanced users.
        prefix_allowed_tokens_fn (`Callable[[int, torch.Tensor], List[int]]`, *optional*):
            If provided, this function constrains the beam search to allowed tokens only at each step. If not
            provided no constraint is applied. This function takes 2 arguments: the batch ID `batch_id` and
            `input_ids`. It has to return a list with the allowed tokens for the next generation step conditioned
            on the batch ID `batch_id` and the previously generated tokens `inputs_ids`. This argument is useful
            for constrained generation conditioned on the prefix, as described in [Autoregressive Entity
            Retrieval](https://arxiv.org/abs/2010.00904).
        synced_gpus (`bool`, *optional*):
            Whether to continue running the while loop until max_length. Unless overridden, this flag will be set
            to `True` if using `FullyShardedDataParallel` or DeepSpeed ZeRO Stage 3 with multiple GPUs to avoid
            deadlocking if one GPU finishes generating before other GPUs. Otherwise, defaults to `False`.
        assistant_model (`PreTrainedModel`, *optional*):
            An assistant model that can be used to accelerate generation. The assistant model must have the exact
            same tokenizer. The acceleration is achieved when forecasting candidate tokens with the assistant model
            is much faster than running generation with the model you're calling generate from. As such, the
            assistant model should be much smaller.
        streamer (`BaseStreamer`, *optional*):
            Streamer object that will be used to stream the generated sequences. Generated tokens are passed
            through `streamer.put(token_ids)` and the streamer is responsible for any further processing.
        negative_prompt_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            The negative prompt needed for some processors such as CFG. The batch size must match the input batch
            size. This is an experimental feature, subject to breaking API changes in future versions.
        negative_prompt_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Attention_mask for `negative_prompt_ids`.
        use_model_defaults (`bool`, *optional*):
            When it is `True`, unset parameters in `generation_config` will be set to the model-specific default
            generation configuration (`model.generation_config`), as opposed to the global defaults
            (`GenerationConfig()`). If unset, models saved starting from `v4.50` will consider this flag to be
            `True`.
        custom_generate (`str`, *optional*):
            A string containing the name of a huggingface.co repository. If provided, the custom `generate`
            function defined in that repository's `custom_generate/generate.py` file will be executed instead of the
            standard `generate` method. Note that the logic for generation is entirely defined in that
            repository, and the return type may be different from the standard `generate` method.
        kwargs (`Dict[str, Any]`, *optional*):
            Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
            forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
            specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.

    Return:
        [`~utils.ModelOutput`] or `torch.LongTensor`: A [`~utils.ModelOutput`] (if `return_dict_in_generate=True`
        or when `config.return_dict_in_generate=True`) or a `torch.LongTensor`.

        If the model is *not* an encoder-decoder model (`model.config.is_encoder_decoder=False`), the possible
        [`~utils.ModelOutput`] types are:

            - [`~generation.GenerateDecoderOnlyOutput`],
            - [`~generation.GenerateBeamDecoderOnlyOutput`]

        If the model is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
        [`~utils.ModelOutput`] types are:

            - [`~generation.GenerateEncoderDecoderOutput`],
            - [`~generation.GenerateBeamEncoderDecoderOutput`]
    """
    # 0. If requested, load an arbitrary generation recipe from the Hub and run it instead
    if custom_generate is not None:
        trust_remote_code = kwargs.pop("trust_remote_code", None)
        # Get all `generate` arguments in a single variable. Custom functions are responsible for handling them:
        # they receive the same inputs as `generate`, only with `model` instead of `self`. They can access to
        # methods from `GenerationMixin` through `model`.
        global_keys_to_exclude = {"self", "kwargs"}
        generate_arguments = {key: value for key, value in locals().items() if key not in global_keys_to_exclude}
        generate_arguments.update(kwargs)

        custom_generate_function = self.load_custom_generate(
            custom_generate, trust_remote_code=trust_remote_code, **kwargs
        )
        return custom_generate_function(model=self, **generate_arguments)

    # 1. Handle `generation_config` and kwargs that might update it, and validate the `.generate()` call
    tokenizer = kwargs.pop("tokenizer", None)  # Pull this out first, we only use it for stopping criteria
    assistant_tokenizer = kwargs.pop("assistant_tokenizer", None)  # only used for assisted generation

    generation_config, model_kwargs = self._prepare_generation_config(
        generation_config, use_model_defaults, **kwargs
    )
    self._validate_model_kwargs(model_kwargs.copy())
    self._validate_assistant(assistant_model, tokenizer, assistant_tokenizer)

    # 2. Set generation parameters if not already defined
    if synced_gpus is None:
        synced_gpus = (is_deepspeed_zero3_enabled() or is_fsdp_managed_module(self)) and dist.get_world_size() > 1

    logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
    stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()

    accepts_attention_mask = "attention_mask" in set(inspect.signature(self.forward).parameters.keys())
    requires_attention_mask = "encoder_outputs" not in model_kwargs
    kwargs_has_attention_mask = model_kwargs.get("attention_mask", None) is not None

    # 3. Define model inputs
    inputs_tensor, model_input_name, model_kwargs = self._prepare_model_inputs(
        inputs, generation_config.bos_token_id, model_kwargs
    )
    batch_size = inputs_tensor.shape[0]

    device = inputs_tensor.device
    self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)

    # decoder-only models must use left-padding for batched generation.
    if not self.config.is_encoder_decoder:
        # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
        # Note: If using, `inputs_embeds` this check does not work, because we want to be more hands-off.
        if (
            generation_config._pad_token_tensor is not None
            and batch_size > 1
            and len(inputs_tensor.shape) == 2
            and torch.sum(inputs_tensor[:, -1] == generation_config._pad_token_tensor) > 0
        ):
            logger.warning(
                "A decoder-only architecture is being used, but right-padding was detected! For correct "
                "generation results, please set `padding_side='left'` when initializing the tokenizer."
            )

    # 4. Define other model kwargs
    # decoder-only models with inputs_embeds forwarding must use caching (otherwise we can't detect whether we are
    # generating the first new token or not, and we only want to use the embeddings for the first new token)
    if not self.config.is_encoder_decoder and model_input_name == "inputs_embeds":
        generation_config.use_cache = True

    if not kwargs_has_attention_mask and requires_attention_mask and accepts_attention_mask:
        model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
            inputs_tensor, generation_config, model_kwargs
        )
    elif kwargs_has_attention_mask:
        # TODO (joao): generalize this check with other types of inputs
        if model_input_name == "input_ids" and len(model_kwargs["attention_mask"].shape) > 2:
            raise ValueError("`attention_mask` passed to `generate` must be 2D.")

    if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs:
        # if model is encoder decoder encoder_outputs are created and added to `model_kwargs`
        model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
            inputs_tensor, model_kwargs, model_input_name, generation_config
        )

    # 5. Prepare `input_ids` which will be used for auto-regressive generation
    if self.config.is_encoder_decoder:
        input_ids, model_kwargs = self._prepare_decoder_input_ids_for_generation(
            batch_size=batch_size,
            model_input_name=model_input_name,
            model_kwargs=model_kwargs,
            decoder_start_token_id=generation_config._decoder_start_token_tensor,
            device=inputs_tensor.device,
        )
    else:
        input_ids = inputs_tensor if model_input_name == "input_ids" else model_kwargs.pop("input_ids")

    if generation_config.token_healing:
        input_ids = self.heal_tokens(input_ids, tokenizer)

    if streamer is not None:
        streamer.put(input_ids.cpu())

    # 6. Prepare `max_length` depending on other stopping criteria.
    input_ids_length = input_ids.shape[1]
    has_default_max_length = kwargs.get("max_length") is None and generation_config.max_length is not None
    has_default_min_length = kwargs.get("min_length") is None and generation_config.min_length is not None
    generation_config = self._prepare_generated_length(
        generation_config=generation_config,
        has_default_max_length=has_default_max_length,
        has_default_min_length=has_default_min_length,
        model_input_name=model_input_name,
        inputs_tensor=inputs_tensor,
        input_ids_length=input_ids_length,
    )

    # If the model supports `logits_to_keep` in forward(), set it to 1 to avoid computing the whole
    # logit matrix. This can save a lot of memory during the first forward pass. Note that assisted decoding
    # dynamically overrides this value as it can need more than the last token logits
    if self._supports_logits_to_keep() and "logits_to_keep" not in model_kwargs:
        model_kwargs["logits_to_keep"] = 1

    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)

    # 7. Prepare the cache.
    # - `model_kwargs` may be updated in place with a cache as defined by the parameters in `generation_config`.
    # - different models have a different cache name expected by the model (default = "past_key_values")
    # - `max_length`, prepared above, is used to determine the maximum cache length
    max_cache_length = generation_config.max_length - 1
    if (
        inputs_tensor.shape[1] != input_ids_length
        and model_input_name == "inputs_embeds"
        and not self.config.is_encoder_decoder
    ):
        max_cache_length += inputs_tensor.shape[1]
    self._prepare_cache_for_generation(
        generation_config, model_kwargs, assistant_model, batch_size, max_cache_length, device
    )

    # 8. determine generation mode
    generation_mode = generation_config.get_generation_mode(assistant_model)

    if streamer is not None and (generation_config.num_beams > 1):
        raise ValueError(
            "`streamer` cannot be used with beam search (yet!). Make sure that `num_beams` is set to 1."
        )

    if self.device.type != input_ids.device.type:
        warnings.warn(
            "You are calling .generate() with the `input_ids` being on a device type different"
            f" than your model's device. `input_ids` is on {input_ids.device.type}, whereas the model"
            f" is on {self.device.type}. You may experience unexpected behaviors or slower generation."
            " Please make sure that you have put `input_ids` to the"
            f" correct device by calling for example input_ids = input_ids.to('{self.device.type}') before"
            " running `.generate()`.",
            UserWarning,
        )

    # 9. prepare logits processors and stopping criteria
    prepared_logits_processor = self._get_logits_processor(
        generation_config=generation_config,
        input_ids_seq_length=input_ids_length,
        encoder_input_ids=inputs_tensor,
        prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
        logits_processor=logits_processor,
        device=inputs_tensor.device,
        model_kwargs=model_kwargs,
        negative_prompt_ids=negative_prompt_ids,
        negative_prompt_attention_mask=negative_prompt_attention_mask,
    )
    prepared_stopping_criteria = self._get_stopping_criteria(
        generation_config=generation_config, stopping_criteria=stopping_criteria, tokenizer=tokenizer, **kwargs
    )

    # Set model_kwargs `use_cache` so we can use it later in forward runs
    model_kwargs["use_cache"] = generation_config.use_cache

    # 10. go into different generation modes
    if generation_mode == GenerationMode.ASSISTED_GENERATION:
        if generation_config.num_return_sequences > 1:
            raise ValueError(
                "num_return_sequences has to be 1 when doing assisted generate, "
                f"but is {generation_config.num_return_sequences}."
            )
        if batch_size > 1:
            raise ValueError("assisted generate is only supported for batch_size = 1")
        if not model_kwargs["use_cache"]:
            raise ValueError("assisted generate requires `use_cache=True`")
        if generation_config.cache_implementation in ["static", "hybrid", "sliding_window"]:
            raise ValueError("assisted generate is not supported with Static cache classes`")
        if self._is_stateful:
            # In assisted generation we need the ability to confirm whether the model would pick certain tokens,
            # which is not possible with stateful models (they can't reset to a previous subset of generated text)
            raise ValueError(
                f"assisted generation is not supported with stateful models, such as {self.__class__.__name__}"
            )

        # 11. Get the candidate generator, given the parameterization
        candidate_generator = self._get_candidate_generator(
            generation_config=generation_config,
            input_ids=input_ids,
            inputs_tensor=inputs_tensor,
            assistant_model=assistant_model,
            logits_processor=logits_processor,
            target_tokenizer=tokenizer,
            assistant_tokenizer=assistant_tokenizer,
            model_kwargs=model_kwargs,
        )

        # 12. run assisted generate
        result = self._assisted_decoding(
            input_ids,
            candidate_generator=candidate_generator,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            streamer=streamer,
            **model_kwargs,
        )
    elif generation_mode == GenerationMode.DOLA_GENERATION:
        if self._is_stateful:
            # DoLa decoding was not designed for stateful models, and would require some changes
            raise ValueError(
                f"dola decoding is not supported with stateful models, such as {self.__class__.__name__}"
            )
        result = self._dola_decoding(
            input_ids,
            dola_layers=generation_config.dola_layers,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            streamer=streamer,
            **model_kwargs,
        )

    elif generation_mode == GenerationMode.CONTRASTIVE_SEARCH:
        if not model_kwargs["use_cache"]:
            raise ValueError("Contrastive search requires `use_cache=True`")
        if self._is_stateful:
            # Just like assisted generation, we need to be able to rollback to a previous state (see comment above)
            raise ValueError(
                f"contrastive search is not supported with stateful models, such as {self.__class__.__name__}"
            )

        result = self._contrastive_search(
            input_ids,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            streamer=streamer,
            **model_kwargs,
        )

    elif generation_mode in (GenerationMode.SAMPLE, GenerationMode.GREEDY_SEARCH):
        # 11. expand input_ids with `num_return_sequences` additional sequences per batch
        input_ids, model_kwargs = self._expand_inputs_for_generation(
            input_ids=input_ids,
            expand_size=generation_config.num_return_sequences,
            is_encoder_decoder=self.config.is_encoder_decoder,
            **model_kwargs,
        )

        # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
        result = self._sample(
            input_ids,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            streamer=streamer,
            **model_kwargs,
        )

    elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
        # 11. interleave input_ids with `num_beams` additional sequences per batch
        input_ids, model_kwargs = self._expand_inputs_for_generation(
            input_ids=input_ids,
            expand_size=generation_config.num_beams,
            is_encoder_decoder=self.config.is_encoder_decoder,
            **model_kwargs,
        )
        # 12. run beam sample
        result = self._beam_search(
            input_ids,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            **model_kwargs,
        )

    elif generation_mode == GenerationMode.GROUP_BEAM_SEARCH:
        # 11. prepare beam search scorer
        beam_scorer = BeamSearchScorer(
            batch_size=batch_size,
            num_beams=generation_config.num_beams,
            device=inputs_tensor.device,
            length_penalty=generation_config.length_penalty,
            do_early_stopping=generation_config.early_stopping,
            num_beam_hyps_to_keep=generation_config.num_return_sequences,
            num_beam_groups=generation_config.num_beam_groups,
            max_length=generation_config.max_length,
        )
        # 12. interleave input_ids with `num_beams` additional sequences per batch
        input_ids, model_kwargs = self._expand_inputs_for_generation(
            input_ids=input_ids,
            expand_size=generation_config.num_beams,
            is_encoder_decoder=self.config.is_encoder_decoder,
            **model_kwargs,
        )
        # 13. run beam search
        result = self._group_beam_search(
            input_ids,
            beam_scorer,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            **model_kwargs,
        )

    elif generation_mode == GenerationMode.CONSTRAINED_BEAM_SEARCH:
        final_constraints = []
        if generation_config.constraints is not None:
            final_constraints = generation_config.constraints

        if generation_config.force_words_ids is not None:

            def typeerror():
                raise ValueError(
                    "`force_words_ids` has to either be a `List[List[List[int]]]` or `List[List[int]]` "
                    f"of positive integers, but is {generation_config.force_words_ids}."
                )

            if (
                not isinstance(generation_config.force_words_ids, list)
                or len(generation_config.force_words_ids) == 0
            ):
                typeerror()

            for word_ids in generation_config.force_words_ids:
                if isinstance(word_ids[0], list):
                    if not isinstance(word_ids, list) or len(word_ids) == 0:
                        typeerror()
                    if any(not isinstance(token_ids, list) for token_ids in word_ids):
                        typeerror()
                    if any(
                        any((not isinstance(token_id, int) or token_id < 0) for token_id in token_ids)
                        for token_ids in word_ids
                    ):
                        typeerror()

                    constraint = DisjunctiveConstraint(word_ids)
                else:
                    if not isinstance(word_ids, list) or len(word_ids) == 0:
                        typeerror()
                    if any((not isinstance(token_id, int) or token_id < 0) for token_id in word_ids):
                        typeerror()

                    constraint = PhrasalConstraint(word_ids)
                final_constraints.append(constraint)

        # 11. prepare beam search scorer
        constrained_beam_scorer = ConstrainedBeamSearchScorer(
            constraints=final_constraints,
            batch_size=batch_size,
            num_beams=generation_config.num_beams,
            device=inputs_tensor.device,
            length_penalty=generation_config.length_penalty,
            do_early_stopping=generation_config.early_stopping,
            num_beam_hyps_to_keep=generation_config.num_return_sequences,
            max_length=generation_config.max_length,
        )
        # 12. interleave input_ids with `num_beams` additional sequences per batch
        input_ids, model_kwargs = self._expand_inputs_for_generation(
            input_ids=input_ids,
            expand_size=generation_config.num_beams,
            is_encoder_decoder=self.config.is_encoder_decoder,
            **model_kwargs,
        )
        # 13. run beam search
        result = self._constrained_beam_search(
            input_ids,
            constrained_beam_scorer=constrained_beam_scorer,
            logits_processor=prepared_logits_processor,
            stopping_criteria=prepared_stopping_criteria,
            generation_config=generation_config,
            synced_gpus=synced_gpus,
            **model_kwargs,
        )

    # Convert to legacy cache format if requested
    if (
        generation_config.return_legacy_cache is True
        and hasattr(result, "past_key_values")
        and getattr(result.past_key_values, "to_legacy_cache") is not None
    ):
        result.past_key_values = result.past_key_values.to_legacy_cache()
    return result
```

Please analyze this function: its code logic and overall functionality.
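In short, the function normalizes all arguments into a `GenerationConfig`, prepares inputs, attention masks, and the KV cache (steps 1-7), selects a generation mode from the config (step 8), builds the logits processors and stopping criteria (step 9), and then dispatches to the matching private decoding loop such as `_sample`, `_beam_search`, or `_assisted_decoding` (step 10). A minimal usage sketch through the public API (the model name here is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The generate() method dispatches to", return_tensors="pt")
# kwargs like num_beams are folded into generation_config (step 1), which then
# selects the decoding mode (steps 8 and 10): num_beams=4 triggers beam search.
output_ids = model.generate(**inputs, max_new_tokens=20, num_beams=4, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```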


```
Traceback (most recent call last):
  File "c:\VSCODE\尝试.py", line 115, in <module>
    best_model = train_model(X_train, X_test, y_train, y_test)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\VSCODE\尝试.py", line 76, in train_model
    random_search.fit(
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 1024, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 1951, in _run_search
    evaluate_candidates(
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 1001, in evaluate_candidates
    _warn_or_raise_about_fit_failures(out, self.error_score)
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\model_selection\_validation.py", line 517, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError: All the 150 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
150 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\aze\anaconda3\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\aze\anaconda3\Lib\site-packages\xgboost\core.py", line 726, in inner_f
    return func(**kwargs)
           ^^^^^^^^^^^^^^
TypeError: XGBModel.fit() got an unexpected keyword argument 'early_stopping_rounds'
```
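The root cause is a library API change: starting with XGBoost 2.0, `early_stopping_rounds` was removed from the scikit-learn wrapper's `fit()` and became a constructor parameter. A hedged sketch of the fix, with synthetic data standing in for the original script's variables and a toy parameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# XGBoost >= 2.0: early stopping is configured on the estimator itself,
# not passed as fit(..., early_stopping_rounds=10).
model = XGBClassifier(
    n_estimators=300,
    early_stopping_rounds=10,   # moved here from fit()
    eval_metric="logloss",
)

search = RandomizedSearchCV(model, {"max_depth": [3, 5, 7]}, n_iter=3, cv=3, random_state=0)
# eval_set remains a fit-time argument and is forwarded to every CV fit.
search.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print(search.best_params_)
```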


```python
import os
import glob

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# ===== Configuration =====
cache_dir = "./local_embeddings_cache"  # cache directory
model_name = "sentence-transformers/all-MiniLM-L6-v2"  # original model name

# ===== Build the local model path automatically =====
# The HuggingFace cache replaces '/' with '--' and prepends a 'models--' prefix
hf_style_model_folder = model_name.replace("/", "--")
model_path = os.path.join(cache_dir, f"models--{hf_style_model_folder}")

# Check that the model exists
if not os.path.exists(model_path):
    print(f"❌ Error: model path does not exist: {model_path}")
    print("Make sure the model has been cached successfully.")
    exit(1)

# FIX: the 'models--...' cache root cannot be loaded directly; the actual model
# files live under snapshots/<hash>/, so resolve the latest snapshot directory.
snapshots = sorted(glob.glob(os.path.join(model_path, "snapshots", "*")))
if not snapshots:
    print(f"❌ Error: no snapshots found under {model_path}")
    exit(1)
model_path = snapshots[-1]
print(f"✅ Using local embedding model path: {os.path.abspath(model_path)}")

# ===== Load the embedding model =====
embeddings = HuggingFaceEmbeddings(model_name=model_path)

# ===== Load the FAISS vector store =====
try:
    vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    print("✅ FAISS vector database loaded")
except Exception as e:
    print(f"❌ Failed to load the FAISS database: {e}")
    exit(1)

# ===== User query =====
query = input("🔍 Enter your search query: ")

# ===== Similarity search =====
try:
    docs = vectorstore.similarity_search(query, k=3)
    print("\n🔎 Most relevant documents:")
    for i, doc in enumerate(docs):
        title = doc.metadata.get("title", "untitled")
        content_snippet = doc.page_content[:200]  # show only the first 200 characters
        print(f"\n{i + 1}. Title: {title}")
        print(f"   Snippet: {content_snippet}...")
except Exception as e:
    print(f"❌ Error during search: {e}")
```

Fix it.
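For completeness, a minimal sketch of how the `faiss_index` directory loaded above might be created in the first place. The documents here are placeholders; only `FAISS.from_documents` and `save_local` are standard LangChain calls:

```python
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

docs = [
    Document(page_content="FAISS is a library for similarity search.", metadata={"title": "FAISS"}),
    Document(page_content="LangChain wires embeddings to vector stores.", metadata={"title": "LangChain"}),
]

vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("faiss_index")  # produces the directory that load_local() expects
```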


```
http_request_duration_highr_seconds_bucket{le="0.01"} : "4847.0"
http_request_duration_highr_seconds_bucket{le="0.1"} : "5045.0"
http_request_duration_highr_seconds_bucket{le="0.05"} : "4859.0"
http_request_duration_highr_seconds_bucket{le="0.5"} : "5931.0"
http_request_duration_highr_seconds_bucket{le="0.025"} : "4853.0"
http_request_duration_highr_seconds_bucket{le="0.25"} : "5278.0"
http_request_duration_highr_seconds_bucket{le="0.075"} : "4866.0"
http_request_duration_highr_seconds_bucket{le="0.75"} : "6660.0"
http_request_duration_highr_seconds_bucket{le="1.0"} : "7405.0"
http_request_duration_highr_seconds_bucket{le="1.5"} : "8334.0"
http_request_duration_highr_seconds_bucket{le="2.0"} : "9196.0"
http_request_duration_highr_seconds_bucket{le="2.5"} : "10073.0"
http_request_duration_highr_seconds_bucket{le="3.0"} : "10949.0"
http_request_duration_highr_seconds_bucket{le="3.5"} : "11679.0"
http_request_duration_highr_seconds_bucket{le="4.0"} : "12333.0"
http_request_duration_highr_seconds_bucket{le="4.5"} : "12829.0"
http_request_duration_highr_seconds_bucket{le="5.0"} : "13181.0"
http_request_duration_highr_seconds_bucket{le="7.5"} : "14314.0"
http_request_duration_highr_seconds_bucket{le="10.0"} : "15519.0"
http_request_duration_highr_seconds_bucket{le="30.0"} : "25617.0"
http_request_duration_highr_seconds_bucket{le="60.0"} : "26110.0"
http_request_duration_highr_seconds_bucket{le="+Inf"} : "26346.0"
http_request_duration_highr_seconds_count : "26346.0"
http_request_duration_highr_seconds_created : "1.7512560890372858e+09"
http_request_duration_highr_seconds_sum : "248594.49104921706"
http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="0.1",method="POST"} : "226.0"
http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="0.5",method="POST"} : "1112.0"
http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="1.0",method="POST"} : "2586.0"
http_request_duration_seconds_bucket{handler="/v1/chat/completions",le="+Inf",method="POST"} : "21527.0"
http_request_duration_seconds_bucket{handler="/v1/models",le="0.1",method="GET"} : "1.0"
http_request_duration_seconds_bucket{handler="/v1/models",le="0.5",method="GET"} : "1.0"
http_request_duration_seconds_bucket{handler="/v1/models",le="1.0",method="GET"} : "1.0"
http_request_duration_seconds_bucket{handler="/v1/models",le="+Inf",method="GET"} : "1.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="GET"} : "4693.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="HEAD"} : "6.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="OPTIONS"} : "12.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="POST"} : "95.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="PROPFIND"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="PUT"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="SEARCH"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.1",method="TRACE"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="GET"} : "4693.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="HEAD"} : "6.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="OPTIONS"} : "12.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="POST"} : "95.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="PROPFIND"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="PUT"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="SEARCH"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="0.5",method="TRACE"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="GET"} : "4693.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="HEAD"} : "6.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="OPTIONS"} : "12.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="POST"} : "95.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="PROPFIND"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="PUT"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="SEARCH"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="1.0",method="TRACE"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="GET"} : "4693.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="HEAD"} : "6.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="OPTIONS"} : "12.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="POST"} : "95.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="PROPFIND"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="PUT"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="SEARCH"} : "3.0"
http_request_duration_seconds_bucket{handler="none",le="+Inf",method="TRACE"} : "3.0"
http_request_duration_seconds_count{handler="/v1/chat/completions",method="POST"} : "21527.0"
http_request_duration_seconds_count{handler="/v1/models",method="GET"} : "1.0"
http_request_duration_seconds_count{handler="none",method="GET"} : "4693.0"
http_request_duration_seconds_count{handler="none",method="HEAD"} : "6.0"
http_request_duration_seconds_count{handler="none",method="OPTIONS"} : "12.0"
http_request_duration_seconds_count{handler="none",method="POST"} : "95.0"
http_request_duration_seconds_count{handler="none",method="PROPFIND"} : "3.0"
http_request_duration_seconds_count{handler="none",method="PUT"} : "3.0"
http_request_duration_seconds_count{handler="none",method="SEARCH"} : "3.0"
http_request_duration_seconds_count{handler="none",method="TRACE"} : "3.0"
http_request_duration_seconds_created{handler="/v1/chat/completions",method="POST"} : "1.7512560967123778e+09"
http_request_duration_seconds_created{handler="/v1/models",method="GET"} : "1.7536925794242406e+09"
http_request_duration_seconds_created{handler="none",method="GET"} : "1.7516341020108707e+09"
http_request_duration_seconds_created{handler="none",method="HEAD"} : "1.751634176119915e+09"
http_request_duration_seconds_created{handler="none",method="OPTIONS"} : "1.7516341579990425e+09"
http_request_duration_seconds_created{handler="none",method="POST"} : "1.7516341771295128e+09"
http_request_duration_seconds_created{handler="none",method="PROPFIND"} : "1.7516341696153226e+09"
http_request_duration_seconds_created{handler="none",method="PUT"} : "1.7516349058165367e+09"
http_request_duration_seconds_created{handler="none",method="SEARCH"} : "1.7516341693599503e+09"
http_request_duration_seconds_created{handler="none",method="TRACE"} : "1.751634165566383e+09"
http_request_duration_seconds_sum{handler="/v1/chat/completions",method="POST"} : "248593.6503553912"
http_request_duration_seconds_sum{handler="/v1/models",method="GET"} : "0.0027880221605300903"
http_request_duration_seconds_sum{handler="none",method="GET"} : "0.8171688430011272"
http_request_duration_seconds_sum{handler="none",method="HEAD"} : "0.0009557865560054779"
http_request_duration_seconds_sum{handler="none",method="OPTIONS"} : "0.0028338953852653503"
http_request_duration_seconds_sum{handler="none",method="POST"} : "0.014691390097141266"
http_request_duration_seconds_sum{handler="none",method="PROPFIND"} : "0.000380123034119606"
http_request_duration_seconds_sum{handler="none",method="PUT"} : "0.00042458251118659973"
http_request_duration_seconds_sum{handler="none",method="SEARCH"} : "0.0005713216960430145"
http_request_duration_seconds_sum{handler="none",method="TRACE"} : "0.0008798614144325256"
http_request_size_bytes_count{handler="/v1/chat/completions"} : "21527.0"
http_request_size_bytes_count{handler="/v1/models"} : "1.0"
http_request_size_bytes_count{handler="none"} : "4818.0"
http_request_size_bytes_created{handler="/v1/chat/completions"} : "1.7512560967123284e+09"
http_request_size_bytes_created{handler="/v1/models"} : "1.753692579424021e+09"
http_request_size_bytes_created{handler="none"} : "1.7516341020104244e+09"
http_request_size_bytes_sum{handler="/v1/chat/completions"} : "802493.0"
http_request_size_bytes_sum{handler="/v1/models"} : "0.0"
http_request_size_bytes_sum{handler="none"} : "32625.0"
http_requests_created{handler="/v1/chat/completions",method="POST",status="2xx"} : "1.7512560967123055e+09"
http_requests_created{handler="/v1/chat/completions",method="POST",status="4xx"} : "1.7514186825033803e+09"
http_requests_created{handler="/v1/models",method="GET",status="2xx"} : "1.753692579423783e+09"
http_requests_created{handler="none",method="GET",status="4xx"} : "1.7516341020101185e+09"
http_requests_created{handler="none",method="HEAD",status="4xx"} : "1.7516341761198838e+09"
http_requests_created{handler="none",method="OPTIONS",status="4xx"} : "1.7516341579990091e+09"
http_requests_created{handler="none",method="POST",status="4xx"} : "1.7516341771294773e+09"
http_requests_created{handler="none",method="PROPFIND",status="4xx"} : "1.7516341696152897e+09"
http_requests_created{handler="none",method="PUT",status="4xx"} : "1.7516349058164842e+09"
http_requests_created{handler="none",method="SEARCH",status="4xx"} : "1.7516341693599005e+09"
http_requests_created{handler="none",method="TRACE",status="4xx"} : "1.7516341655663416e+09"
http_requests_total{handler="/v1/chat/completions",method="POST",status="2xx"} : "21474.0"
http_requests_total{handler="/v1/chat/completions",method="POST",status="4xx"} : "53.0"
http_requests_total{handler="/v1/models",method="GET",status="2xx"} : "1.0"
http_requests_total{handler="none",method="GET",status="4xx"} : "4693.0"
http_requests_total{handler="none",method="HEAD",status="4xx"} : "6.0"
http_requests_total{handler="none",method="OPTIONS",status="4xx"} : "12.0"
http_requests_total{handler="none",method="POST",status="4xx"} : "95.0"
http_requests_total{handler="none",method="PROPFIND",status="4xx"} : "3.0"
http_requests_total{handler="none",method="PUT",status="4xx"} : "3.0"
http_requests_total{handler="none",method="SEARCH",status="4xx"} : "3.0"
http_requests_total{handler="none",method="TRACE",status="4xx"} : "3.0"
http_response_size_bytes_count{handler="/v1/chat/completions"} : "21527.0"
http_response_size_bytes_count{handler="/v1/models"} : "1.0"
http_response_size_bytes_count{handler="none"} : "4818.0"
http_response_size_bytes_created{handler="/v1/chat/completions"} : "1.7512560967123535e+09"
http_response_size_bytes_created{handler="/v1/models"} : "1.7536925794240377e+09"
http_response_size_bytes_created{handler="none"} : "1.751634102010456e+09"
http_response_size_bytes_sum{handler="/v1/chat/completions"} : "3.539877e+06"
http_response_size_bytes_sum{handler="/v1/models"} : "538.0"
http_response_size_bytes_sum{handler="none"} : "105996.0"
process_cpu_seconds_total : "2379.04"
process_max_fds : "1.073741816e+09"
process_open_fds : "48.0"
process_resident_memory_bytes : "4.28404736e+08"
process_start_time_seconds : "1.75125604907e+09"
process_virtual_memory_bytes : "1.2146741248e+010"
python_gc_collections_total{generation="0"} : "5120.0"
python_gc_collections_total{generation="1"} : "464.0"
python_gc_collections_total{generation="2"} : "29.0"
python_gc_objects_collected_total{generation="0"} : "7970.0"
python_gc_objects_collected_total{generation="1"} : "1332.0"
python_gc_objects_collected_total{generation="2"} : "994.0"
python_gc_objects_uncollectable_total{generation="0"} : "0.0"
python_gc_objects_uncollectable_total{generation="1"} : "0.0"
python_gc_objects_uncollectable_total{generation="2"} : "0.0"
python_info{implementation="CPython",major="3",minor="12",patchlevel="10",version="3.12.10"} : "1.0"
vllm:cache_config_info{block_size="16",cache_dtype="auto",calculate_kv_scales="False",cpu_offload_gb="0",enable_prefix_caching="True",gpu_memory_utilization="0.95",is_attention_free="False",num_gpu_blocks_override="None",prefix_caching_hash_algo="builtin",sliding_window="None",swap_space="4",swap_space_bytes="4294967296"} : "1.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "489.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1085.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1940.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2553.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "3480.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4342.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5216.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "8314.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "10648.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "15688.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19682.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20745.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20989.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21156.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21237.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21414.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21459.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21462.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:e2e_request_latency_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:e2e_request_latency_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:e2e_request_latency_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660305371e+09"
vllm:e2e_request_latency_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "247822.51679587364"
vllm:generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660302482e+09"
vllm:generation_tokens_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.107384e+06"
vllm:gpu_cache_usage_perc{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.00012135922330092086"
vllm:gpu_prefix_cache_hits_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.751256066030216e+09"
vllm:gpu_prefix_cache_hits_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "613379.0"
vllm:gpu_prefix_cache_queries_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660302012e+09"
vllm:gpu_prefix_cache_queries_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.143769e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.243007e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="8.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.084752e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="16.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.140075e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="32.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.140732e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="64.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.140779e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="128.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.141224e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="256.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.142381e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="512.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.15563e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="1024.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.158366e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="2048.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.160035e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="4096.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.160951e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="8192.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.161328e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="16384.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.161551e+06"
vllm:iteration_tokens_total_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.161608e+06"
vllm:iteration_tokens_total_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.161608e+06"
vllm:iteration_tokens_total_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660303833e+09"
vllm:iteration_tokens_total_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.3555388e+07"
vllm:num_preemptions_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.751256066030228e+09"
vllm:num_preemptions_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0"
vllm:num_requests_running{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0"
vllm:num_requests_waiting{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0"
vllm:prompt_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660302384e+09"
vllm:prompt_tokens_total{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.8448004e+07"
vllm:request_decode_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1031.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1755.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2641.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "3193.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "3930.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4858.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5855.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "8843.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "11018.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "16042.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19932.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20880.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21085.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21219.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21287.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21421.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21459.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21462.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_decode_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_decode_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_decode_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660307164e+09"
vllm:request_decode_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "234210.93255270552"
vllm:request_generation_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0"
vllm:request_generation_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "374.0"
vllm:request_generation_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "679.0"
vllm:request_generation_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1399.0"
vllm:request_generation_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2597.0"
vllm:request_generation_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4845.0"
vllm:request_generation_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "8414.0"
vllm:request_generation_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "10524.0"
vllm:request_generation_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20497.0"
vllm:request_generation_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21182.0"
vllm:request_generation_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21386.0"
vllm:request_generation_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21460.0"
vllm:request_generation_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21462.0"
vllm:request_generation_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0"
vllm:request_generation_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_generation_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660303566e+09"
vllm:request_generation_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.092714e+06"
vllm:request_inference_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "495.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1110.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1956.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2567.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "3493.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4363.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5252.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "8356.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "10677.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "15741.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19718.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20769.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21011.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21174.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21246.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21415.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21459.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21462.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_inference_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_inference_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0"
vllm:request_inference_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660305977e+09"
vllm:request_inference_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "246084.0233336063"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "374.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "679.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1399.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2597.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4845.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "8414.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "10524.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20497.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21182.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21386.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21460.0"
vllm:request_max_num_generation_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21462.0"
```
vllm:request_max_num_generation_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0" vllm:request_max_num_generation_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_max_num_generation_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_max_num_generation_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.751256066030409e+09" vllm:request_max_num_generation_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.092714e+06" vllm:request_params_max_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_params_max_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.0" vllm:request_params_max_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2.0" vllm:request_params_max_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4.0" vllm:request_params_max_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "7.0" vllm:request_params_max_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "134.0" vllm:request_params_max_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "420.0" vllm:request_params_max_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "437.0" vllm:request_params_max_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "915.0" vllm:request_params_max_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_max_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_max_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660304518e+09" vllm:request_params_max_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "6.23433974e+08" vllm:request_params_n_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_params_n_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660304315e+09" 
vllm:request_params_n_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "17829.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "18462.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19014.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19415.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20007.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20328.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20485.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20996.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21206.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21366.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21423.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21437.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21463.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prefill_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.751256066030625e+09" vllm:request_prefill_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "11873.090780900791" vllm:request_prompt_tokens_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" 
vllm:request_prompt_tokens_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_prompt_tokens_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "52.0" vllm:request_prompt_tokens_bucket{engine="0",le="100.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "109.0" vllm:request_prompt_tokens_bucket{engine="0",le="200.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1283.0" vllm:request_prompt_tokens_bucket{engine="0",le="500.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "15752.0" vllm:request_prompt_tokens_bucket{engine="0",le="1000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "18203.0" vllm:request_prompt_tokens_bucket{engine="0",le="2000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19866.0" vllm:request_prompt_tokens_bucket{engine="0",le="5000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20966.0" vllm:request_prompt_tokens_bucket{engine="0",le="10000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21314.0" vllm:request_prompt_tokens_bucket{engine="0",le="20000.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21420.0" vllm:request_prompt_tokens_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prompt_tokens_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_prompt_tokens_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660303123e+09" vllm:request_prompt_tokens_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.8447846e+07" vllm:request_queue_time_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="0.8",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21388.0" vllm:request_queue_time_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21396.0" vllm:request_queue_time_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21417.0" vllm:request_queue_time_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21429.0" vllm:request_queue_time_seconds_bucket{engine="0",le="15.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21439.0" vllm:request_queue_time_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21448.0" vllm:request_queue_time_seconds_bucket{engine="0",le="30.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21466.0" vllm:request_queue_time_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="50.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="60.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="120.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="240.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" 
vllm:request_queue_time_seconds_bucket{engine="0",le="480.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="960.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="1920.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="7680.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21472.0" vllm:request_queue_time_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660305698e+09" vllm:request_queue_time_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1093.2847901340574" vllm:request_success_created{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660302787e+09" vllm:request_success_created{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.751256066030273e+09" vllm:request_success_created{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660302663e+09" vllm:request_success_total{engine="0",finished_reason="abort",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:request_success_total{engine="0",finished_reason="length",model_name="qwen2.5-72b-instruct-gptq-int4"} : "16.0" vllm:request_success_total{engine="0",finished_reason="stop",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21456.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.076944e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.2",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.081527e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.3",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.082037e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.4",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.082299e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.05",model_name="qwen2.5-72b-instruct-gptq-int4"} : "4.434515e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.082539e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.15",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.081115e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.025",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.075",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.058093e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.083005e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.083381e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.084889e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" 
vllm:time_per_output_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5.085911e+06" vllm:time_per_output_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660305083e+09" vllm:time_per_output_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "234810.7781156646" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.001",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.01",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.1",model_name="qwen2.5-72b-instruct-gptq-int4"} : "10741.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.02",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.04",model_name="qwen2.5-72b-instruct-gptq-int4"} : "22.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.005",model_name="qwen2.5-72b-instruct-gptq-int4"} : "0.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "18368.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.06",model_name="qwen2.5-72b-instruct-gptq-int4"} : "2294.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.08",model_name="qwen2.5-72b-instruct-gptq-int4"} : "5674.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.25",model_name="qwen2.5-72b-instruct-gptq-int4"} : "17446.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="0.75",model_name="qwen2.5-72b-instruct-gptq-int4"} : "18837.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="1.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "19336.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20428.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="5.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "20938.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="7.5",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21079.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="10.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21155.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="20.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21393.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="40.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21458.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="80.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21473.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="160.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21473.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="640.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21473.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="2560.0",model_name="qwen2.5-72b-instruct-gptq-int4"} : 
"21473.0" vllm:time_to_first_token_seconds_bucket{engine="0",le="+Inf",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21473.0" vllm:time_to_first_token_seconds_count{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "21473.0" vllm:time_to_first_token_seconds_created{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "1.7512560660304754e+09" vllm:time_to_first_token_seconds_sum{engine="0",model_name="qwen2.5-72b-instruct-gptq-int4"} : "13614.933543205261" ”分析归类做这些数据
