Python Machine Learning Notes (25: Algorithm Chains and Pipelines)

FreedomLeo1

Published 2025-05-16 20:28:39

Licensed under CC 4.0 BY-SA

Category: Python Machine Learning. Tags: machine learning algorithms, python, make_pipeline, Pipeline, named_steps attribute

Article link: https://blog.csdn.net/FreedomLeo1/article/details/148015647

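For many machine learning algorithms, the particular representation of the data matters a great deal. Preparing that representation can start with scaling the data and combining features by hand, and go as far as learning features with unsupervised machine learning. As a result, most machine learning applications need more than a single algorithm: they chain together several processing steps and machine learning models. The Pipeline class simplifies building such chains of transformations and models. Combining Pipeline with GridSearchCV makes it possible to search over the parameters of all processing steps at once.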
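As a rough illustration of the last point, the sketch below chains a MinMaxScaler and an SVC in a Pipeline and passes the whole chain to GridSearchCV. The step names "scaler" and "svm" and the parameter grid values are assumptions chosen for this example, not settings taken from the text above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Load and split the data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Chain the scaler and the classifier.
# The step names "scaler" and "svm" are arbitrary labels (assumed here).
pipe = Pipeline([("scaler", MinMaxScaler()), ("svm", SVC())])

# A step's parameter is addressed as <step name>__<parameter name>.
# The candidate values below are illustrative assumptions.
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.01, 0.1, 1, 10],
}

# In each cross-validation split, the scaler is fit only on the
# training part of that split, then applied to the held-out part.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validation accuracy: {:.2f}".format(grid.best_score_))
print("Test set accuracy: {:.2f}".format(grid.score(X_test, y_test)))
```

Because the scaler sits inside the pipeline, it is refit within every cross-validation split, so no information from the held-out part leaks into the scaling step.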

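Example: preprocessing the cancer dataset with MinMaxScaler improves the performance of a kernel SVM on it. The code below splits the data, computes the minimum and maximum, scales the data, and trains the SVM: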

from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# Load and split the data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
# Compute the minimum and maximum on the training data
scaler = MinMaxScaler().fit