Linear Classification in Practice with Jupyter

License: CC 4.0 BY-SA

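This article uses the Iris dataset and LogisticRegression to practice linear multiclass classification, taking first the sepal length and width and then the petal length and width as features, and walks through the data processing, plotting, and model prediction steps.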


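I. Introduction
II. Implementing Linear Multiclass Classification
- 1. Classification using sepal length and width as features
- 2. Classification using petal length and width as features
III. References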


I. Introduction

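The Iris dataset

Iris is a commonly used dataset for classification experiments, introduced by Fisher in 1936.
It is a multivariate dataset with 150 samples divided into 3 classes (Setosa, Versicolour, Virginica), 50 samples per class. Each sample has 4 attributes: sepal length, sepal width, petal length, and petal width.
The task is to predict from these 4 attributes which of the three species an iris flower belongs to.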
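
As a quick sanity check of these figures, here is a minimal sketch that inspects the copy of Iris bundled with scikit-learn. Using `load_iris` is an assumption made for convenience; the article itself downloads the UCI CSV file.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled copy instead of the UCI download.
iris = load_iris()
n_samples, n_features = iris.data.shape
class_counts = np.bincount(iris.target)   # number of samples per class

print(n_samples, n_features)       # number of samples and attributes
print(list(iris.target_names))     # names of the three species
print(class_counts)                # samples in each class
```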

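LogisticRegression

Logistic regression and linear regression are both generalized linear models.
Logistic regression handles regression problems in which the dependent variable is categorical. It is most often used for binary (binomial) problems, but it can also handle multiclass problems.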
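
To make the "generalized linear model" point concrete, the sketch below shows the usual multiclass form: one linear score per class, followed by a softmax that turns the scores into probabilities. The weights `W`, the bias `b`, the inputs `X`, and the helper `softmax` are hypothetical values chosen for illustration, not parameters fitted in this article.

```python
import numpy as np

def softmax(scores):
    """Turn per-class linear scores into probabilities that sum to 1 per row."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical parameters: 3 classes, 2 features.
W = np.array([[ 1.0, -1.0],
              [ 0.0,  0.5],
              [-1.0,  1.0]])
b = np.array([0.1, 0.0, -0.1])

# Two hypothetical samples.
X = np.array([[ 2.0, 0.5],
              [-1.0, 1.5]])

scores = X @ W.T + b          # linear part: one score per class
probs = softmax(scores)       # softmax maps scores to class probabilities
pred = probs.argmax(axis=1)   # predicted class = highest probability
print(probs)
print(pred)
```

Because the class with the largest probability is also the class with the largest linear score, the boundaries between classes are straight lines in feature space, which is why this counts as linear classification.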

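II. Implementing Linear Multiclass Classification

1. Classification using sepal length and width as features

(1) Import the required packages: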

import numpy as np
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import datasets
from sklearn import preprocessing
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

(2) Load the dataset

# iris.data has no header row, so header=None keeps all 150 samples
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
x = df.values[:, :-1].astype(float)   # the four numeric feature columns
y = df.values[:, -1]                  # species names as strings
print('x = \n', x)
print('y = \n', y)
le = preprocessing.LabelEncoder()
le.fit(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
print(le.classes_)
y = le.transform(y)                   # map species names to 0, 1, 2
print('Last Version, y = \n', y)
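
The label-encoding step can be seen in isolation with a small sketch. The list `names` below is a made-up sample, not data from the article; it only illustrates how `LabelEncoder` assigns integer codes in sorted order of the class names.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of species names, in arbitrary order.
names = ['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa']

le = LabelEncoder()
codes = le.fit_transform(names)        # learn the classes and encode in one step
decoded = le.inverse_transform(codes)  # map the integer codes back to names

print(list(le.classes_))   # classes are stored in sorted order
print(codes)
print(list(decoded))
```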


(3) Data processing

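x = x[:, :2]                # keep only the first two columns: sepal length and sepal width
print(x)
print(y)
x = StandardScaler().fit_transform(x)   # standardize each feature to zero mean and unit variance
lr = LogisticRegression()   # logistic regression model
lr.fit(x, y.ravel())        # fit the model parameters to the data (x, y)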
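
The imports at the top include `Pipeline`, which the steps above never use. As a hedged alternative, the sketch below chains the scaler and the classifier into one estimator and checks that the decision scores are a linear function of the standardized features. It assumes scikit-learn's bundled `load_iris` instead of the UCI download, and the step names `'scale'` and `'clf'` are arbitrary labels chosen here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of Iris; columns 0 and 1 are sepal length and width.
X_all, y_all = load_iris(return_X_y=True)
X_sepal = X_all[:, :2]

model = Pipeline([
    ('scale', StandardScaler()),      # standardize the two features
    ('clf', LogisticRegression()),    # then fit the classifier
])
model.fit(X_sepal, y_all)

clf = model.named_steps['clf']
print(clf.coef_.shape)        # one weight vector per class
print(clf.intercept_.shape)   # one intercept per class

# Recompute the decision scores by hand: scaled features times weights plus intercepts.
X_scaled = model.named_steps['scale'].transform(X_sepal)
manual_scores = X_scaled @ clf.coef_.T + clf.intercept_
scores_match = np.allclose(manual_scores, model.decision_function(X_sepal))
print(scores_match)
```

Wrapping both steps in a pipeline means new samples passed to `model.predict` are scaled with the same statistics that were learned during fitting.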


(4) Plot the figure

N, M = 500, 500     # number of sample values along each axis
x1_min, x1_max = x[:, 0].min(), x[:, 0].max()   # range of column 0
x2_min, x2_max = x[:, 1].min(), x[:, 1].max()   # range of column 1
t1 = np.linspace(x1_min, x1_max, N)
t2 = np.linspace(x2_min, x2_max, M)
x1, x2 = np.meshgrid(t1, t2)                    # grid of sample points
x_test = np.stack((x1.flat, x2.flat), axis=1)   # test points, one row per grid point

cm_light = mpl.colors.ListedColormap(['#77E0A0', '#FF8080', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
y_hat = lr.predict(x_test)                      # predicted class for each grid point
y_hat = y_hat.reshape(x1.shape)                 # reshape to match the grid
plt.pcolormesh(x1, x2, y_hat, cmap=cm_light)    # draw the predicted regions
plt.scatter(x[:, 0], x[:, 1], c=y.ravel(), edgecolors='k', s=50, cmap=cm_dark)    # training samples
plt.xlabel('sepal length (standardized)')       # this section uses the sepal features
plt.ylabel('sepal width (standardized)')
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.grid()
plt.show()
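
The grid construction is the least obvious part of the plotting code, so here is a tiny sketch of the same idea on a 3 by 2 grid. The values are arbitrary, and `np.column_stack` with `ravel()` is used here as an equivalent way to build the list of test points.

```python
import numpy as np

t1 = np.linspace(0.0, 1.0, 3)      # 3 values along the first feature
t2 = np.linspace(10.0, 20.0, 2)    # 2 values along the second feature
g1, g2 = np.meshgrid(t1, t2)       # both arrays have shape (2, 3)

# Flatten the grid into one row per point: (feature 1, feature 2).
points = np.column_stack((g1.ravel(), g2.ravel()))
print(g1.shape)
print(points)

# Any per-point result can be reshaped back to the grid shape,
# which is the form pcolormesh needs for its color values.
values = points.sum(axis=1).reshape(g1.shape)
print(values)
```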


(5) Predict with the model

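y_hat = lr.predict(x)       # predictions on the training samples
y = y.reshape(-1)
result = y_hat == y         # True where the prediction matches the label
print(y_hat)
print(result)
acc = np.mean(result)       # fraction of correct predictions (training accuracy)
print('Accuracy: %.2f%%' % (100 * acc))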
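
The accuracy above is measured on the same samples the model was trained on. As a hedged variation that is not part of the original workflow, the sketch below holds out 30% of the data and scores the model on that unseen part. It assumes the bundled `load_iris` copy; the split ratio and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of Iris; sepal length and width only.
X_all, y_all = load_iris(return_X_y=True)
X_sepal = X_all[:, :2]

# Stratified split keeps the three classes balanced in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_sepal, y_all, test_size=0.3, stratify=y_all, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)           # the scaler is fitted on the training part only
y_pred = model.predict(X_test)

test_acc = accuracy_score(y_test, y_pred)
manual_acc = np.mean(y_pred == y_test)    # same computation as in the article
print('Held-out accuracy: %.2f%%' % (100 * test_acc))
```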


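2. Classification using petal length and width as features

The code is the same as above, with one change in the data processing step: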

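# Apply this to the full four-column x loaded in step (2), not to the already sliced x
x = x[:, 2:]                # was x = x[:, :2]; now keeps petal length and petal width
print(x)
print(y)
x = StandardScaler().fit_transform(x)   # standardize each feature
lr = LogisticRegression()   # logistic regression model
lr.fit(x, y.ravel())        # fit the model parameters to the data (x, y)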
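
To compare the two feature choices side by side, the following sketch repeats the fit for both column pairs and reports the training accuracy of each. It assumes the bundled `load_iris` copy, and the dictionary `feature_sets` is a helper introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of Iris with columns
# (sepal length, sepal width, petal length, petal width).
X_all, y_all = load_iris(return_X_y=True)

feature_sets = {'sepal': slice(0, 2), 'petal': slice(2, 4)}
train_acc = {}
for name, cols in feature_sets.items():
    X_sub = StandardScaler().fit_transform(X_all[:, cols])   # always slice the full array
    clf = LogisticRegression().fit(X_sub, y_all)
    train_acc[name] = clf.score(X_sub, y_all)                # accuracy on the training data
    print('%s features: %.2f%%' % (name, 100 * train_acc[name]))
```

Slicing from the untouched `X_all` on every pass avoids the pitfall of slicing an array that has already been reduced to two columns.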