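Normalization
Normalization rescales each feature to a value between 0 and 1, which keeps features with large numeric ranges from outweighing features with small ranges.
Formula (min and max are the smallest and largest values of the feature across all samples):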
$$newValue = \frac{oldValue - min}{max - min}$$
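As a quick check of the formula, here is a minimal sketch for a single feature. The three input values and the names `old_values`, `col_min`, and `col_max` are made up for illustration and do not come from the original post.

```python
import numpy as np

# Hypothetical values of one feature, chosen only for illustration.
old_values = np.array([2.0, 4.0, 10.0])

col_min = old_values.min()  # 2.0
col_max = old_values.max()  # 10.0

# newValue = (oldValue - min) / (max - min)
new_values = (old_values - col_min) / (col_max - col_min)
print(new_values)  # the three values are 0.0, 0.25 and 1.0
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.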
Implementation
import numpy as np

def autoNorm(dataSet):
    '''
    Min-max normalize the data.
    dataSet : samples, a NumPy array of shape (n_samples, n_features)
    '''
    minVals = dataSet.min(axis=0)  # per-column (per-feature) minimum
    maxVals = dataSet.max(axis=0)  # per-column (per-feature) maximum
    ranges = maxVals - minVals  # denominator: max - min
    m = dataSet.shape[0]  # number of samples
    normDataSet = dataSet - np.tile(minVals, (m, 1))  # numerator: oldValue - min
    normDataSet = normDataSet / np.tile(ranges, (m, 1))  # normalized data
    return normDataSet, ranges, minVals
# Example call (in a notebook or REPL, the last line displays the returned tuple)
a = np.random.uniform(1, 20, (6, 3))
print(a)
autoNorm(a)
Output
[[14.6099423 18.04161268 12.25492127]
[14.50232783 9.64654342 17.23818201]
[ 8.3023203 14.94606789 13.94815491]
[ 1.14427536 17.33806978 3.60550385]
[13.51703253 19.27076143 1.17823014]
[ 3.110118 1.28159506 13.4893258 ]]
(array([[1. , 0.93167283, 0.68970886],
[0.99200823, 0.46499922, 1. ],
[0.53157745, 0.75959456, 0.79514091],
[0. , 0.89256358, 0.15113829],
[0.91883731, 1. , 0. ],
[0.14598925, 0. , 0.76657114]]),
array([13.46566694, 17.98916637, 16.05995187]),
array([1.14427536, 1.28159506, 1.17823014]))
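The function returns ranges and minVals alongside the normalized data, which suggests they are meant to be reused on samples that arrive later. The sketch below illustrates that idea under a few assumptions. The names `auto_norm_safe` and `apply_norm` are hypothetical and not part of the original code. It relies on NumPy broadcasting instead of np.tile, and it maps a constant column (max equal to min) to 0 rather than dividing by zero, which is one possible convention among several.

```python
import numpy as np

def auto_norm_safe(data_set):
    """Min-max normalize each column; constant columns become 0 (assumed convention)."""
    data_set = np.asarray(data_set, dtype=float)
    min_vals = data_set.min(axis=0)
    ranges = data_set.max(axis=0) - min_vals
    # Assumption: replace a zero range with 1 so a constant column yields 0, not NaN.
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    # Broadcasting subtracts/divides row by row, so np.tile is not needed.
    norm = (data_set - min_vals) / safe_ranges
    return norm, safe_ranges, min_vals

def apply_norm(sample, ranges, min_vals):
    """Scale a new sample with statistics computed earlier (hypothetical helper)."""
    return (np.asarray(sample, dtype=float) - min_vals) / ranges

# Small made-up training set; the middle column is constant.
train = np.array([[1.0, 5.0, 7.0],
                  [3.0, 5.0, 9.0],
                  [5.0, 5.0, 11.0]])
norm, ranges, min_vals = auto_norm_safe(train)

# A new sample is scaled with the training min and range, not its own.
new_sample = apply_norm([2.0, 5.0, 13.0], ranges, min_vals)
print(norm)
print(new_sample)
```

A new sample whose value lies outside the range seen during normalization (13.0 in the last column here) ends up outside the 0-1 interval, so downstream code should not assume the bound holds for unseen data.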