ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

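This post covers a missing-value handling problem that comes up during data preprocessing with Python's sklearn library, and how to fix it. SimpleImputer raised an error on data containing NaN. The fix is to set the missing_values parameter to np.nan instead of the string 'NaN'. The example here also switches the strategy to most_frequent, although the missing_values change is what removes the error.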

Problem

I had just started learning sklearn, and running the code below raised an error:

from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer
from sklearn.impute import SimpleImputer
import numpy as np

import jieba

def im():
    """
    Missing-value handling
    :return:
    """
    # The string 'NaN' is not the same as np.nan; this is what triggers the error
    im = SimpleImputer(missing_values='NaN', strategy='mean')
    data = im.fit_transform([[1, 2], [np.nan, 3], [7, 6]])
    print(data)

if __name__ == "__main__":
    im()

Running it fails with ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). The full traceback:

Traceback (most recent call last):
  File "E:/pycharm_workspace/matplotlibDemo/feature.py", line 104, in <module>
    im()
  File "E:/pycharm_workspace/matplotlibDemo/feature.py", line 95, in im
    data = im.fit_transform([[1,2],[np.nan,3],[7,6]])
  File "D:\skl3\lib\site-packages\sklearn\base.py", line 699, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "D:\skl3\lib\site-packages\sklearn\impute\_base.py", line 288, in fit
    X = self._validate_input(X, in_fit=True)
  File "D:\skl3\lib\site-packages\sklearn\impute\_base.py", line 262, in _validate_input
    raise ve
  File "D:\skl3\lib\site-packages\sklearn\impute\_base.py", line 255, in _validate_input
    copy=self.copy)
  File "D:\skl3\lib\site-packages\sklearn\base.py", line 421, in _validate_data
    X = check_array(X, **check_params)
  File "D:\skl3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "D:\skl3\lib\site-packages\sklearn\utils\validation.py", line 664, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "D:\skl3\lib\site-packages\sklearn\utils\validation.py", line 106, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
    
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

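The message lists three possible causes: the input contains NaN, contains infinity, or contains a value too large for float64. Here the cause is the NaN in the second row. Because missing_values was set to the string 'NaN' rather than np.nan, SimpleImputer does not treat np.nan as the missing-value marker, so input validation rejects the array before any imputation happens.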
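To see which of those causes applies to a given input, it can help to count the non-finite entries before calling sklearn. The following is a minimal sketch using only NumPy. The helper name summarize_non_finite is hypothetical and not part of sklearn or the original post.

```python
import numpy as np

def summarize_non_finite(X):
    """Count NaN and infinite entries in a numeric array-like (hypothetical helper)."""
    arr = np.asarray(X, dtype=np.float64)
    return {
        "nan": int(np.isnan(arr).sum()),  # entries that are NaN
        "inf": int(np.isinf(arr).sum()),  # entries that are +inf or -inf
    }

# Same data as in the failing call above
report = summarize_non_finite([[1, 2], [np.nan, 3], [7, 6]])
print(report)
```

For the data in this post the report shows one NaN and no infinities, which points at the missing-value marker rather than at overflow.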

Solution

Pass np.nan, not the string 'NaN', as missing_values:

from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer
from sklearn.impute import SimpleImputer
import numpy as np

import jieba

def im():
    """
    Missing-value handling
    :return:
    """
    # np.nan marks the missing entries; most_frequent fills each gap with
    # the most common value in that column (the smallest one on a tie)
    im = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
    data = im.fit_transform([[1, 2], [np.nan, 3], [7, 6]])
    print(data)

if __name__ == "__main__":
    im()

Output:

[[1. 2.]
 [1. 3.]
 [7. 6.]]
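
The original call used strategy='mean'. As a sketch, assuming the only problem was the string 'NaN', the same data can also be imputed with the mean once missing_values is np.nan. The variable names below are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = [[1, 2], [np.nan, 3], [7, 6]]

# Same strategy as the failing code, but with np.nan as the missing-value marker
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = mean_imputer.fit_transform(X)
print(filled)
```

The missing entry in the first column is replaced by the mean of the observed values 1 and 7, that is 4, instead of the 1 that most_frequent produces.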