Python Data Analysis in Practice: The Complete Workflow from Data Cleaning to Modeling

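Data analysis is more than drawing charts. It is a complete workflow: acquire data → clean and tidy → analyze and model → visualize the results.
This article uses a real dataset to walk you step by step through a data analysis project from scratch. The toolchain is pandas + matplotlib / seaborn + scikit-learn.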


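I. Data Preparation: Reading the CSV File

We use the Titanic passenger survival data as the example (a public Kaggle dataset):

    import pandas as pd

    # Load the Titanic data (expects titanic.csv in the working directory)
    df = pd.read_csv("titanic.csv")
    print(df.head())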


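II. First Look at the Data: Columns and Missing Values

    df.info()                  # info() prints its report itself and returns None
    print(df.describe())       # summary statistics for the numeric columns
    print(df.isnull().sum())   # number of missing values per column

Common issues:

  • Age has missing values

  • Cabin is missing for most passengers

  • Categorical columns are not yet encoded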
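
The three issues listed above can be checked with a few lines of pandas. The following is a minimal sketch on a made-up toy frame (the `toy` DataFrame and its values are assumptions for illustration, not rows from the real dataset): it reports the share of missing values per column and lists the columns that are not numeric and would need encoding.

```python
import pandas as pd

# Toy stand-in for the Titanic data; the values are invented for illustration.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

# Share of missing values per column, highest first.
missing_ratio = toy.isnull().mean().sort_values(ascending=False)
print(missing_ratio)

# Columns that are not numeric and would need encoding before modeling.
non_numeric = toy.select_dtypes(exclude=["number"]).columns.tolist()
print(non_numeric)
```

A ratio is often easier to act on than a raw count: a column that is mostly empty is a candidate for dropping, while one with a few gaps is a candidate for filling.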


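III. Data Cleaning and Preprocessing

1. Fill missing values

    # Assign the result back instead of calling fillna(inplace=True) on a column
    # selection, which recent pandas versions warn about or silently ignore
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])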
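
Filling with a single global median is the simplest choice. As a possible refinement, and not something the original workflow does, missing ages could be filled with the median of each passenger's own group. The sketch below compares both on a made-up toy frame; grouping by `Pclass` is an assumption chosen for illustration.

```python
import pandas as pd

# Toy data; values are invented. Pclass is used here as a hypothetical grouping key.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age": [40.0, None, 50.0, 20.0, 24.0, None],
})

# Global median: every missing age receives the same value.
global_fill = toy["Age"].fillna(toy["Age"].median())

# Group-wise median: each missing age receives the median of its own class.
group_median = toy.groupby("Pclass")["Age"].transform("median")
group_fill = toy["Age"].fillna(group_median)

print(pd.DataFrame({"global": global_fill, "by_class": group_fill}))
```

Whether the group-wise version helps depends on how strongly age differs between groups in the real data, so it is worth comparing model scores both ways.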

2. Drop unused columns

    # Cabin is mostly empty; Ticket and Name are free text that this model does not use
    df = df.drop(columns=['Cabin', 'Ticket', 'Name'])

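3. Encode categorical columns

    # Map Sex to 0/1, then one-hot encode Embarked
    # (drop_first=True drops one redundant dummy column)
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
    df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)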
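
To make the effect of the two encoding calls concrete, here is a small sketch on a made-up three-row frame (the `toy` data is an assumption for illustration). It shows which columns `pd.get_dummies` produces when `drop_first=True`.

```python
import pandas as pd

# Toy data with one row per embarkation port; values are invented.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# Binary mapping for Sex, same dictionary as in the article.
toy["Sex"] = toy["Sex"].map({"male": 0, "female": 1})

# One-hot encode Embarked; the first category (C) is dropped and becomes
# the baseline, represented by all remaining dummy columns being false/0.
encoded = pd.get_dummies(toy, columns=["Embarked"], drop_first=True)
print(encoded)
```

Note that `map` returns NaN for any value that is not a key in the dictionary, so it is worth checking for new missing values after the mapping.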


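IV. Data Analysis and Visualization

1. Survival rate by sex

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Sex is already encoded: 0 = male, 1 = female.
    # Each bar shows the mean of Survived, i.e. the survival rate.
    sns.barplot(x='Sex', y='Survived', data=df)
    plt.show()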
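
By default `sns.barplot` draws the mean of `y` for each value of `x`, so the numbers behind the chart can also be computed directly with a `groupby`. A minimal sketch on made-up data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Toy data; Sex uses the same encoding as the article (0 = male, 1 = female).
toy = pd.DataFrame({
    "Sex": [0, 0, 0, 0, 1, 1, 1, 1],
    "Survived": [0, 0, 0, 1, 1, 1, 1, 0],
})

# Mean of a 0/1 column is the share of ones, i.e. the survival rate per group.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```

Having the numbers next to the chart makes it easier to quote exact rates in a report.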

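2. Age and survival

    import matplotlib.pyplot as plt

    # Overlay the age distributions of survivors and non-survivors
    sns.histplot(df[df['Survived'] == 1]['Age'], bins=20, label='Survived', kde=True)
    sns.histplot(df[df['Survived'] == 0]['Age'], bins=20, label='Not Survived', kde=True)
    plt.legend()   # the label= arguments only show up once a legend is drawn
    plt.show()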
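
Overlaid histograms show the shape of the two distributions. If a single number per age range is preferred, one option is to bin the ages with `pd.cut` and compute the survival rate per bin. This is a sketch on made-up data; the bin edges and labels are arbitrary assumptions, not values from the article.

```python
import pandas as pd

# Toy data; values are invented for illustration.
toy = pd.DataFrame({
    "Age": [4, 10, 25, 30, 45, 70],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# Hypothetical age bands; pd.cut intervals are right-inclusive: (0, 12], (12, 40], (40, 100].
bins = [0, 12, 40, 100]
labels = ["0-12", "13-40", "41+"]
toy["AgeGroup"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Survival rate within each age band.
rate_by_age = toy.groupby("AgeGroup", observed=True)["Survived"].mean()
print(rate_by_age)
```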


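V. Modeling and Prediction

We use scikit-learn to train a logistic regression model that predicts survival.

1. Split the data

    from sklearn.model_selection import train_test_split

    # Separate the features from the target, then hold out 20% for testing
    X = df.drop("Survived", axis=1)
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )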
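
When the two classes are imbalanced, a purely random split can leave the test set with a different class ratio than the training set. `train_test_split` accepts a `stratify` argument for this case. The article does not use it, so the following is only a sketch of the option, on made-up data with 40% positives.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 50 rows, 20 survivors (40%); the feature column is a placeholder.
toy = pd.DataFrame({
    "feature": range(50),
    "Survived": [1] * 20 + [0] * 30,
})
X = toy.drop(columns="Survived")
y = toy["Survived"]

# stratify=y keeps the class ratio the same in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())
```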

2. Build the model

    from sklearn.linear_model import LogisticRegression

    # max_iter is raised so the solver has room to converge on unscaled features
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
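
One advantage of logistic regression is that the fitted coefficients can be read directly. The sketch below fits the same kind of model on a made-up eight-row frame (the data is an assumption for illustration) and pairs each coefficient with its column name. A positive coefficient means the feature pushes the prediction toward class 1.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data; values are invented. Sex: 0 = male, 1 = female.
X = pd.DataFrame({
    "Sex": [0, 0, 0, 0, 1, 1, 1, 1],
    "Age": [20, 30, 40, 50, 20, 30, 40, 50],
})
y = pd.Series([0, 1, 0, 0, 1, 1, 1, 0], name="Survived")

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# coef_ has shape (1, n_features) for a binary problem; label it by column.
coefficients = pd.Series(model.coef_[0], index=X.columns)
print(coefficients)
```

Coefficients of unscaled features are not directly comparable in magnitude, so the sign is the safer thing to interpret here.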

3. Evaluate the model

    from sklearn.metrics import classification_report, accuracy_score

    # Predict on the held-out test set and report accuracy plus per-class metrics
    y_pred = model.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))
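
Accuracy alone hides which kind of mistake the model makes. A confusion matrix breaks the predictions down into the four possible outcomes. This is a small sketch with hand-written labels (`y_true` and `y_pred` here are made-up lists, not output from the model above):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration: 1 = survived, 0 = did not survive.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = matrix.ravel()

accuracy = accuracy_score(y_true, y_pred)
recall = tp / (tp + fn)      # share of actual survivors that were found
precision = tp / (tp + fp)   # share of predicted survivors that were right

print(matrix)
print(accuracy, recall, precision)
```

In this toy case the accuracy is 0.7, yet only half of the actual survivors are detected, which is the kind of gap the classification report also surfaces.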


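VI. Summary

This project walks through the complete data analysis workflow:

Stage                       Technique
Data acquisition            pandas
Cleaning and preprocessing  Missing-value handling, encoding, etc.
Visualization               matplotlib / seaborn
Modeling and prediction     scikit-learn
Model evaluation            Accuracy + classification report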

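You can carry this workflow over to other real-world data analysis scenarios, such as:

  • User retention analysis

  • Product sales forecasting

  • Employee attrition prediction

  • Customer value analysis (RFM), and more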
